Tutorial
========

This tutorial describes the steps needed to perform a quantitative analysis of
the nucleosomal epigenome. We assume that files are organised according to a
given hierarchy and that all command lines are launched from the project's root.

This tutorial is divided into two main parts. The first one covers the Python
script `wf.py`, which aligns and converts Illumina reads. The second one covers
the R script `main.r`, which extracts information (nucleosome positions and
indicators) from the dataset.


Python and R Common Configuration File
--------------------------------------

First of all, we define in one place the configuration variables that will be
loaded by the Python and R scripts. This file is **configurator.py**. Executing
this Python script dumps the variables into **nucleo_miner_config.json**, which
is then loaded by both kinds of scripts (R and Python).

To do this, launch the following command line at the root of your project:

.. code:: bash

    python src/current/configurator.py

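
As a rough illustration of this mechanism (a sketch only, not the actual
content of `configurator.py`), a Python script can dump a few variables into a
JSON file that any JSON-aware program, including an R script, can load
afterwards. The variable values and the temporary output location used below
are assumptions made for the example.

.. code:: python

    import json
    import os
    import tempfile

    # Hypothetical subset of the configuration; the real values are defined
    # in configurator.py.
    config = {
        "INDEX_DIR": "data/index",  # assumed value
        "ALIGN_DIR": "data/align",  # assumed value
        "LOG_DIR": "data/log",      # assumed value
    }

    def dump_config(conf, path):
        """Write the configuration dictionary to a JSON file."""
        with open(path, "w") as handle:
            json.dump(conf, handle, indent=2, sort_keys=True)

    def load_config(path):
        """Read the configuration back from the JSON file."""
        with open(path) as handle:
            return json.load(handle)

    # The file is written to a temporary directory so that the example does
    # not touch the project tree.
    json_path = os.path.join(tempfile.mkdtemp(), "nucleo_miner_config.json")
    dump_config(config, json_path)
    reloaded = load_config(json_path)

Keeping a single JSON file means that the Python and R sides cannot drift apart
on paths or sample definitions.
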
TODO: other Python scripts to describe:

- libcoverage.py
- wf.py


Dataset and Configuration Variables
-----------------------------------

We want to compare the nucleosomes of 3 yeast strains:

- BY
- RM
- YJM

For each strain we perform Mnase-Seq and ChIP-Seq using the 5 following
markers:

- H3K4me1
- H3K4me3
- H3K9ac
- H3K14ac
- H4K12ac

In order to simplify the design of the experiment, we consider Mnase as a
marker. For each pair `(strain, marker)` we perform 3 replicates. So,
theoretically, we should have `3 * (1 + 5) * 3 = 54` samples. In practice we
only obtained 2 replicates for `(YJM, H3K4me1)`. Each one of the 53 samples is
identified by a unique identifier. The file `CSV_SAMPLE_FILE` sums up this
information.

.. autodata:: configurator.CSV_SAMPLE_FILE
    :noindex:

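
The sample count above can be checked by enumerating the design. This is only
a sketch: the label `Mnase` and the choice of which replicate of
`(YJM, H3K4me1)` is missing are assumptions, since the text only states that 2
replicates were obtained.

.. code:: python

    from itertools import product

    strains = ["BY", "RM", "YJM"]
    # Mnase is treated as a marker, next to the 5 histone marks.
    markers = ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
    replicates = [1, 2, 3]

    # Theoretical design: every (strain, marker) pair in 3 replicates.
    theoretical = list(product(strains, markers, replicates))

    # Assumption: the missing sample is counted as the third replicate.
    missing = ("YJM", "H3K4me1", 3)
    obtained = [sample for sample in theoretical if sample != missing]

    n_theoretical = len(theoretical)
    n_obtained = len(obtained)
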
We use a convention to link a sample to its Illumina fastq outputs. The Illumina
output files of the sample `ID` are stored in the directory
`ILLUMINA_OUTPUTFILE_PREFIX` + `ID`. For example, the outputs of sample 41 are
stored in the directory `data/2012-09-05/FASTQ/Sample_Yvert_Bq41/`.

.. autodata:: configurator.ILLUMINA_OUTPUTFILE_PREFIX
    :noindex:

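
The naming convention can be sketched as follows. The prefix value is an
assumption inferred from the example directory given above, and
`illumina_output_dir` is a hypothetical helper, not a function of `wf.py`.

.. code:: python

    # Assumed value, inferred from the directory of sample 41.
    ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"

    def illumina_output_dir(sample_id, prefix=ILLUMINA_OUTPUTFILE_PREFIX):
        """Return the directory holding the fastq files of a sample."""
        return "%s%s/" % (prefix, sample_id)

    sample_41_dir = illumina_output_dir(41)
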
For BY (resp. RM and YJM) we use the following reference genome:
`saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta`
(resp. `saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta` and
`saccharomyces_cerevisiae_YJM_789_screencontig.fasta`).
The index `FASTA_REFERENCE_GENOME_FILES` stores this information.

.. autodata:: configurator.FASTA_REFERENCE_GENOME_FILES
    :noindex:

Each chromosome/contig is identified in the fasta file by an obscure identifier.
For example, BY chromosome I is identified by `gi|144228165|ref|NC_001133.7|`,
whereas TemplateFilter expects an integer. So, we translate it. The index
`FASTA_INDEXES` stores this translation.

.. autodata:: configurator.FASTA_INDEXES
    :noindex:

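
A minimal sketch of such a translation is given below. The nested dictionary
layout and the helper `chromosome_number` are assumptions made for the
illustration; only the BY chromosome I identifier comes from the text above,
and we assume it is mapped to the integer 1.

.. code:: python

    # Hypothetical excerpt of the translation index (strain -> id -> number).
    FASTA_INDEXES = {
        "BY": {"gi|144228165|ref|NC_001133.7|": 1},
    }

    def chromosome_number(strain, fasta_id, indexes=FASTA_INDEXES):
        """Translate a fasta identifier into the integer used downstream."""
        try:
            return indexes[strain][fasta_id]
        except KeyError:
            raise KeyError("no index for %r in strain %r" % (fasta_id, strain))

    chr_i = chromosome_number("BY", "gi|144228165|ref|NC_001133.7|")
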
From a pragmatic point of view, we discard some parts of the genome (repeated
sequences, etc.). The list of blacklisted areas is explicitly detailed in
`AREA_BLACK_LIST`.

.. autodata:: configurator.AREA_BLACK_LIST
    :noindex:

For the BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we use the
previously computed .c2c file `data/2012-03_primarydata/BY_RM_gxcomp.c2c` (resp.
`BY_YJM_GComp_All.c2c` and `RM_YJM_gxcomp.c2c`). For more information about
.c2c files, please read section 5 of the manual of `NucleoMiner`, the old
version of `NucleoMiner2`
(http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).

.. autodata:: configurator.C2C_FILES
    :noindex:

`nucleominer` works in specific directories, which are described in
`INDEX_DIR`, `ALIGN_DIR` and `LOG_DIR`.

Finally, `nucleominer` uses external resources. The paths to these resources
are described in `BOWTIE_BUILD_BIN`, `BOWTIE2_BIN`, `SAMTOOLS_BIN`,
`BEDTOOLS_BIN`, `TF_BIN` and `TF_TEMPLATES_FILE`.

All paths, prefixes and indexes can be changed in the
`src/current/nucleominer_config.json` file.

.. autodata:: wf.json_conf_file
    :noindex:


Preprocessing Illumina Fastq Reads for Each Sample
--------------------------------------------------

This preprocessing consists of the 4 main steps embedded in `wf.py` and
described below. As a preamble, this script computes `samples`, `samples_mnase`
and `strains`, which are used throughout the 4 steps.

.. autodata:: wf.samples
    :noindex:

.. autodata:: wf.samples_mnase
    :noindex:

.. autodata:: wf.strains
    :noindex:


Creating Bowtie Index from each Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each strain, we need to create a Bowtie index. The Bowtie index of a strain
is a compact, searchable representation of the reference genome of this strain
(Bowtie builds it from the Burrows-Wheeler transform of the sequence). It is
used by bowtie to align reads. This step is performed by the following part of
the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_1
    :end-before: # _ENDOF_ step_1
    :language: python

The following table sums up the file sizes and process durations involved in
this step.

====== ====================== ====================== ================
strain fasta genome file size bowtie index file size process duration
====== ====================== ====================== ================
BY     12 MB                  25 MB                  11 s
RM     12 MB                  24 MB                  9 s
YJM    12 MB                  25 MB                  11 s
====== ====================== ====================== ================

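
For readers who do not have `wf.py` at hand, the indexing call can be sketched
as below. This is not the code of `wf.py`: the binary name stands for
`BOWTIE_BUILD_BIN` and the index prefix is hypothetical. The only thing relied
upon is that `bowtie-build` and `bowtie2-build` both take the reference fasta
file followed by the base name of the index to write.

.. code:: python

    import subprocess

    def bowtie_build_command(build_bin, fasta_file, index_prefix):
        """Return the argument list that builds the index of one strain."""
        return [build_bin, fasta_file, index_prefix]

    def build_index(build_bin, fasta_file, index_prefix):
        """Run the indexer and fail loudly if it returns an error code."""
        subprocess.check_call(
            bowtie_build_command(build_bin, fasta_file, index_prefix))

    cmd = bowtie_build_command(
        "bowtie2-build",  # assumed value of BOWTIE_BUILD_BIN
        "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
        "index/BY",       # hypothetical index prefix
    )
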
Aligning Reads to Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, we launch bowtie to align reads to the reference genome. It produces a
`.sam` file that we convert into a `.bed` file. The binaries for `bowtie`,
`samtools` and `bedtools` are wrapped using the Python `subprocess` module. This
step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_2
    :end-before: # _ENDOF_ step_2
    :language: python

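
The chain of tools can be sketched as follows. It is an illustration under
several assumptions, not the content of `wf.py`: reads are taken as single-end,
binaries are called by their usual names instead of the configured paths, and
the file names are hypothetical.

.. code:: python

    import subprocess

    def alignment_commands(fastq, index_prefix, out_prefix, threads=3):
        """Return (argv, stdout_path) pairs for the three conversion steps."""
        sam, bam, bed = (out_prefix + ext for ext in (".sam", ".bam", ".bed"))
        return [
            # bowtie2: -x index base name, -U unpaired reads, -S SAM output,
            # -p number of threads.
            (["bowtie2", "-p", str(threads), "-x", index_prefix,
              "-U", fastq, "-S", sam], None),
            # samtools view: -b writes BAM, -S declares SAM input (required
            # by old versions), -o names the output file.
            (["samtools", "view", "-b", "-S", "-o", bam, sam], None),
            # bedtools bamtobed prints BED lines on its standard output.
            (["bedtools", "bamtobed", "-i", bam], bed),
        ]

    def run_all(commands):
        """Run each command, redirecting stdout to a file when requested."""
        for argv, stdout_path in commands:
            if stdout_path is None:
                subprocess.check_call(argv)
            else:
                with open(stdout_path, "w") as out:
                    subprocess.check_call(argv, stdout=out)

    commands = alignment_commands("reads.fastq", "index/BY", "align/sample_1")

Building the argument lists separately from running them makes it easy to log
the exact command lines before launching a long alignment.
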
Convert Aligned Reads for TemplateFilter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TemplateFilter uses a particular input format for reads, so we convert the
`.bed` file. TemplateFilter expects reads as follows: `chr coord strand #reads`
where:

- chr is the number of the chromosome;
- coord is the coordinate of the reads;
- strand is `F` for forward and `R` for reverse;
- #reads is the number of reads for this position.

Each entry is *tab*-separated.

**WARNING** for the reverse strand, bowtie returns the position of the leftmost
nucleotide of the read, whereas TemplateFilter expects the rightmost one. So
this step takes it into account and adds the length of the reads (in our case
50) to the coordinate of reverse reads.

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_3
    :end-before: # _ENDOF_ step_3
    :language: python

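
The conversion described above can be sketched as follows. This is a
simplified illustration, not the code of `wf.py`: it assumes standard
6-column BED lines, uses a hypothetical one-entry chromosome index, and does
not deal with any offset between the 0-based BED start and the coordinate
system expected by TemplateFilter.

.. code:: python

    from collections import Counter

    READ_LENGTH = 50  # read length stated in the warning above

    def bed_to_tf(bed_lines, fasta_indexes, read_length=READ_LENGTH):
        """Count reads per (chr, coord, strand) and format them for TF."""
        counts = Counter()
        for line in bed_lines:
            fields = line.rstrip("\n").split("\t")
            chrom, start, strand = fields[0], int(fields[1]), fields[5]
            if strand == "+":
                key = (fasta_indexes[chrom], start, "F")
            else:
                # Reverse reads are shifted by the read length.
                key = (fasta_indexes[chrom], start + read_length, "R")
            counts[key] += 1
        return ["%d\t%d\t%s\t%d" % (chrom, coord, strand, n)
                for (chrom, coord, strand), n in sorted(counts.items())]

    bed = [
        "chrI\t100\t150\tread1\t42\t+",
        "chrI\t100\t150\tread2\t42\t+",
        "chrI\t300\t350\tread3\t42\t-",
    ]
    tf_lines = bed_to_tf(bed, {"chrI": 1})
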
The following table sums up the number of reads, the file sizes and the process
durations involved in the last two steps. In our case, the alignment process
was multithreaded over 3 cores.

== ============== ========================== ====== ================ ================== ================
id Illumina reads aligned and filtered reads ratio  `.bed` file size TF input file size process duration
== ============== ========================== ====== ================ ================== ================
1  16436138       10199695                   62.06% 1064 MB          60 MB              383 s
2  16911132       12512727                   73.99% 1298 MB          64 MB              437 s
3  15946902       12340426                   77.38% 1280 MB          65 MB              423 s
4  13765584       10381903                   75.42% 931 MB           59 MB              352 s
5  15168268       11502855                   75.83% 1031 MB          64 MB              386 s
6  18850820       14024905                   74.40% 1254 MB          69 MB              482 s
7  15591124       12126623                   77.78% 1163 MB          72 MB              405 s
8  15659905       12475664                   79.67% 1194 MB          71 MB              416 s
9  14668641       10960565                   74.72% 1052 MB          70 MB              375 s
10 14339179       10454451                   72.91% 1049 MB          51 MB              363 s
11 18019895       13688774                   75.96% 1378 MB          59 MB              474 s
12 13746796       10810022                   78.64% 1084 MB          54 MB              360 s
13 15205065       11766016                   77.38% 990 MB           54 MB              381 s
14 17803097       13838883                   77.73% 1154 MB          60 MB              452 s
15 15434564       12307878                   79.74% 1032 MB          57 MB              394 s
16 16802587       12725665                   75.74% 1221 MB          48 MB              438 s
17 16058417       12513734                   77.93% 1192 MB          63 MB              422 s
18 16154482       13204331                   81.74% 1277 MB          52 MB              430 s
19 21013924       17102120                   81.38% 1646 MB          59 MB              555 s
20 17213114       14433357                   83.85% 1389 MB          53 MB              459 s
21 17360907       14733001                   84.86% 1203 MB          55 MB              450 s
22 18136816       15389581                   84.85% 1257 MB          53 MB              469 s
23 14763678       12173025                   82.45% 1140 MB          56 MB              393 s
24 15541709       12890345                   82.94% 1057 MB          48 MB              398 s
25 16433215       13094314                   79.68% 1241 MB          57 MB              433 s
26 17370850       14264136                   82.12% 1347 MB          51 MB              466 s
27 14613512       8654495                    59.22% 887 MB           56 MB              339 s
28 15248545       11367589                   74.55% 1166 MB          67 MB              405 s
29 14316809       10767926                   75.21% 1103 MB          63 MB              379 s
30 15178058       12265794                   80.81% 1030 MB          66 MB              390 s
31 14968579       11876186                   79.34% 1009 MB          63 MB              387 s
32 16912705       13550508                   80.12% 1143 MB          70 MB              442 s
33 16782154       12755111                   76.00% 1227 MB          65 MB              438 s
34 16741443       13168071                   78.66% 1260 MB          71 MB              442 s
35 13096171       10367041                   79.16% 992 MB           62 MB              350 s
36 17715118       14092985                   79.55% 1404 MB          68 MB              483 s
37 17288466       7402082                    42.82% 741 MB           48 MB              339 s
38 16116394       13178457                   81.77% 1101 MB          63 MB              420 s
39 14241106       10537228                   73.99% 880 MB           57 MB              348 s
40 13784738       10598464                   76.89% 1005 MB          64 MB              358 s
41 12438007       9620975                    77.35% 911 MB           60 MB              326 s
42 13853959       11031238                   79.63% 1045 MB          64 MB              365 s
43 12036162       6654780                    55.29% 684 MB           46 MB              268 s
44 13873129       10251074                   73.89% 1048 MB          61 MB              365 s
45 19817751       14904502                   75.21% 1520 MB          72 MB              528 s
46 13368959       10818619                   80.92% 912 MB           63 MB              350 s
47 7566467        6139001                    81.13% 520 MB           44 MB              201 s
48 32586928       21191363                   65.03% 1816 MB          82 MB              766 s
49 30733184       18791373                   61.14% 1801 MB          89 MB              721 s
50 41287616       30383875                   73.59% 2911 MB          112 MB             1065 s
51 40439965       31177914                   77.10% 2981 MB          117 MB             1070 s
53 40876476       33780065                   82.64% 3316 MB          103 MB             1165 s
55 52424414       47117107                   89.88% 3811 MB          119 MB             1477 s
== ============== ========================== ====== ================ ================== ================

For experimental reasons (efficiency of the manipulation, e.g. PCR), we remove
samples 33, 45, 48 and 55.

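
As a quick check of the bookkeeping, the sample identifiers listed in the
table above and the effect of this removal can be written as below (a sketch;
the variable names are ours).

.. code:: python

    # Identifiers found in the table: 1 to 51, then 53 and 55.
    sample_ids = list(range(1, 52)) + [53, 55]

    # Samples discarded for experimental reasons.
    removed = {33, 45, 48, 55}

    kept = [sample_id for sample_id in sample_ids if sample_id not in removed]

    n_samples = len(sample_ids)
    n_kept = len(kept)
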
Run TemplateFilter on Mnase Samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, for each sample we perform the TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each nucleosome is
defined by its center and its width. An odd width leads us to consider
non-integer lower and upper bounds.

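
To make this warning concrete, here is a minimal sketch of the bound
computation, assuming that the bounds are simply the center minus and plus
half of the width. The center and width values are hypothetical.

.. code:: python

    def nucleosome_bounds(center, width):
        """Return (lower_bound, upper_bound) for a center and a width."""
        half = width / 2.0
        return center - half, center + half

    # An odd width gives half-integer bounds.
    lower, upper = nucleosome_bounds(1000, 147)

    # An even width gives integer-valued bounds.
    even_lower, even_upper = nucleosome_bounds(1000, 146)
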
**WARNING** TemplateFilter is not designed to deal with replicates. So we choose
to keep as many nucleosomes as possible and to filter them afterwards, taking
advantage of the replicates. To do that, we set a low correlation threshold
parameter (`0.5`) and a particularly high overlap value (`300%`).

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_4
    :end-before: # _ENDOF_ step_4
    :language: python

== ====== ========== ============= ================
id strain found nucs nuc file size process duration
== ====== ========== ============= ================
1  BY     96214      68 MB         1022 s
2  BY     91694      65 MB         1038 s
3  BY     91205      65 MB         1036 s
4  RM     88076      62 MB         984 s
5  RM     90141      64 MB         967 s
6  RM     87517      62 MB         980 s
7  YJM    88945      64 MB         566 s
8  YJM    88689      64 MB         570 s
9  YJM    88128      63 MB         565 s
== ====== ========== ============= ================


Inferring Nucleosome Position and Extracting Read Counts
--------------------------------------------------------

The second part of the tutorial uses `R` (http://www.r-project.org). It
consists of a set of R scripts that will be sourced in an R console launched at
the root of your project. The R scripts are:

- headers.R
- extract_maps.R
- compare_common_wp.R
- split_samples.R
- count_reads.R
- get_size_factors.R
- launch_deseq.R

The Script headers.R
^^^^^^^^^^^^^^^^^^^^

The script header.R is included in each of the other scripts. It is in charge of:

- loading the libraries used in these scripts
- loading the configuration (design, strain, marker...)
- computing and caching CURs

From a shell at the root of your project, run the following command line:

.. code:: bash

    R CMD BATCH src/current/header.R


The Script extract_maps.R
^^^^^^^^^^^^^^^^^^^^^^^^^

This script is in charge of extracting maps for well-positioned and fuzzy
nucleosomes. First of all, it computes intra- and inter-strain nucleosome maps
for each CUR. This step is executed in parallel on many cores using the BoT
library. Next, it collects the results and produces well-positioned, fuzzy and
UNR maps.

The well-positioned map for BY is collected in the result directory and is
called **BY_wp.tab**. It is composed of the following columns:

- chr, the number of the chromosome
- lower_bound, the lower bound of the nucleosome
- upper_bound, the upper bound of the nucleosome
- cur_index, the index of the CUR
- index_nuc, the index of the nucleosome in the CUR
- wp, 1 if it is a well-positioned nucleosome, 0 otherwise
- nb_reads, the number of reads that support this nucleosome
- nb_nucs, the number of TemplateFilter nucleosomes across replicates (= the number of replicates if it is a well-positioned nucleosome)
- llr_1, for a well-positioned nucleosome, it is the LLR1 between the first and the second TemplateFilter nucleosome.
- llr_2, for a well-positioned nucleosome, it is the LLR1 between the second and the first TemplateFilter nucleosome.
- wp_llr, for a well-positioned nucleosome, it is the LLR2 over all TemplateFilter nucleosomes.
- wp_pval, for a well-positioned nucleosome, it is the p-value of the chi-square test obtained with the LLR2 (**1-pchisq(2.LLR2, df=4)**)
- dyad_shift, for a well-positioned nucleosome, it is the shift between the two extreme TemplateFilter nucleosome dyad positions.

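
As an illustration of the `dyad_shift` column, the shift between the two
extreme dyad positions is the difference between the largest and the smallest
dyad among the TemplateFilter nucleosomes that were grouped together. The
sketch below assumes that the dyad of a nucleosome is the center returned by
TemplateFilter; the positions are hypothetical.

.. code:: python

    def dyad_shift(dyads):
        """Return the distance between the two extreme dyad positions."""
        return max(dyads) - min(dyads)

    # Hypothetical dyads of the same nucleosome in 3 replicates.
    shift = dyad_shift([10520, 10526, 10523])
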
The fuzzy map for BY is collected in the result directory and is called
**BY_fuzzy.tab**. It is composed of the following columns:

- chr, the number of the chromosome
- lower_bound, the lower bound of the nucleosome
- upper_bound, the upper bound of the nucleosome
- cur_index, the index of the CUR

The common well-positioned map for the BY and RM strains is collected in the
result directory and is called **BY_RM_common_wp.tab**. It is composed of the
following columns:

- cur_index, the index of the CUR
- index_nuc_BY, the index of the BY nucleosome in the CUR
- index_nuc_RM, the index of the RM nucleosome in the CUR
- llr_score, the LLR3 score between the BY and RM nucleosomes
- common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (**1-pchisq(2.LLR3, df=2)**)

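
The two p-value columns use `1-pchisq(x, df)` with an even number of degrees
of freedom, for which the chi-square survival function has a closed form:
`exp(-x/2)` times the sum of `(x/2)^i / i!` for `i` from 0 to `df/2 - 1`. The
sketch below evaluates this expression with the standard library only; the
function names are ours and the scripts themselves rely on the R function
`pchisq`.

.. code:: python

    import math

    def chi2_sf_even_df(x, df):
        """Return 1 - pchisq(x, df) for a positive even df."""
        if df <= 0 or df % 2:
            raise ValueError("df must be a positive even integer")
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total

    def wp_pval(llr2):
        """p-value associated with the LLR2 (df=4)."""
        return chi2_sf_even_df(2.0 * llr2, df=4)

    def common_wp_pval(llr3):
        """p-value associated with the LLR3 (df=2)."""
        return chi2_sf_even_df(2.0 * llr3, df=2)

With `x = 2.LLR`, these reduce to `exp(-LLR) * (1 + LLR)` for `df=4` and to
`exp(-LLR)` for `df=2`.
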
The common UNR map for the BY and RM strains is collected in the result
directory and is called **BY_RM_common_unr.tab**. It is composed of the
following columns:

- cur_index, the index of the CUR
- index_nuc_BY, the index of the BY nucleosome in the CUR
- index_nuc_RM, the index of the RM nucleosome in the CUR

To execute this script, run the following command line in your R console:

.. code:: r

    source("src/current/extract_maps.R")


The Script compare_common_wp.R
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This script is used to compare inter-strain distances between common
well-positioned nucleosomes.

For example, it computes the file **BY_RM_common_wp_diff.tab**, which contains
the dyad shifts between two well-positioned nucleosomes. It is composed of the
following columns:

- cur_index, the index of the CUR
- index_nuc_BY, the index of the BY nucleosome in the CUR
- index_nuc_RM, the index of the RM nucleosome in the CUR
- llr_score, the LLR3 score between the BY and RM nucleosomes
- common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (**1-pchisq(2.LLR3, df=2)**)
- diff, the dyad shift between the two well-positioned nucleosomes

It also translates well-positioned nucleosome maps from one strain to another
strain and stores them into a table.

For example, the file **results/2014-04/RM_wp_tr_2_BY.tab** contains the RM
well-positioned nucleosomes translated into the BY genome referential. It is
composed of the following columns:

- strain_ref, the reference genome (in which positions are defined)
- begin, the translated lower bound of the nucleosome
- end, the translated upper bound of the nucleosome
- chr, the number of the chromosome for the reference genome (in which positions are defined)
- length, the length of the nucleosome (could be negative)
- cur_index, the index of the CUR
- index_nuc, the index of the nucleosome in the CUR

To execute this script, run the following command line from a shell at the root
of your project:

.. code:: bash

    R CMD BATCH src/current/compare_common_wp.R


Split and Compress Samples According to CURs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    R CMD BATCH src/current/split_samples.R


Count Reads for Each Nucleosome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    R CMD BATCH src/current/count_reads.R


Get Size Factors Using DESeq
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    R CMD BATCH src/current/get_size_factors.R


Performing DESeq Analysis
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    R CMD BATCH src/current/launch_deseq.R


Results
-------

Output Files Organisation
^^^^^^^^^^^^^^^^^^^^^^^^^

The previous steps produce the following 45 files. Each filename has the form

.. code:: bash

    results/current/[combi]_[marker]_[form]_snep.tab

where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination, marker is
in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each post-translational
histone modification and form is in {wp, fuzzy, wpfuzzy}, considering
well-positioned nucleosomes, fuzzy nucleosomes or both for the SNEP computation.

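
The 45 file names follow directly from the 3 strain combinations, the 5
markers and the 3 forms. A short sketch of the enumeration (the variable names
are ours):

.. code:: python

    from itertools import product

    combis = ["BY_RM", "BY_YJM", "RM_YJM"]
    markers = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
    forms = ["wp", "fuzzy", "wpfuzzy"]

    # One file per (combi, marker, form) triple.
    snep_files = [
        "results/current/%s_%s_%s_snep.tab" % triple
        for triple in product(combis, markers, forms)
    ]

    n_files = len(snep_files)
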
Example of column names: `chr_BY`, `lower_bound_BY`, `upper_bound_BY`,
`index_nuc_BY`, `chr_RM`, `lower_bound_RM`, `upper_bound_RM`, `index_nuc_RM`,
`roi_index`, `form`, `BY_Mnase_Seq_1`, `BY_Mnase_Seq_2`, `BY_Mnase_Seq_3`,
`RM_Mnase_Seq_4`, `RM_Mnase_Seq_5`, `RM_Mnase_Seq_6`, `BY_H3K14ac_36`,
`BY_H3K14ac_37`, `BY_H3K14ac_53`, `RM_H3K14ac_38`, `RM_H3K14ac_39`, `pvalsGLM`.

For each file, there is 1 line per nucleosome, and each line is composed of many
columns divided into 3 main topics:

- nucleosome information
- number of reads for each sample
- DESeq analysis results.

For example, for the file *BY_RM_H3K14ac_wp_snep.tab*, the nucleosome
information is:

- chr_BY, the BY chromosome involved
- lower_bound_BY, the lower bound of the BY nucleosome
- upper_bound_BY, the upper bound of the BY nucleosome
- index_nuc_BY, the index of the nucleosome in the entire list of BY nucleosomes
- chr_RM, lower_bound_RM, upper_bound_RM, index_nuc_RM
  are the same information for the RM strain
- roi_index, the index of the region of interest involved.

The next columns concern the indicators for each sample. They are labeled
`[strain]_[marker]_[sample_id]` and each value is the number of reads of the
sample *sample_id* for the current nucleosome.

The 5 final columns concern the DESeq analysis:

- manip[a_manip] strain[a_strain] manip[a_manip]:strain[a_strain], the manip (marker) effect, the strain effect and the SNEP effect.
- pvalsGLM, the p-value resulting from the comparison of the GLM models with and without the interaction term *marker:strain*
- snep_index, a boolean set to TRUE if the *pvalsGLM* value is under the threshold computed with the FDR function with a rate set to 0.01%.

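
The text does not say which FDR procedure is used to compute the threshold.
As an illustration only, the sketch below assumes the Benjamini-Hochberg
procedure: the threshold is the largest p-value `p` of rank `k` (among `m`
sorted p-values) such that `p <= rate * k / m`. The function names and the
p-values are hypothetical.

.. code:: python

    def bh_threshold(pvalues, rate):
        """Return the largest p-value accepted by Benjamini-Hochberg."""
        ordered = sorted(pvalues)
        m = len(ordered)
        threshold = None
        for rank, p in enumerate(ordered, start=1):
            if p <= rate * rank / m:
                threshold = p
        return threshold

    def snep_index(pvalues, rate):
        """Flag each nucleosome whose p-value is under the threshold."""
        threshold = bh_threshold(pvalues, rate)
        return [threshold is not None and p <= threshold for p in pvalues]

    # Toy p-values with a 5% rate, chosen to keep the example readable;
    # the rate quoted above (0.01%) would be written 0.0001.
    flags = snep_index([0.01, 0.5, 0.03, 0.02], rate=0.05)
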
It also produces a file that gives the size factor of each sample involved in
the different strain combinations and nucleosomal region types:

.. code:: bash

    results/current/size_factors.tab

TODO: include this file... /home/filleton/analyses/snepcatalog/data/2013-10-09/current/README.txt


Number of SNEPs
^^^^^^^^^^^^^^^

Here are the numbers of SNEPs computed for each form.

.. code:: bash

    [1] "wp"
           #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
    BY-RM  30234     520     798     83    3566      26
    BY-YJM 31298     303     619    102     103     128
    RM-YJM 29863     129     340     46    3177      18
    [1] "fuzzy"
           #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
    BY-RM  10748     294     308    101    1681      42
    BY-YJM 10669     122     176    124      93      87
    RM-YJM 11478      54     112     41    1389      20
    [1] "wpfuzzy"
           #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
    BY-RM  40982     770    1136    183    5404      73
    BY-YJM 41967     439     804    214     198     199
    RM-YJM 41341     184     468     87    4687      37

TODO:

- Print/study intra/inter strain LODs.
- Check the normality of samples using Shapiro–Wilk (hypothesis for computing LODs)