Tutorial
********

This tutorial describes the steps required to perform a quantitative
analysis of the nucleosomal epigenome. We assume that files are
organised according to a given hierarchy and that all command lines
are launched from the project's root.

This tutorial is divided into two main parts. The first is the Python
script *wf.py*, which aligns and converts Illumina reads. The second
is the R script *main.r*, which extracts information (nucleosome
positions and indicators) from the dataset.


Dataset and Configuration File
==============================

We want to compare the nucleosomes of 3 yeast strains:

* BY

* RM

* YJM

For each strain we perform MNase-Seq and ChIP-Seq using the following
5 markers:

* H3K4me1

* H3K4me3

* H3K9ac

* H3K14ac

* H4K12ac

To simplify the experimental design, we consider MNase as a marker.
For each pair *(strain, marker)* we perform 3 replicates, so in
theory we should have *3 * (1 + 5) * 3 = 54* samples. In practice we
only obtained 2 replicates for *(YJM, H3K4me1)*. Each of the 53
samples is identified by a unique identifier. The file
*CSV_SAMPLE_FILE* sums up this information.
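
The sample count above can be checked with a short sketch. The marker
label "Mnase" and the function name are illustrative choices, not
names taken from *wf.py*.

```python
from itertools import product

STRAINS = ["BY", "RM", "YJM"]
# MNase is treated as a marker alongside the five histone marks.
MARKERS = ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
N_REPLICATES = 3


def expected_samples(missing=None):
    """Return one (strain, marker, replicate) tuple per expected sample.

    `missing` maps a (strain, marker) pair to the number of replicates
    that were not obtained.
    """
    missing = missing or {}
    samples = []
    for strain, marker in product(STRAINS, MARKERS):
        n = N_REPLICATES - missing.get((strain, marker), 0)
        samples.extend((strain, marker, rep) for rep in range(1, n + 1))
    return samples


theoretical = expected_samples()
obtained = expected_samples(missing={("YJM", "H3K4me1"): 1})
print(len(theoretical), len(obtained))  # 54 53
```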

We use a convention to link each sample to its Illumina FASTQ outputs.
The Illumina output files of the sample *ID* are stored in the
directory *ILLUMINA_OUTPUTFILE_PREFIX* + *ID*. For example, the
outputs of sample 41 are stored in the directory
*data/2012-09-05/FASTQ/Sample_Yvert_Bq41/*.
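
As a minimal sketch of this convention: the prefix value below is
inferred from the sample 41 example, and the function name is
hypothetical.

```python
# Assumed value, inferred from the sample 41 example in the text.
ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"


def illumina_output_dir(sample_id, prefix=ILLUMINA_OUTPUTFILE_PREFIX):
    """Return the directory holding the Illumina outputs of one sample."""
    return prefix + str(sample_id) + "/"


print(illumina_output_dir(41))
```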

For BY (resp. RM and YJM) we use the reference genome
*saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta* (resp.
*saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta* and
*saccharomyces_cerevisiae_YJM_789_screencontig.fasta*). The index
*FASTA_REFERENCE_GENOME_FILES* stores this information.

Each chromosome/contig is identified in the FASTA file by an obscure
identifier. For example, BY chromosome I is identified by
*gi|144228165|ref|NC_001133.7|*, whereas TemplateFilter expects an
integer, so we translate it. The index *FASTA_INDEXES* stores this
translation.
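
The translation can be pictured as a simple lookup. In the sketch
below, the nested layout of *FASTA_INDEXES* and the value 1 for
chromosome I are assumptions; only the identifier itself comes from
the text.

```python
# Hypothetical excerpt of FASTA_INDEXES: strain -> FASTA id -> integer.
FASTA_INDEXES = {
    "BY": {"gi|144228165|ref|NC_001133.7|": 1},
}


def chromosome_number(strain, fasta_id, indexes=FASTA_INDEXES):
    """Translate a FASTA sequence identifier into the integer used downstream."""
    try:
        return indexes[strain][fasta_id]
    except KeyError:
        raise KeyError("no integer index for %r in strain %r" % (fasta_id, strain))


print(chromosome_number("BY", "gi|144228165|ref|NC_001133.7|"))  # 1
```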

For practical reasons we discard some parts of the genome (repeated
sequences, etc.). The blacklisted areas are listed explicitly in
*AREA_BLACK_LIST*.

For the BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we
use the previously computed .c2c file
*data/2012-03_primarydata/BY_RM_gxcomp.c2c* (resp.
*BY_YJM_GComp_All.c2c* and *RM_YJM_gxcomp.c2c*). For more information
about .c2c files, please read section 5 of the manual of
*NucleoMiner*, the old version of *NucleoMiner2*
(http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).

*nucleominer* works in specific directories, which are given by
*INDEX_DIR*, *ALIGN_DIR* and *LOG_DIR*.

Finally, *nucleominer* uses external resources. The paths to these
resources are given by *BOWTIE_BUILD_BIN*, *BOWTIE2_BIN*,
*SAMTOOLS_BIN*, *BEDTOOLS_BIN*, *TF_BIN* and *TF_TEMPLATES_FILE*.

All paths, prefixes and indexes can be changed in the
*src/current/nucleominer_config.json* file.
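
A configuration file of this kind can be loaded and checked as
follows. This is only a sketch: it assumes that the names quoted above
are top-level keys of the JSON file, which the text does not state,
and the helper names are hypothetical.

```python
import json

# Names quoted in this section, assumed to be top-level JSON keys.
REQUIRED_KEYS = [
    "CSV_SAMPLE_FILE", "ILLUMINA_OUTPUTFILE_PREFIX",
    "FASTA_REFERENCE_GENOME_FILES", "FASTA_INDEXES", "AREA_BLACK_LIST",
    "INDEX_DIR", "ALIGN_DIR", "LOG_DIR",
    "BOWTIE_BUILD_BIN", "BOWTIE2_BIN", "SAMTOOLS_BIN", "BEDTOOLS_BIN",
    "TF_BIN", "TF_TEMPLATES_FILE",
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from `config`."""
    return [key for key in required if key not in config]


def load_config(path="src/current/nucleominer_config.json"):
    """Load the JSON configuration and fail early if a key is missing."""
    with open(path) as handle:
        config = json.load(handle)
    missing = missing_keys(config)
    if missing:
        raise KeyError("missing configuration keys: " + ", ".join(missing))
    return config
```

Checking the keys up front gives a clear error before any long
alignment job starts.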


Preprocessing Illumina FASTQ Reads for Each Sample
==================================================

This preprocessing consists of the 4 main steps embedded in the
*wf.py* script and described below. As a preamble, this script
computes *samples*, *samples_mnase* and *strains*, which are used
throughout the 4 steps.


Creating Bowtie Index from Each Reference Genome
------------------------------------------------

For each strain, we need to create a Bowtie index. The Bowtie index of
a strain is an indexed representation of the reference genome of this
strain. It is used by Bowtie to align reads. This step is performed by
the following part of the *wf.py* script:
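
The excerpt itself is not included in this text. As an illustration
only, and assuming that *BOWTIE_BUILD_BIN* points to *bowtie2-build*
and that the index basename is the strain name inside *INDEX_DIR*
(both assumptions), the command for one strain could be built like
this:

```python
import os
import subprocess


def bowtie_build_command(bowtie_build_bin, fasta_file, index_dir, strain):
    """Return the argument list that builds the Bowtie index of one strain.

    bowtie2-build takes the reference FASTA file followed by the
    basename of the index files to create.
    """
    index_basename = os.path.join(index_dir, strain)
    return [bowtie_build_bin, fasta_file, index_basename]


def build_index(command):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.check_call(command)


cmd = bowtie_build_command(
    "bowtie2-build",
    "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
    "index",
    "BY",
)
print(" ".join(cmd))
```

Building the argument list separately from running it makes the
command easy to log and to check without launching the binary.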

The following table sums up the file sizes and process durations
involved in this step.

+--------+------------------------+------------------------+------------------+
| strain | FASTA genome file size | Bowtie index file size | process duration |
+========+========================+========================+==================+
| BY     | 12 MB                  | 25 MB                  | 11 s             |
+--------+------------------------+------------------------+------------------+
| RM     | 12 MB                  | 24 MB                  | 9 s              |
+--------+------------------------+------------------------+------------------+
| YJM    | 12 MB                  | 25 MB                  | 11 s             |
+--------+------------------------+------------------------+------------------+


Aligning Reads to Reference Genome
----------------------------------

Next, we launch Bowtie to align the reads to the reference genome. It
produces a *.sam* file that we convert into a *.bed* file. The
binaries for *bowtie*, *samtools* and *bedtools* are wrapped using
the Python *subprocess* module. This step is performed by the
following part of the *wf.py* script:
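
The excerpt itself is not included in this text. The sketch below
shows one plausible way to chain the three binaries with
*subprocess*. It assumes single-end reads and an intermediate *.bam*
file, and it does not reproduce whatever read filtering *wf.py*
applies; the function names and file layout are hypothetical.

```python
import subprocess


def alignment_commands(cfg, index_basename, fastq_file, out_prefix, threads=3):
    """Return (argv, stdout_path) pairs for the align and convert steps."""
    sam, bam, bed = (out_prefix + ext for ext in (".sam", ".bam", ".bed"))
    return [
        # Align unpaired reads; -p sets the number of threads.
        ([cfg["BOWTIE2_BIN"], "-p", str(threads), "-x", index_basename,
          "-U", fastq_file, "-S", sam], None),
        # Convert SAM to BAM (-S marks SAM input for old samtools releases).
        ([cfg["SAMTOOLS_BIN"], "view", "-b", "-S", "-o", bam, sam], None),
        # bamtobed writes BED records to standard output.
        ([cfg["BEDTOOLS_BIN"], "bamtobed", "-i", bam], bed),
    ]


def run(commands):
    """Run each command in order, redirecting stdout when a path is given."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.check_call(argv)
        else:
            with open(stdout_path, "w") as out:
                subprocess.check_call(argv, stdout=out)


cfg = {"BOWTIE2_BIN": "bowtie2", "SAMTOOLS_BIN": "samtools",
       "BEDTOOLS_BIN": "bedtools"}
commands = alignment_commands(cfg, "index/BY", "reads.fastq", "align/sample_1")
```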


Convert Aligned Reads for TemplateFilter
----------------------------------------

TemplateFilter uses a particular input format for reads, so we
convert the *.bed* file. TemplateFilter expects reads in the following
form: *chr coord strand #reads* where:

* chr is the number of the chromosome;

* coord is the coordinate of the reads;

* strand is *F* for forward and *R* for reverse;

* #reads is the number of reads at this position.

Fields are *tab*-separated.

**WARNING** For the reverse strand, Bowtie returns the position of the
leftmost nucleotide, whereas TemplateFilter expects the rightmost
one. This step takes this into account and adds the read length (50
in our case) to the coordinate of reverse reads.
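
The conversion described above can be sketched as follows. The
six-column BED layout (chrom, start, end, name, score, strand) is
assumed, any 0-based versus 1-based coordinate adjustment is left out
because the text does not specify it, and the function name is
hypothetical.

```python
from collections import Counter

READ_LENGTH = 50  # read length stated in the text


def bed_to_tf(bed_lines, chr_index, read_length=READ_LENGTH):
    """Count reads per (chr, coord, strand) and format them for TemplateFilter.

    `chr_index` maps the chromosome names found in the BED file to integers.
    """
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, strand = fields[0], int(fields[1]), fields[5]
        if strand == "+":
            counts[(chr_index[chrom], start, "F")] += 1
        else:
            # Reverse reads: shift the coordinate by the read length.
            counts[(chr_index[chrom], start + read_length, "R")] += 1
    return ["%d\t%d\t%s\t%d" % (c, pos, s, n)
            for (c, pos, s), n in sorted(counts.items())]


bed = [
    "chrI\t100\t150\tread1\t42\t+",
    "chrI\t100\t150\tread2\t42\t+",
    "chrI\t200\t250\tread3\t42\t-",
]
for row in bed_to_tf(bed, {"chrI": 1}):
    print(row)
```

Here the two forward reads collapse into one line with a count of 2,
and the reverse read is reported at coordinate 250.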

This step is performed by the following part of the *wf.py* script:

The following table sums up the number of reads, the file sizes and
the process durations involved in the last two steps. In our case,
the alignment process was multithreaded over 3 cores.

+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| id | Illumina reads | aligned and filtered reads | ratio  | *.bed* file size | TF input file size | process duration |
+====+================+============================+========+==================+====================+==================+
| 1  | 16436138       | 10199695                   | 62.06% | 1064 MB          | 60 MB              | 383 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 2  | 16911132       | 12512727                   | 73.99% | 1298 MB          | 64 MB              | 437 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 3  | 15946902       | 12340426                   | 77.38% | 1280 MB          | 65 MB              | 423 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 4  | 13765584       | 10381903                   | 75.42% | 931 MB           | 59 MB              | 352 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 5  | 15168268       | 11502855                   | 75.83% | 1031 MB          | 64 MB              | 386 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 6  | 18850820       | 14024905                   | 74.40% | 1254 MB          | 69 MB              | 482 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 7  | 15591124       | 12126623                   | 77.78% | 1163 MB          | 72 MB              | 405 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 8  | 15659905       | 12475664                   | 79.67% | 1194 MB          | 71 MB              | 416 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 9  | 14668641       | 10960565                   | 74.72% | 1052 MB          | 70 MB              | 375 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 10 | 14339179       | 10454451                   | 72.91% | 1049 MB          | 51 MB              | 363 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 11 | 18019895       | 13688774                   | 75.96% | 1378 MB          | 59 MB              | 474 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 12 | 13746796       | 10810022                   | 78.64% | 1084 MB          | 54 MB              | 360 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 13 | 15205065       | 11766016                   | 77.38% | 990 MB           | 54 MB              | 381 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 14 | 17803097       | 13838883                   | 77.73% | 1154 MB          | 60 MB              | 452 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 15 | 15434564       | 12307878                   | 79.74% | 1032 MB          | 57 MB              | 394 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 16 | 16802587       | 12725665                   | 75.74% | 1221 MB          | 48 MB              | 438 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 17 | 16058417       | 12513734                   | 77.93% | 1192 MB          | 63 MB              | 422 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 18 | 16154482       | 13204331                   | 81.74% | 1277 MB          | 52 MB              | 430 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 19 | 21013924       | 17102120                   | 81.38% | 1646 MB          | 59 MB              | 555 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 20 | 17213114       | 14433357                   | 83.85% | 1389 MB          | 53 MB              | 459 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 21 | 17360907       | 14733001                   | 84.86% | 1203 MB          | 55 MB              | 450 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 22 | 18136816       | 15389581                   | 84.85% | 1257 MB          | 53 MB              | 469 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 23 | 14763678       | 12173025                   | 82.45% | 1140 MB          | 56 MB              | 393 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 24 | 15541709       | 12890345                   | 82.94% | 1057 MB          | 48 MB              | 398 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 25 | 16433215       | 13094314                   | 79.68% | 1241 MB          | 57 MB              | 433 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 26 | 17370850       | 14264136                   | 82.12% | 1347 MB          | 51 MB              | 466 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 27 | 14613512       | 8654495                    | 59.22% | 887 MB           | 56 MB              | 339 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 28 | 15248545       | 11367589                   | 74.55% | 1166 MB          | 67 MB              | 405 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 29 | 14316809       | 10767926                   | 75.21% | 1103 MB          | 63 MB              | 379 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 30 | 15178058       | 12265794                   | 80.81% | 1030 MB          | 66 MB              | 390 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 31 | 14968579       | 11876186                   | 79.34% | 1009 MB          | 63 MB              | 387 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 32 | 16912705       | 13550508                   | 80.12% | 1143 MB          | 70 MB              | 442 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 33 | 16782154       | 12755111                   | 76.00% | 1227 MB          | 65 MB              | 438 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 34 | 16741443       | 13168071                   | 78.66% | 1260 MB          | 71 MB              | 442 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 35 | 13096171       | 10367041                   | 79.16% | 992 MB           | 62 MB              | 350 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 36 | 17715118       | 14092985                   | 79.55% | 1404 MB          | 68 MB              | 483 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 37 | 17288466       | 7402082                    | 42.82% | 741 MB           | 48 MB              | 339 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 38 | 16116394       | 13178457                   | 81.77% | 1101 MB          | 63 MB              | 420 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 39 | 14241106       | 10537228                   | 73.99% | 880 MB           | 57 MB              | 348 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 40 | 13784738       | 10598464                   | 76.89% | 1005 MB          | 64 MB              | 358 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 41 | 12438007       | 9620975                    | 77.35% | 911 MB           | 60 MB              | 326 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 42 | 13853959       | 11031238                   | 79.63% | 1045 MB          | 64 MB              | 365 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 43 | 12036162       | 6654780                    | 55.29% | 684 MB           | 46 MB              | 268 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 44 | 13873129       | 10251074                   | 73.89% | 1048 MB          | 61 MB              | 365 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 45 | 19817751       | 14904502                   | 75.21% | 1520 MB          | 72 MB              | 528 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 46 | 13368959       | 10818619                   | 80.92% | 912 MB           | 63 MB              | 350 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 47 | 7566467        | 6139001                    | 81.13% | 520 MB           | 44 MB              | 201 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 48 | 32586928       | 21191363                   | 65.03% | 1816 MB          | 82 MB              | 766 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 49 | 30733184       | 18791373                   | 61.14% | 1801 MB          | 89 MB              | 721 s            |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 50 | 41287616       | 30383875                   | 73.59% | 2911 MB          | 112 MB             | 1065 s           |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 51 | 40439965       | 31177914                   | 77.10% | 2981 MB          | 117 MB             | 1070 s           |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 53 | 40876476       | 33780065                   | 82.64% | 3316 MB          | 103 MB             | 1165 s           |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+
| 55 | 52424414       | 47117107                   | 89.88% | 3811 MB          | 119 MB             | 1477 s           |
+----+----------------+----------------------------+--------+------------------+--------------------+------------------+

For experimental reasons (e.g. the efficiency of a manipulation such
as PCR), we remove samples 33, 45, 48 and 55.


Run TemplateFilter on MNase Samples
-----------------------------------

Finally, for each sample we perform the TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each
nucleosome is defined by its center and its width. An odd width leads
us to consider non-integer lower and upper bounds.
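
A short worked example of this warning, assuming that the bounds are
taken as the center minus and plus half the width (the function name
is hypothetical):

```python
def nucleosome_bounds(center, width):
    """Return the (lower, upper) bounds of a nucleosome."""
    half = width / 2.0
    return center - half, center + half


print(nucleosome_bounds(1000, 147))  # (926.5, 1073.5)
print(nucleosome_bounds(1000, 146))  # (927.0, 1073.0)
```

With an odd width such as 147, both bounds fall halfway between two
base positions.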

**WARNING** TemplateFilter is not designed to deal with replicates. We
therefore choose to keep as many nucleosomes as possible and to
filter them in a second step, using the benefit of replicates. To do
so, we set a low correlation threshold parameter (*0.5*) and a
particularly high overlap value (*300%*).

This step is performed by the following part of the *wf.py* script:

+----+--------+------------+---------------+------------------+
| id | strain | found nucs | nuc file size | process duration |
+====+========+============+===============+==================+
| 1  | BY     | 96214      | 68 MB         | 1022 s           |
+----+--------+------------+---------------+------------------+
| 2  | BY     | 91694      | 65 MB         | 1038 s           |
+----+--------+------------+---------------+------------------+
| 3  | BY     | 91205      | 65 MB         | 1036 s           |
+----+--------+------------+---------------+------------------+
| 4  | RM     | 88076      | 62 MB         | 984 s            |
+----+--------+------------+---------------+------------------+
| 5  | RM     | 90141      | 64 MB         | 967 s            |
+----+--------+------------+---------------+------------------+
| 6  | RM     | 87517      | 62 MB         | 980 s            |
+----+--------+------------+---------------+------------------+
| 7  | YJM    | 88945      | 64 MB         | 566 s            |
+----+--------+------------+---------------+------------------+
| 8  | YJM    | 88689      | 64 MB         | 570 s            |
+----+--------+------------+---------------+------------------+
| 9  | YJM    | 88128      | 63 MB         | 565 s            |
+----+--------+------------+---------------+------------------+


Inferring Nucleosome Position and Extracting Read Counts
========================================================

The second part of the tutorial uses *R*
(http://www.r-project.org). It consists of the following main steps:

* compute_rois.R

* extract_maps.R

* compare_common_wp.R

* split_samples.R

* count_reads.R

* get_size_factors.R

* launch_deseq.R
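
The scripts listed above can be chained from Python, in the same
spirit as *wf.py* wraps its binaries. This driver is a sketch and is
not part of the project; the function names are hypothetical, and the
script directory is taken from the command lines of this tutorial.

```python
import subprocess

# Scripts in the order given in the list above.
R_SCRIPTS = [
    "compute_rois.R",
    "extract_maps.R",
    "compare_common_wp.R",
    "split_samples.R",
    "count_reads.R",
    "get_size_factors.R",
    "launch_deseq.R",
]


def r_batch_commands(script_dir="src/current", scripts=R_SCRIPTS):
    """Return one `R CMD BATCH` argument list per script."""
    return [["R", "CMD", "BATCH", script_dir + "/" + name] for name in scripts]


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for argv in commands:
        subprocess.check_call(argv)


commands = r_batch_commands()
print(len(commands))  # 7
```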


Computing Common Genome Region Between Strains
----------------------------------------------

   R CMD BATCH src/current/compute_rois.R


Extracting Maps for Well Positioned and Fuzzy Nucleosomes
---------------------------------------------------------

   R CMD BATCH src/current/extract_maps.R


Compute Distance Between Well Positioned Nucleosomes
----------------------------------------------------

   R CMD BATCH src/current/compare_common_wp.R


Split and Compress Samples According to CURs
--------------------------------------------

   R CMD BATCH src/current/split_samples.R


Count Reads for Each Nucleosome
-------------------------------

   R CMD BATCH src/current/count_reads.R


Get Size Factors Using DESeq
----------------------------

   R CMD BATCH src/current/get_size_factors.R


Performing DESeq Analysis
-------------------------

   R CMD BATCH src/current/launch_deseq.R


Results
=======


Output Files Organisation
-------------------------

The previous steps produce the following 45 files. Each filename has
the form

   results/current/[combi]_[marker]_[form]_snep.tab

where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination,
marker is in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each
post-translational histone modification and form is in {wp, fuzzy,
wpfuzzy}, considering well positioned nucleosomes, fuzzy nucleosomes
or both for the SNEP computation.
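
The count of 45 files follows from 3 combinations, 5 markers and 3
forms. A minimal sketch that enumerates the expected filenames (the
function name is hypothetical):

```python
from itertools import product

COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
FORMS = ["wp", "fuzzy", "wpfuzzy"]


def result_files(result_dir="results/current"):
    """Return every expected [combi]_[marker]_[form]_snep.tab path."""
    return [
        "%s/%s_%s_%s_snep.tab" % (result_dir, combi, marker, form)
        for combi, marker, form in product(COMBIS, MARKERS, FORMS)
    ]


files = result_files()
print(len(files))  # 45
```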

   chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM
   lower_bound_RM upper_bound_RM index_nuc_RM roi_index form
   BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4
   RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36 BY_H3K14ac_37
   BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM

In each file, there is 1 line per nucleosome, and each line is
composed of many columns divided into 3 main topics:

* nuc information

* number of reads for each sample

* DESeq analysis results.

For example, for the file *BY_RM_H3K14ac_wp_snep.tab* the nuc
information is:

* chr_BY, the BY chr involved

* lower_bound_BY, the lower bound of the BY nuc

* upper_bound_BY, the upper bound of the BY nuc

* index_nuc_BY, the index of the nuc in the entire list of BY
  nucs

* chr_RM, lower_bound_RM, upper_bound_RM, index_nuc_RM, the same
  information for the RM strain

* roi_index, the index of the region of interest involved.

The next columns concern indicators for each sample. They are
labeled [strain]_[marker]_[sample_id] and each value represents the
number of reads for the current nuc for the sample *sample_id*.
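
Because a marker name may itself contain underscores (for example
*Mnase_Seq*), a sample column label is best split from both ends. The
helper below is a hypothetical sketch, not part of the project.

```python
def parse_sample_column(label):
    """Split a [strain]_[marker]_[sample_id] label into its 3 parts."""
    strain, rest = label.split("_", 1)         # first field is the strain
    marker, sample_id = rest.rsplit("_", 1)    # last field is the sample id
    return strain, marker, int(sample_id)


print(parse_sample_column("BY_Mnase_Seq_1"))   # ('BY', 'Mnase_Seq', 1)
print(parse_sample_column("RM_H3K14ac_38"))    # ('RM', 'H3K14ac', 38)
```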

The 5 final columns concern the DESeq analysis:

* manip[a_manip] strain[a_strain]
  manip[a_strain]:strain[a_strain], the manip (marker) effect, the
  strain effect and the SNEP effect.

* pvalsGLM, the p-value resulting from the comparison of the GLM
  models with and without the interaction term *marker:strain*

* snep_index, a boolean set to TRUE if the *pvalsGLM* value is
  under the threshold computed with the FDR function with a rate set
  to 0.01%.

It also produces a file that gives the size factor of each sample
involved in the different strain combinations and nucleosomal region
types:

TODO: include this file...
/home/filleton/analyses/snepcatalog/data/2013-10-09/current/README.txt

   results/current/size_factors.tab


Number of SNEPs
---------------

Here are the numbers of SNEPs computed for each form.

   [1] "wp"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  30234     520     798     83    3566      26
   BY-YJM 31298     303     619    102     103     128
   RM-YJM 29863     129     340     46    3177      18
   [1] "fuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  10748     294     308    101    1681      42
   BY-YJM 10669     122     176    124      93      87
   RM-YJM 11478      54     112     41    1389      20
   [1] "wpfuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  40982     770    1136    183    5404      73
   BY-YJM 41967     439     804    214     198     199
   RM-YJM 41341     184     468     87    4687      37

TODO:

* Print/study intra/inter strain LODs.

* Check the normality of samples using Shapiro–Wilk (hypothesis
  for computing LODs)