Tutorial
********

This tutorial describes the steps required to perform a quantitative
analysis of the nucleosomal epigenome. We assume that files are
organised according to a given directory hierarchy and that all
command lines are launched from the project's root.

This tutorial is divided into two main parts. The first one is the
Python script *wf.py*, which aligns and converts Illumina reads. The
second one is the R script *main.r*, which extracts information
(nucleosome positions and indicators) from the dataset.

Dataset and Configuration File
==============================

We want to compare the nucleosomes of 3 yeast strains:

* BY

* RM

* YJM

For each strain, we perform Mnase-Seq and ChIP-Seq using the following
5 markers:

* H3K4me1

* H3K4me3

* H3K9ac

* H3K14ac

* H4K12ac

To simplify the experimental design, we consider Mnase as a marker.
For each *(strain, marker)* pair we perform 3 replicates. Theoretically,
we should therefore have *3 * (1 + 5) * 3 = 54* samples. In practice,
we only obtained 2 replicates for *(YJM, H3K4me1)*. Each of the 53
samples is identified by a unique identifier. The file
*CSV_SAMPLE_FILE* sums up this information.

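The sample count above can be checked by enumerating the design. This is a minimal sketch, not part of *wf.py*. The text does not say which YJM H3K4me1 replicate is missing, so the third one is assumed here; only the count matters.

```python
from itertools import product

STRAINS = ["BY", "RM", "YJM"]
# Mnase is treated as a marker alongside the 5 histone marks.
MARKERS = ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
REPLICATES = [1, 2, 3]

# Full design: every (strain, marker, replicate) triple.
design = list(product(STRAINS, MARKERS, REPLICATES))

# Assumption: the missing replicate is the third one (only its absence matters).
missing = {("YJM", "H3K4me1", 3)}
samples = [s for s in design if s not in missing]

print(len(design), len(samples))  # 54 53
```
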
configurator.CSV_SAMPLE_FILE = None

   Path to the CSV file that contains sample information.

We use a convention to link samples to Illumina fastq outputs. The
Illumina output files of sample *ID* are stored in the directory
*ILLUMINA_OUTPUTFILE_PREFIX* + *ID*. For example, the outputs of
sample 41 are stored in the directory
*data/2012-09-05/FASTQ/Sample_Yvert_Bq41/*.

configurator.ILLUMINA_OUTPUTFILE_PREFIX = None

   Prefix for Illumina fastq output files.

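The following sketch illustrates this naming convention. It assumes that the prefix is the *data/2012-09-05/FASTQ/Sample_Yvert_Bq* string implied by the sample 41 example; the real value lives in the configuration file. The helper name *illumina_dir* is hypothetical.

```python
def illumina_dir(prefix: str, sample_id: int) -> str:
    """Return the Illumina output directory of a sample: prefix + ID."""
    # Plain concatenation; the trailing "/" marks the result as a directory.
    return f"{prefix}{sample_id}/"


# Assumed prefix, inferred from the sample 41 example above.
PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"
print(illumina_dir(PREFIX, 41))  # data/2012-09-05/FASTQ/Sample_Yvert_Bq41/
```
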
For BY (resp. RM and YJM), we use the following reference genome:
*saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta* (resp.
*saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta* and
*saccharomyces_cerevisiae_YJM_789_screencontig.fasta*). The dictionary
*FASTA_REFERENCE_GENOME_FILES* stores this information.

configurator.FASTA_REFERENCE_GENOME_FILES = None

   Dictionary where each FASTA reference genome is indexed by the
   strain it corresponds to.

Each chromosome/contig is identified in the FASTA file by an obscure
identifier. For example, BY chromosome I is identified by
*gi|144228165|ref|NC_001133.7|*, whereas TemplateFilter expects an
integer. We therefore translate these identifiers; the dictionary
*FASTA_INDEXES* stores this translation.

configurator.FASTA_INDEXES = None

   Dictionary, indexed by strain, of dictionaries whose keys are the
   chromosome references from the FASTA file and whose values are the
   corresponding TemplateFilter chromosome numbers.

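The translation can be pictured as a nested dictionary: first by strain, then by FASTA identifier. The minimal sketch below contains only the BY chromosome I entry quoted above. The other entries are not shown, and whether the JSON file stores the values as integers or strings is an assumption. The helper *translate_chr* is hypothetical.

```python
from typing import Dict

# Hypothetical excerpt: strain -> {FASTA identifier -> TemplateFilter chromosome number}.
FASTA_INDEXES: Dict[str, Dict[str, int]] = {
    "BY": {"gi|144228165|ref|NC_001133.7|": 1},  # BY chromosome I
}


def translate_chr(strain: str, fasta_id: str,
                  indexes: Dict[str, Dict[str, int]] = FASTA_INDEXES) -> int:
    """Return the integer chromosome number TemplateFilter expects."""
    try:
        return int(indexes[strain][fasta_id])
    except KeyError:
        raise KeyError("no TemplateFilter number for %r in strain %r"
                       % (fasta_id, strain)) from None


print(translate_chr("BY", "gi|144228165|ref|NC_001133.7|"))  # 1
```
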
From a pragmatic point of view, we discard some parts of the genome
(repeated sequences, etc.). The blacklisted areas are explicitly
listed in *AREA_BLACK_LIST*.

configurator.AREA_BLACK_LIST = None

   Dictionary where keys are strains and values are lists of
   blacklisted genome regions.

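A read position can be checked against the blacklist with a simple interval-overlap test. The layout of *AREA_BLACK_LIST* in the JSON file is not described here, so this sketch assumes that each region is a *(chromosome, start, end)* triple with inclusive bounds. The example region is made up.

```python
from typing import Dict, List, Tuple

# Assumed region layout: (chromosome, start, end), bounds inclusive.
Region = Tuple[str, int, int]


def is_blacklisted(strain: str, chrom: str, pos: int,
                   black_list: Dict[str, List[Region]]) -> bool:
    """True if position pos of chromosome chrom lies in a blacklisted region of strain."""
    return any(r_chrom == chrom and start <= pos <= end
               for r_chrom, start, end in black_list.get(strain, []))


# Made-up blacklist, for illustration only.
AREA_BLACK_LIST = {"BY": [("1", 1000, 2000)]}
print(is_blacklisted("BY", "1", 1500, AREA_BLACK_LIST))  # True
print(is_blacklisted("BY", "1", 2500, AREA_BLACK_LIST))  # False
```

A strain absent from the dictionary is treated as having no blacklisted region.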
For the BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment, we
use the previously computed .c2c file
*data/2012-03_primarydata/BY_RM_gxcomp.c2c* (resp.
*BY_YJM_GComp_All.c2c* and *RM_YJM_gxcomp.c2c*). For more information
about .c2c files, please read section 5 of the manual of
*NucleoMiner*, the former version of *NucleoMiner2*
(http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).

configurator.C2C_FILES = None

   Dictionary where each strain combination indexes a genome
   alignment.

*nucleominer* uses specific working directories, described in
*INDEX_DIR*, *ALIGN_DIR* and *LOG_DIR*.

Finally, *nucleominer* uses external resources; the paths to these
resources are described in *BOWTIE_BUILD_BIN*, *BOWTIE2_BIN*,
*SAMTOOLS_BIN*, *BEDTOOLS_BIN*, *TF_BIN* and *TF_TEMPLATES_FILE*.

All paths, prefixes and indexes can be changed in the
*src/current/nucleominer_config.json* file.

wf.json_conf_file = 'src/nucleo_miner/nucleo_miner_config.json'

   Path to the JSON configuration file.

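The configuration file is plain JSON, so it can be loaded with the standard *json* module. This sketch loads the file and reports any missing keys. The key list is taken from the names used in this tutorial and in the *wf.py* excerpts below. The helper *load_config* is hypothetical and is not necessarily how *wf.py* reads the file.

```python
import json

# Keys referenced in this tutorial; the real file may contain more.
REQUIRED_KEYS = [
    "CSV_SAMPLE_FILE", "ILLUMINA_OUTPUTFILE_PREFIX",
    "FASTA_REFERENCE_GENOME_FILES", "FASTA_INDEXES", "AREA_BLACK_LIST",
    "C2C_FILES", "INDEX_DIR", "ALIGN_DIR", "LOG_DIR",
    "BOWTIE_BUILD_BIN", "BOWTIE2_BIN", "SAMTOOLS_BIN", "BEDTOOLS_BIN",
    "TF_BIN", "TF_TEMPLATES_FILE", "READ_LENGTH", "MAPQ_THRES",
    "TF_CORR", "TF_MINW", "TF_MAXW", "TF_OL",
]


def load_config(path: str) -> dict:
    """Load the JSON configuration file and fail early if a key is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing configuration keys: " + ", ".join(missing))
    return config


# Usage, with the path given by wf.json_conf_file:
# config = load_config("src/nucleo_miner/nucleo_miner_config.json")
```
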
Preprocessing Illumina Fastq Reads for Each Sample
==================================================

This preprocessing stage consists of the 4 main steps embedded in
*wf.py* and described below. As a preamble, this script computes
*samples*, *samples_mnase* and *strains*, which are used throughout
the 4 steps.

wf.samples = []

   List of samples, where each sample is identified by an id (key
   *id*) and a strain name (key *strain*).

wf.samples_mnase = []

   List of Mnase samples.

wf.strains = []

   List of reference strains.

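The preamble can be sketched as deriving *samples_mnase* and *strains* from a list of sample dictionaries. This sketch assumes that each sample also carries a *marker* key (hypothetical; the text only documents *id* and *strain*) and that Mnase samples are those whose marker is *Mnase*.

```python
from typing import Dict, List, Tuple


def split_samples(samples: List[Dict]) -> Tuple[List[Dict], List[str]]:
    """Return (samples_mnase, strains) from a list of sample dictionaries."""
    # Mnase samples, assuming a 'marker' key identifies them.
    samples_mnase = [s for s in samples if s.get("marker") == "Mnase"]
    # Reference strains, without duplicates, in order of first appearance.
    strains = list(dict.fromkeys(s["strain"] for s in samples))
    return samples_mnase, strains


samples = [
    {"id": 1, "strain": "BY", "marker": "Mnase"},
    {"id": 4, "strain": "RM", "marker": "Mnase"},
    {"id": 36, "strain": "BY", "marker": "H3K14ac"},
]
samples_mnase, strains = split_samples(samples)
print([s["id"] for s in samples_mnase], strains)  # [1, 4] ['BY', 'RM']
```

Keeping first-appearance order with *dict.fromkeys* is a design choice; *wf.py* may order strains differently.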
Creating Bowtie Index from each Reference Genome
------------------------------------------------

For each strain, we need to create a Bowtie index. The Bowtie index of
a strain is a Burrows-Wheeler (FM) index of that strain's reference
genome, which bowtie uses to align reads. This step is performed by
the following part of the *wf.py* script:

     for strain in strains:
       per_strain_stats[strain] = create_bowtie_index(strain,
         config["FASTA_REFERENCE_GENOME_FILES"][strain], config["INDEX_DIR"],
         config["BOWTIE_BUILD_BIN"])

The following table sums up the file sizes and process durations
involved in this step.

+--------+------------------------+------------------------+------------------+
| strain | FASTA genome file size | Bowtie index file size | process duration |
+========+========================+========================+==================+
| BY     | 12 MB                  | 25 MB                  | 11 s             |
+--------+------------------------+------------------------+------------------+
| RM     | 12 MB                  | 24 MB                  | 9 s              |
+--------+------------------------+------------------------+------------------+
| YJM    | 12 MB                  | 25 MB                  | 11 s             |
+--------+------------------------+------------------------+------------------+

Aligning Reads to Reference Genome
----------------------------------

Next, we launch bowtie to align reads to the reference genome. It
produces a *.sam* file that we convert into a *.bed* file. The
*bowtie*, *samtools* and *bedtools* binaries are wrapped using the
Python *subprocess* module. This step is performed by the following
part of the *wf.py* script:

     for sample in samples:
       per_sample_align_stats["sample_%s" % sample["id"]] = align_reads(sample,
         config["ALIGN_DIR"], config["LOG_DIR"], config["INDEX_DIR"],
         config["ILLUMINA_OUTPUTFILE_PREFIX"], config["BOWTIE2_BIN"],
         config["SAMTOOLS_BIN"], config["BEDTOOLS_BIN"])

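The sketch below shows one way to chain the three tools with *subprocess*. It is not the code of *align_reads*. File names are hypothetical, and only standard options are used: *-p*, *-x*, *-U* and *-S* for bowtie2, *-b* and *-o* for *samtools view*, and *-i* for *bedtools bamtobed*. The exact options passed by *wf.py* are not documented here. The default of 3 threads mirrors the 3 cores mentioned for this run.

```python
import subprocess
from typing import List


def alignment_commands(bowtie2: str, samtools: str, bedtools: str,
                       index_prefix: str, fastq: str, out_prefix: str,
                       threads: int = 3) -> List[List[str]]:
    """Build the bowtie2 -> samtools -> bedtools command lines for one sample."""
    sam, bam = out_prefix + ".sam", out_prefix + ".bam"
    return [
        # Align unpaired reads (-U also accepts a comma-separated file list).
        [bowtie2, "-p", str(threads), "-x", index_prefix, "-U", fastq, "-S", sam],
        # Convert the SAM file into a BAM file.
        [samtools, "view", "-b", "-o", bam, sam],
        # Convert BAM to BED; bamtobed writes to standard output.
        [bedtools, "bamtobed", "-i", bam],
    ]


def run_alignment(cmds: List[List[str]], bed_path: str) -> None:
    """Run the commands in order, redirecting the last one into bed_path."""
    for cmd in cmds[:-1]:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    with open(bed_path, "w") as bed:
        subprocess.run(cmds[-1], check=True, stdout=bed)
```

For example, *run_alignment(alignment_commands("bowtie2", "samtools", "bedtools", "index/BY", "sample_1.fastq", "align/sample_1"), "align/sample_1.bed")* would produce a BED file; all file names there are made up.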
Convert Aligned Reads for TemplateFilter
----------------------------------------

TemplateFilter uses a specific input format for reads, so we convert
the *.bed* file. TemplateFilter expects reads in the form *chr coord
strand #reads*, where:

* chr is the chromosome number;

* coord is the coordinate of the read;

* strand is *F* for forward and *R* for reverse;

* #reads is the number of reads at this position.

Each entry is *tab*-separated.

**WARNING** For reverse-strand reads, bowtie returns the position of
the leftmost nucleotide, whereas TemplateFilter expects the rightmost
one. This step takes this into account and adds the read length (50
in our case) to the coordinate of reverse reads.

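Below is a minimal sketch of this conversion. It is not the actual *split_fr_4_TF*, and it rests on several assumptions:

* The *.bed* lines come from *bedtools bamtobed* (BED6: chrom, start, end, name, score, strand, with the mapping quality in the score column).
* Reads with a MAPQ below the threshold are dropped.
* The forward coordinate is the BED start taken as is. Whether *wf.py* shifts coordinates to 1-based is not stated.

Blacklist filtering is left out.

```python
from collections import Counter
from typing import Dict, Iterable, List


def bed_to_tf(bed_lines: Iterable[str], fasta_index: Dict[str, int],
              mapq_thres: int, read_length: int = 50) -> List[str]:
    """Convert BED6 lines into 'chr<TAB>coord<TAB>strand<TAB>#reads' lines."""
    counts: Counter = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, score, strand = fields[0], int(fields[1]), int(fields[4]), fields[5]
        # Drop poorly mapped reads and contigs with no TemplateFilter number.
        if score < mapq_thres or chrom not in fasta_index:
            continue
        if strand == "-":
            # Reverse read: TemplateFilter wants the rightmost nucleotide,
            # so shift the leftmost position by the read length.
            key = (fasta_index[chrom], start + read_length, "R")
        else:
            key = (fasta_index[chrom], start, "F")
        counts[key] += 1
    # One tab-separated line per (chromosome, coordinate, strand), sorted.
    return ["%d\t%d\t%s\t%d" % (c, pos, s, n)
            for (c, pos, s), n in sorted(counts.items())]
```
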
This step is performed by the following part of the *wf.py* script:

     for sample in samples:
       per_sample_convert_stats["sample_%s" % sample["id"]] = split_fr_4_TF(sample,
         config["ALIGN_DIR"], config["FASTA_INDEXES"], config["AREA_BLACK_LIST"],
         config["READ_LENGTH"], config["MAPQ_THRES"])

The following table sums up the number of reads, the file sizes and
the process durations of the last two steps. In our case, the
alignment process was multithreaded over 3 cores.

+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| id | Illumina reads | aligned & filtered reads  | ratio  | *.bed* file size | TF input file size | process duration |
+====+================+===========================+========+==================+====================+==================+
| 1  | 16436138       | 10199695                  | 62.06% | 1064 MB          | 60  MB             | 383   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 2  | 16911132       | 12512727                  | 73.99% | 1298 MB          | 64  MB             | 437   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 3  | 15946902       | 12340426                  | 77.38% | 1280 MB          | 65  MB             | 423   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 4  | 13765584       | 10381903                  | 75.42% | 931  MB          | 59  MB             | 352   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 5  | 15168268       | 11502855                  | 75.83% | 1031 MB          | 64  MB             | 386   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 6  | 18850820       | 14024905                  | 74.40% | 1254 MB          | 69  MB             | 482   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 7  | 15591124       | 12126623                  | 77.78% | 1163 MB          | 72  MB             | 405   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 8  | 15659905       | 12475664                  | 79.67% | 1194 MB          | 71  MB             | 416   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 9  | 14668641       | 10960565                  | 74.72% | 1052 MB          | 70  MB             | 375   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 10 | 14339179       | 10454451                  | 72.91% | 1049 MB          | 51  MB             | 363   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 11 | 18019895       | 13688774                  | 75.96% | 1378 MB          | 59  MB             | 474   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 12 | 13746796       | 10810022                  | 78.64% | 1084 MB          | 54  MB             | 360   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 13 | 15205065       | 11766016                  | 77.38% | 990  MB          | 54  MB             | 381   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 14 | 17803097       | 13838883                  | 77.73% | 1154 MB          | 60  MB             | 452   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 15 | 15434564       | 12307878                  | 79.74% | 1032 MB          | 57  MB             | 394   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 16 | 16802587       | 12725665                  | 75.74% | 1221 MB          | 48  MB             | 438   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 17 | 16058417       | 12513734                  | 77.93% | 1192 MB          | 63  MB             | 422   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 18 | 16154482       | 13204331                  | 81.74% | 1277 MB          | 52  MB             | 430   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 19 | 21013924       | 17102120                  | 81.38% | 1646 MB          | 59  MB             | 555   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 20 | 17213114       | 14433357                  | 83.85% | 1389 MB          | 53  MB             | 459   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 21 | 17360907       | 14733001                  | 84.86% | 1203 MB          | 55  MB             | 450   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 22 | 18136816       | 15389581                  | 84.85% | 1257 MB          | 53  MB             | 469   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 23 | 14763678       | 12173025                  | 82.45% | 1140 MB          | 56  MB             | 393   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 24 | 15541709       | 12890345                  | 82.94% | 1057 MB          | 48  MB             | 398   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 25 | 16433215       | 13094314                  | 79.68% | 1241 MB          | 57  MB             | 433   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 26 | 17370850       | 14264136                  | 82.12% | 1347 MB          | 51  MB             | 466   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 27 | 14613512       | 8654495                   | 59.22% | 887  MB          | 56  MB             | 339   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 28 | 15248545       | 11367589                  | 74.55% | 1166 MB          | 67  MB             | 405   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 29 | 14316809       | 10767926                  | 75.21% | 1103 MB          | 63  MB             | 379   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 30 | 15178058       | 12265794                  | 80.81% | 1030 MB          | 66  MB             | 390   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 31 | 14968579       | 11876186                  | 79.34% | 1009 MB          | 63  MB             | 387   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 32 | 16912705       | 13550508                  | 80.12% | 1143 MB          | 70  MB             | 442   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 33 | 16782154       | 12755111                  | 76.00% | 1227 MB          | 65  MB             | 438   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 34 | 16741443       | 13168071                  | 78.66% | 1260 MB          | 71  MB             | 442   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 35 | 13096171       | 10367041                  | 79.16% | 992  MB          | 62  MB             | 350   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 36 | 17715118       | 14092985                  | 79.55% | 1404 MB          | 68  MB             | 483   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 37 | 17288466       | 7402082                   | 42.82% | 741  MB          | 48  MB             | 339   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 38 | 16116394       | 13178457                  | 81.77% | 1101 MB          | 63  MB             | 420   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 39 | 14241106       | 10537228                  | 73.99% | 880  MB          | 57  MB             | 348   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 40 | 13784738       | 10598464                  | 76.89% | 1005 MB          | 64  MB             | 358   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 41 | 12438007       | 9620975                   | 77.35% | 911  MB          | 60  MB             | 326   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 42 | 13853959       | 11031238                  | 79.63% | 1045 MB          | 64  MB             | 365   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 43 | 12036162       | 6654780                   | 55.29% | 684  MB          | 46  MB             | 268   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 44 | 13873129       | 10251074                  | 73.89% | 1048 MB          | 61  MB             | 365   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 45 | 19817751       | 14904502                  | 75.21% | 1520 MB          | 72  MB             | 528   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 46 | 13368959       | 10818619                  | 80.92% | 912  MB          | 63  MB             | 350   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 47 | 7566467        | 6139001                   | 81.13% | 520  MB          | 44  MB             | 201   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 48 | 32586928       | 21191363                  | 65.03% | 1816 MB          | 82  MB             | 766   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 49 | 30733184       | 18791373                  | 61.14% | 1801 MB          | 89  MB             | 721   s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 50 | 41287616       | 30383875                  | 73.59% | 2911 MB          | 112 MB             | 1065  s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 51 | 40439965       | 31177914                  | 77.10% | 2981 MB          | 117 MB             | 1070  s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 53 | 40876476       | 33780065                  | 82.64% | 3316 MB          | 103 MB             | 1165  s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 55 | 52424414       | 47117107                  | 89.88% | 3811 MB          | 119 MB             | 1477  s          |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+

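The *ratio* column is the number of aligned and filtered reads divided by the number of Illumina reads, expressed as a percentage. Recomputing it for samples 1 and 55 reproduces the table values. The helper name *aligned_ratio* is ours.

```python
def aligned_ratio(illumina_reads: int, kept_reads: int) -> float:
    """Percentage of Illumina reads kept after alignment and filtering."""
    return 100.0 * kept_reads / illumina_reads


# Values copied from the table above (samples 1 and 55).
print("%.2f%%" % aligned_ratio(16436138, 10199695))  # 62.06%
print("%.2f%%" % aligned_ratio(52424414, 47117107))  # 89.88%
```
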
For experimental reasons (manipulation efficiency, e.g. PCR, ...), we
remove samples 33, 45, 48 and 55.

Run TemplateFilter on Mnase Samples
-----------------------------------

Finally, for each Mnase sample we perform the TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each
nucleosome is defined by its center and its width. An odd width leads
us to consider non-integer lower and upper bounds.

**WARNING** TemplateFilter is not designed to deal with replicates. We
therefore choose to keep as many nucleosomes as possible and to filter
them in a second step, taking advantage of the replicates. To do so,
we set a low correlation threshold (*0.5*) and a particularly high
overlap value (*300%*).

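An odd width yields half-integer bounds. This sketch assumes the bounds are computed as *center - width / 2* and *center + width / 2*; the exact convention used downstream is not given here. The center and widths below are made up.

```python
from typing import Tuple


def nucleosome_bounds(center: int, width: int) -> Tuple[float, float]:
    """Lower and upper bounds of a nucleosome from its center and width."""
    half = width / 2  # true division: a half-integer when width is odd
    return center - half, center + half


print(nucleosome_bounds(1000, 148))  # (926.0, 1074.0)
print(nucleosome_bounds(1000, 147))  # (926.5, 1073.5)
```
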
This step is performed by the following part of the *wf.py* script:

     for sample in samples_mnase:
       per_mnase_sample_stats["sample_%s" % sample["id"]] = template_filter(sample,
         config["ALIGN_DIR"], config["LOG_DIR"], config["TF_BIN"],
         config["TF_TEMPLATES_FILE"], config["TF_CORR"], config["TF_MINW"],
         config["TF_MAXW"], config["TF_OL"])

+----+--------+------------+---------------+------------------+
| id | strain | found nucs | nuc file size | process duration |
+====+========+============+===============+==================+
| 1  | BY     | 96214      | 68 MB         | 1022 s           |
+----+--------+------------+---------------+------------------+
| 2  | BY     | 91694      | 65 MB         | 1038 s           |
+----+--------+------------+---------------+------------------+
| 3  | BY     | 91205      | 65 MB         | 1036 s           |
+----+--------+------------+---------------+------------------+
| 4  | RM     | 88076      | 62 MB         | 984 s            |
+----+--------+------------+---------------+------------------+
| 5  | RM     | 90141      | 64 MB         | 967 s            |
+----+--------+------------+---------------+------------------+
| 6  | RM     | 87517      | 62 MB         | 980 s            |
+----+--------+------------+---------------+------------------+
| 7  | YJM    | 88945      | 64 MB         | 566 s            |
+----+--------+------------+---------------+------------------+
| 8  | YJM    | 88689      | 64 MB         | 570 s            |
+----+--------+------------+---------------+------------------+
| 9  | YJM    | 88128      | 63 MB         | 565 s            |
+----+--------+------------+---------------+------------------+

Inferring Nucleosome Position and Extracting Read Counts
========================================================

The second part of the tutorial uses *R* (http://www.r-project.org).
It consists of the following main steps:

   * compute_rois.R

   * extract_maps.R

   * compare_common_wp.R

   * split_samples.R

   * count_reads.R

   * get_size_factors.R

   * launch_deseq.R

Computing Common Genome Regions Between Strains
-----------------------------------------------

   R CMD BATCH src/current/compute_rois.R

Extracting Maps for Well-Positioned and Fuzzy Nucleosomes
---------------------------------------------------------

   R CMD BATCH src/current/extract_maps.R

Computing Distances Between Well-Positioned Nucleosomes
-------------------------------------------------------

   R CMD BATCH src/current/compare_common_wp.R

Split and Compress Samples According to CURs
--------------------------------------------

   R CMD BATCH src/current/split_samples.R

Count Reads for Each Nucleosome
-------------------------------

   R CMD BATCH src/current/count_reads.R

Get Size Factors Using DESeq
----------------------------

   R CMD BATCH src/current/get_size_factors.R

Performing DESeq Analysis
-------------------------

   R CMD BATCH src/current/launch_deseq.R

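The seven R scripts are meant to be run from the project's root, in the order listed at the start of this part. Below is a minimal Python driver for them. It assumes that *R* is on the *PATH* and that *R CMD BATCH* reports a non-zero exit status when a script fails, so the chain stops at the first failure. The helper names are hypothetical.

```python
import subprocess
from typing import List

# Scripts of the second part, in the order they are described.
R_SCRIPTS = [
    "compute_rois.R", "extract_maps.R", "compare_common_wp.R",
    "split_samples.R", "count_reads.R", "get_size_factors.R",
    "launch_deseq.R",
]


def r_batch_commands(src_dir: str = "src/current") -> List[List[str]]:
    """Build one 'R CMD BATCH <script>' command per step."""
    return [["R", "CMD", "BATCH", f"{src_dir}/{name}"] for name in R_SCRIPTS]


def run_all(src_dir: str = "src/current") -> None:
    """Run every step in turn; check=True stops at the first failing script."""
    for cmd in r_batch_commands(src_dir):
        subprocess.run(cmd, check=True)
```
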
Results
=======

Output Files Organisation
-------------------------

The previous steps produce the following 45 files. Each filename has
the form

   results/current/[combi]_[marker]_[form]_snep.tab

where *combi* is in {BY_RM, BY_YJM, RM_YJM} (strain combination),
*marker* is in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac}
(post-translational histone modification) and *form* is in {wp,
fuzzy, wpfuzzy}, depending on whether well-positioned nucleosomes,
fuzzy nucleosomes or both are considered for SNEP computation.

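The count of 45 follows from the three sets above (3 combinations × 5 markers × 3 forms). The short enumeration below is for illustration only; the list names are ours.

```python
from itertools import product

COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
FORMS = ["wp", "fuzzy", "wpfuzzy"]

# One result file per (combination, marker, form) triple.
result_files = ["results/current/%s_%s_%s_snep.tab" % t
                for t in product(COMBIS, MARKERS, FORMS)]

print(len(result_files))  # 45
```
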
   chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM
   lower_bound_RM upper_bound_RM index_nuc_RM roi_index form
   BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4
   RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36 BY_H3K14ac_37
   BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM

For each file, there is one line per nucleosome, and each line is
composed of many columns divided into 3 main topics:
   * nucleosome information

   * number of reads for each sample

   * DESeq analysis results.

For example, for the file *BY_RM_H3K14ac_wp_snep.tab*, the nucleosome
information columns are:
   * chr_BY, the BY chromosome involved

   * lower_bound_BY, the lower bound of the BY nucleosome

   * upper_bound_BY, the upper bound of the BY nucleosome

   * index_nuc_BY, the index of the nucleosome in the entire list of
     BY nucleosomes

   * chr_RM, lower_bound_RM, upper_bound_RM and index_nuc_RM, the
     same information for the RM strain

   * roi_index, the index of the region of interest involved.

The next columns hold the indicators for each sample. They are
labelled [strain]_[marker]_[sample_id], and each value is the number
of reads of sample *sample_id* for the current nucleosome.

The 5 final columns concern the DESeq analysis:
   * manip[a_manip], strain[a_strain] and
     manip[a_manip]:strain[a_strain], the manip (marker) effect, the
     strain effect and the SNEP effect.

   * pvalsGLM, the p-value resulting from the comparison of the GLM
     models with and without the interaction term *marker:strain*.

   * snep_index, a boolean set to TRUE if the *pvalsGLM* value is
     below the threshold computed by the FDR function with a rate set
     to 0.01%.

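The text does not say which procedure the FDR function implements. As an illustration only, here is the Benjamini-Hochberg step-up procedure, a common way to derive such a threshold; treating it as the function used here is an assumption. The rate of 0.01% is written as the fraction 0.0001, and the p-values are made up.

```python
from typing import List, Optional, Sequence


def bh_threshold(pvals: Sequence[float], rate: float) -> Optional[float]:
    """Largest sorted p-value p_(k) with p_(k) <= k / m * rate, or None."""
    m = len(pvals)
    threshold = None
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k / m * rate:
            threshold = p
    return threshold


def snep_flags(pvals: Sequence[float], rate: float = 0.0001) -> List[bool]:
    """Flag every test whose p-value is at or below the BH threshold."""
    t = bh_threshold(pvals, rate)
    return [t is not None and p <= t for p in pvals]


# Made-up p-values; rate defaults to 0.01% = 0.0001.
print(snep_flags([1e-6, 2e-5, 0.5, 0.9]))  # [True, True, False, False]
```
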
It also produces a file that gives the size factor of each involved
sample for the different strain combinations and nucleosomal region
types:

TODO: include this file...
/home/filleton/analyses/snepcatalog/data/2013-10-09/current/README.txt

   results/current/size_factors.tab

Number of SNEPs
---------------

Here is the number of SNEPs computed for each form.

   [1] "wp"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  30234     520     798     83    3566      26
   BY-YJM 31298     303     619    102     103     128
   RM-YJM 29863     129     340     46    3177      18
   [1] "fuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  10748     294     308    101    1681      42
   BY-YJM 10669     122     176    124      93      87
   RM-YJM 11478      54     112     41    1389      20
   [1] "wpfuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  40982     770    1136    183    5404      73
   BY-YJM 41967     439     804    214     198     199
   RM-YJM 41341     184     468     87    4687      37

TODO:
   * Print/study intra/inter strain LODs.

   * Check the normality of samples using the Shapiro–Wilk test (a
     hypothesis required for computing LODs).