Tutorial
========

    
This tutorial describes the steps required to perform a quantitative analysis
of the nucleosomal epigenome. We assume that files are organised according to a
given hierarchy and that all command lines are launched from the project's root.

This tutorial is divided into two main parts. The first one consists of the
Python script `wf.py`, which aligns and converts Illumina reads. The second one
is the R script `main.r`, which extracts information (nucleosome positions and
indicators) from the dataset.

    
Python and R Common Configuration File
--------------------------------------

First of all, we define in one place the configuration variables that will be loaded by both the Python and R scripts. This file is **configurator.py**. Executing this Python script dumps the variables into **nucleo_miner_config.json**, which is then loaded by both kinds of scripts (R and Python).

To do this, run the following command line at the root of your project:

.. code:: bash

  python src/current/configurator.py
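To illustrate the mechanism, here is a minimal sketch (not the actual content of `configurator.py`) that dumps a few variables to a JSON file and reads them back. The values, the `READ_LENGTH` key and the helper names `dump_config` and `load_config` are assumptions made for the example.

.. code:: python

  import json
  import os
  import tempfile

  # Hypothetical subset of configuration variables; the real values live in
  # configurator.py and are not reproduced here.
  CONFIG = {
      "ALIGN_DIR": "data/align",  # assumed value
      "LOG_DIR": "data/log",      # assumed value
      "READ_LENGTH": 50,          # hypothetical key (read length of this tutorial)
  }


  def dump_config(config, path):
      """Write the configuration dictionary to a JSON file."""
      with open(path, "w") as handle:
          json.dump(config, handle, indent=2, sort_keys=True)


  def load_config(path):
      """Read the configuration back, as a Python consumer would."""
      with open(path) as handle:
          return json.load(handle)


  if __name__ == "__main__":
      path = os.path.join(tempfile.mkdtemp(), "nucleo_miner_config.json")
      dump_config(CONFIG, path)
      print(load_config(path) == CONFIG)

Keeping the variables in a single JSON file means that the Python and R sides read exactly the same values.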

    
TODO: other Python scripts to describe:

- libcoverage.py
- wf.py

    
Dataset and Configuration Variables
-----------------------------------

We want to compare the nucleosomes of 3 yeast strains:

- BY
- RM
- YJM

For each strain we perform MNase-Seq and ChIP-Seq using the following 5
markers:

- H3K4me1
- H3K4me3
- H3K9ac
- H3K14ac
- H4K12ac

To simplify the experimental design, we consider MNase as a marker.
For each pair `(strain, marker)` we perform 3 replicates, so in theory
we should have `3 * (1 + 5) * 3 = 54` samples. In practice we obtained only 2
replicates for `(YJM, H3K4me1)`. Each of the 53 samples is identified by a
unique identifier. The file `CSV_SAMPLE_FILE` sums up this information.

.. autodata:: configurator.CSV_SAMPLE_FILE
    :noindex:
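The sample count can be checked with a short enumeration of the design. This is only a sketch: the marker label `Mnase_Seq` is taken from the column names used later in this tutorial, and which replicate of `(YJM, H3K4me1)` is missing is an assumption.

.. code:: python

  from itertools import product

  STRAINS = ["BY", "RM", "YJM"]
  # MNase is treated as a marker, next to the 5 histone marks.
  MARKERS = ["Mnase_Seq", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
  REPLICATES = [1, 2, 3]

  # Full factorial design: 3 strains * (1 + 5) markers * 3 replicates.
  design = list(product(STRAINS, MARKERS, REPLICATES))
  full_size = len(design)

  # One replicate of (YJM, H3K4me1) is missing; dropping the third one is an
  # arbitrary choice made for this illustration.
  design.remove(("YJM", "H3K4me1", 3))

  print(full_size, len(design))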
		
We use a convention to link each sample to its Illumina fastq outputs. The
Illumina output files of the sample `ID` are stored in the directory
`ILLUMINA_OUTPUTFILE_PREFIX` + `ID`. For example, the outputs of sample 41 are
stored in the directory `data/2012-09-05/FASTQ/Sample_Yvert_Bq41/`.

.. autodata:: configurator.ILLUMINA_OUTPUTFILE_PREFIX
    :noindex:
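As a small illustration of this convention, the sketch below builds the directory of a sample from its identifier. The prefix value is inferred from the example path of sample 41 and is therefore an assumption, as is the helper name `illumina_dir`.

.. code:: python

  # Assumed value, inferred from the example path given for sample 41.
  ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"


  def illumina_dir(sample_id, prefix=ILLUMINA_OUTPUTFILE_PREFIX):
      """Return the directory holding the fastq files of a sample."""
      return "%s%s/" % (prefix, sample_id)


  print(illumina_dir(41))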

    
For BY (resp. RM and YJM) we use the reference genome
`saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta`
(resp. `saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta` and
`saccharomyces_cerevisiae_YJM_789_screencontig.fasta`).
The index `FASTA_REFERENCE_GENOME_FILES` stores this information.

.. autodata:: configurator.FASTA_REFERENCE_GENOME_FILES
    :noindex:

Each chromosome/contig is identified in the fasta file by an opaque identifier.
For example, BY chromosome I is identified by `gi|144228165|ref|NC_001133.7|`,
whereas TemplateFilter expects an integer, so we translate it. The index
`FASTA_INDEXES` stores this translation.

.. autodata:: configurator.FASTA_INDEXES
    :noindex:
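The translation can be pictured as a dictionary lookup. In the sketch below only the BY chromosome I identifier comes from the text; the nesting by strain, the integer `1` and the helper name `chromosome_number` are assumptions.

.. code:: python

  # Hypothetical excerpt of the translation index (layout assumed).
  FASTA_INDEXES = {
      "BY": {"gi|144228165|ref|NC_001133.7|": 1},
  }


  def chromosome_number(strain, fasta_id, indexes=FASTA_INDEXES):
      """Translate a fasta sequence identifier into an integer."""
      return indexes[strain][fasta_id]


  print(chromosome_number("BY", "gi|144228165|ref|NC_001133.7|"))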

    
For practical reasons we discard some parts of the genome (repeated
sequences, etc.). The list of blacklisted areas is explicitly detailed in
`AREA_BLACK_LIST`.

.. autodata:: configurator.AREA_BLACK_LIST
    :noindex:

For the BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we use the
previously computed .c2c file `data/2012-03_primarydata/BY_RM_gxcomp.c2c` (resp.
`BY_YJM_GComp_All.c2c` and `RM_YJM_gxcomp.c2c`). For more information about
.c2c files, please read section 5 of the manual of `NucleoMiner`, the previous
version of `NucleoMiner2`
(http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).

.. autodata:: configurator.C2C_FILES
    :noindex:

`nucleominer` works in specific directories, which are described in
`INDEX_DIR`, `ALIGN_DIR` and `LOG_DIR`.

Finally, `nucleominer` uses external resources. The paths to these resources
are described in `BOWTIE_BUILD_BIN`, `BOWTIE2_BIN`, `SAMTOOLS_BIN`,
`BEDTOOLS_BIN`, `TF_BIN` and `TF_TEMPLATES_FILE`.

All paths, prefixes and indexes can be changed in the
`src/current/nucleominer_config.json` file.

.. autodata:: wf.json_conf_file
    :noindex:

    
Preprocessing Illumina Fastq Reads for Each Sample
--------------------------------------------------

This preprocessing consists of the 4 main steps embedded in `wf.py` and
described below. As a preamble, this script computes `samples`, `samples_mnase`
and `strains`, which are used throughout the 4 steps.

.. autodata:: wf.samples
    :noindex:

.. autodata:: wf.samples_mnase
    :noindex:

.. autodata:: wf.strains
    :noindex:

    
Creating Bowtie Index from each Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each strain, we need to create a bowtie index. The bowtie index of a strain
is an indexed representation of the reference genome of this strain. It is used
by bowtie to align reads. This step is performed by the following part of the
`wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_1
   :end-before: # _ENDOF_ step_1
   :language: python

The following table sums up the file sizes and process durations involved in
this step.

======  ======================  ======================  ================
strain  fasta genome file size  bowtie index file size  process duration
======  ======================  ======================  ================
BY      12 MB                   25 MB                   11 s
RM      12 MB                   24 MB                   9 s
YJM     12 MB                   25 MB                   11 s
======  ======================  ======================  ================
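For readers who cannot see the included `wf.py` excerpt, the sketch below shows one plausible way of wrapping the index builder with `subprocess`. It is not the code of `wf.py`: the default binary name `bowtie2-build`, the index prefix `index/BY` and the helper names are assumptions.

.. code:: python

  import subprocess


  def bowtie_build_command(fasta_file, index_prefix,
                           bowtie_build_bin="bowtie2-build"):
      """Return the argument list: <binary> <reference fasta> <index prefix>."""
      return [bowtie_build_bin, fasta_file, index_prefix]


  def build_index(fasta_file, index_prefix, bowtie_build_bin="bowtie2-build"):
      """Run the index builder; raises CalledProcessError on failure."""
      subprocess.check_call(
          bowtie_build_command(fasta_file, index_prefix, bowtie_build_bin))


  # Only build the command here; build_index() needs the binary installed.
  command = bowtie_build_command(
      "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta", "index/BY")
  print(" ".join(command))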

    
Aligning Reads to Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, we launch bowtie to align reads to the reference genome. It produces a
`.sam` file that we convert into a `.bed` file. The binaries for `bowtie`,
`samtools` and `bedtools` are wrapped using the Python `subprocess` module. This
step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_2
   :end-before: # _ENDOF_ step_2
   :language: python
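The chain of tools can be sketched as three commands run through `subprocess`. This is an illustration under assumptions, not the content of `wf.py`: file names, default binary names and the helper names are hypothetical, and any read filtering done by the real script is not shown.

.. code:: python

  import subprocess


  def alignment_commands(index_prefix, fastq_files, out_prefix, threads=3,
                         bowtie2_bin="bowtie2", samtools_bin="samtools",
                         bedtools_bin="bedtools"):
      """Return the three commands: align, SAM to BAM, BAM to BED."""
      sam_file = out_prefix + ".sam"
      bam_file = out_prefix + ".bam"
      return [
          # Align unpaired reads and write a SAM file.
          [bowtie2_bin, "-p", str(threads), "-x", index_prefix,
           "-U", ",".join(fastq_files), "-S", sam_file],
          # Convert SAM to BAM.
          [samtools_bin, "view", "-b", "-S", "-o", bam_file, sam_file],
          # Convert BAM to BED; the records are written to stdout.
          [bedtools_bin, "bamtobed", "-i", bam_file],
      ]


  def run_alignment(commands, bed_file):
      """Run the commands, redirecting the last one to the BED file."""
      for command in commands[:-1]:
          subprocess.check_call(command)
      with open(bed_file, "w") as bed:
          subprocess.check_call(commands[-1], stdout=bed)


  commands = alignment_commands("index/BY", ["sample_1.fastq"], "align/sample_1")
  for command in commands:
      print(" ".join(command))

Building the argument lists separately from running them makes it easy to log each command before execution.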

    
Convert Aligned Reads for TemplateFilter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TemplateFilter uses a particular input format for reads, so we convert the
`.bed` file. TemplateFilter expects reads in the form `chr coord strand #reads`,
where:

- chr is the number of the chromosome;
- coord is the coordinate of the reads;
- strand is `F` for forward and `R` for reverse;
- #reads is the number of reads at this position.

Each entry is *tab*-separated.

**WARNING** For the reverse strand, bowtie returns the position of the leftmost
nucleotide whereas TemplateFilter expects the rightmost one. This step takes it
into account and adds the read length (50 in our case) to the coordinate of
reverse reads.

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_3
   :end-before: # _ENDOF_ step_3
   :language: python
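The conversion described above can be sketched as follows. This is an illustration rather than the code of `wf.py`: it assumes 6-column BED records (chrom, start, end, name, score, strand), uses a hypothetical `chr_index` dictionary in the role of `FASTA_INDEXES`, and leaves the coordinate convention of the BED file untouched.

.. code:: python

  from collections import Counter

  READ_LENGTH = 50  # read length used in this tutorial


  def bed_to_tf(bed_lines, chr_index, read_length=READ_LENGTH):
      """Convert BED records into TemplateFilter lines (chr, coord, strand, count)."""
      counts = Counter()
      for line in bed_lines:
          fields = line.rstrip("\n").split("\t")
          chrom, start, strand = fields[0], int(fields[1]), fields[5]
          if strand == "+":
              key = (chr_index[chrom], start, "F")
          else:
              # Reverse reads: add the read length to the reported position.
              key = (chr_index[chrom], start + read_length, "R")
          counts[key] += 1
      # One tab-separated line per (chromosome, coordinate, strand).
      return ["%d\t%d\t%s\t%d" % (c, pos, s, n)
              for (c, pos, s), n in sorted(counts.items())]


  bed = [
      "chrI\t100\t150\tread1\t42\t+",
      "chrI\t100\t150\tread2\t42\t+",
      "chrI\t200\t250\tread3\t42\t-",
  ]
  for tf_line in bed_to_tf(bed, {"chrI": 1}):
      print(tf_line)

The two forward reads starting at 100 collapse into a single line with a count of 2, and the reverse read at 200 is reported at 250.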

    
The following table sums up the number of reads, the file sizes and the process
durations involved in the last two steps. In our case, the alignment process was
multithreaded over 3 cores.

    
==  ==============  =========================  ======  ================  ==================  ================
id  Illumina reads  aligned, filtered reads    ratio   `.bed` file size  TF input file size  process duration
==  ==============  =========================  ======  ================  ==================  ================
1   16436138        10199695                   62.06%  1064 MB           60  MB              383   s.
2   16911132        12512727                   73.99%  1298 MB           64  MB              437   s.
3   15946902        12340426                   77.38%  1280 MB           65  MB              423   s.
4   13765584        10381903                   75.42%  931  MB           59  MB              352   s.
5   15168268        11502855                   75.83%  1031 MB           64  MB              386   s.
6   18850820        14024905                   74.40%  1254 MB           69  MB              482   s.
7   15591124        12126623                   77.78%  1163 MB           72  MB              405   s.
8   15659905        12475664                   79.67%  1194 MB           71  MB              416   s.
9   14668641        10960565                   74.72%  1052 MB           70  MB              375   s.
10  14339179        10454451                   72.91%  1049 MB           51  MB              363   s.
11  18019895        13688774                   75.96%  1378 MB           59  MB              474   s.
12  13746796        10810022                   78.64%  1084 MB           54  MB              360   s.
13  15205065        11766016                   77.38%  990  MB           54  MB              381   s.
14  17803097        13838883                   77.73%  1154 MB           60  MB              452   s.
15  15434564        12307878                   79.74%  1032 MB           57  MB              394   s.
16  16802587        12725665                   75.74%  1221 MB           48  MB              438   s.
17  16058417        12513734                   77.93%  1192 MB           63  MB              422   s.
18  16154482        13204331                   81.74%  1277 MB           52  MB              430   s.
19  21013924        17102120                   81.38%  1646 MB           59  MB              555   s.
20  17213114        14433357                   83.85%  1389 MB           53  MB              459   s.
21  17360907        14733001                   84.86%  1203 MB           55  MB              450   s.
22  18136816        15389581                   84.85%  1257 MB           53  MB              469   s.
23  14763678        12173025                   82.45%  1140 MB           56  MB              393   s.
24  15541709        12890345                   82.94%  1057 MB           48  MB              398   s.
25  16433215        13094314                   79.68%  1241 MB           57  MB              433   s.
26  17370850        14264136                   82.12%  1347 MB           51  MB              466   s.
27  14613512        8654495                    59.22%  887  MB           56  MB              339   s.
28  15248545        11367589                   74.55%  1166 MB           67  MB              405   s.
29  14316809        10767926                   75.21%  1103 MB           63  MB              379   s.
30  15178058        12265794                   80.81%  1030 MB           66  MB              390   s.
31  14968579        11876186                   79.34%  1009 MB           63  MB              387   s.
32  16912705        13550508                   80.12%  1143 MB           70  MB              442   s.
33  16782154        12755111                   76.00%  1227 MB           65  MB              438   s.
34  16741443        13168071                   78.66%  1260 MB           71  MB              442   s.
35  13096171        10367041                   79.16%  992  MB           62  MB              350   s.
36  17715118        14092985                   79.55%  1404 MB           68  MB              483   s.
37  17288466        7402082                    42.82%  741  MB           48  MB              339   s.
38  16116394        13178457                   81.77%  1101 MB           63  MB              420   s.
39  14241106        10537228                   73.99%  880  MB           57  MB              348   s.
40  13784738        10598464                   76.89%  1005 MB           64  MB              358   s.
41  12438007        9620975                    77.35%  911  MB           60  MB              326   s.
42  13853959        11031238                   79.63%  1045 MB           64  MB              365   s.
43  12036162        6654780                    55.29%  684  MB           46  MB              268   s.
44  13873129        10251074                   73.89%  1048 MB           61  MB              365   s.
45  19817751        14904502                   75.21%  1520 MB           72  MB              528   s.
46  13368959        10818619                   80.92%  912  MB           63  MB              350   s.
47  7566467         6139001                    81.13%  520  MB           44  MB              201   s.
48  32586928        21191363                   65.03%  1816 MB           82  MB              766   s.
49  30733184        18791373                   61.14%  1801 MB           89  MB              721   s.
50  41287616        30383875                   73.59%  2911 MB           112 MB              1065  s.
51  40439965        31177914                   77.10%  2981 MB           117 MB              1070  s.
53  40876476        33780065                   82.64%  3316 MB           103 MB              1165  s.
55  52424414        47117107                   89.88%  3811 MB           119 MB              1477  s.
==  ==============  =========================  ======  ================  ==================  ================

    
For experimental reasons (efficiency of the experiment, e.g. PCR), we remove samples 33, 45, 48 and 55.

    
Run TemplateFilter on MNase Samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, we perform the TemplateFilter analysis on each sample.

**WARNING** TemplateFilter returns a list of nucleosomes. Each nucleosome is
defined by its center and its width. An odd width leads us to consider
non-integer lower and upper bounds.

**WARNING** TemplateFilter is not designed to deal with replicates. So we choose
to keep as many nucleosomes as possible and to filter them in a second step,
taking advantage of the replicates. To do that, we set a low correlation
threshold (`0.5`) and a particularly high overlap value (`300%`).
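The first warning can be made concrete with a small sketch. It assumes that the bounds are simply the center plus or minus half the width; the helper name `nucleosome_bounds` and the numbers are illustrative.

.. code:: python

  def nucleosome_bounds(center, width):
      """Return (lower_bound, upper_bound) for a nucleosome."""
      half = width / 2.0
      return center - half, center + half


  print(nucleosome_bounds(1000, 147))  # odd width: (926.5, 1073.5)
  print(nucleosome_bounds(1000, 146))  # even width: (927.0, 1073.0)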

    
This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_4
   :end-before: # _ENDOF_ step_4
   :language: python

==  ======  ==========  =============  ================
id  strain  found nucs  nuc file size  process duration
==  ======  ==========  =============  ================
1   BY      96214       68 MB          1022 s
2   BY      91694       65 MB          1038 s
3   BY      91205       65 MB          1036 s
4   RM      88076       62 MB          984 s
5   RM      90141       64 MB          967 s
6   RM      87517       62 MB          980 s
7   YJM     88945       64 MB          566 s
8   YJM     88689       64 MB          570 s
9   YJM     88128       63 MB          565 s
==  ======  ==========  =============  ================

    
Inferring Nucleosome Position and Extracting Read Counts
--------------------------------------------------------

The second part of the tutorial uses `R` (http://www.r-project.org). It consists of a set of R scripts that are sourced in an R console launched at the root of your project. The R scripts are:

  - headers.R
  - extract_maps.R
  - compare_common_wp.R
  - split_samples.R
  - count_reads.R
  - get_size_factors.R
  - launch_deseq.R

    
The Script headers.R
^^^^^^^^^^^^^^^^^^^^

The script header.R is included in every other script. It is in charge of:

  - loading the libraries used in these scripts
  - loading the configuration (design, strain, marker...)
  - computing and caching CURs

From a shell at the root of your project, run the following command line:

.. code:: bash

  R CMD BATCH src/current/header.R

    
The Script extract_maps.R
^^^^^^^^^^^^^^^^^^^^^^^^^

This script is in charge of extracting the maps of well-positioned and fuzzy nucleosomes. First of all, it computes intra- and inter-strain nucleosome maps for each CUR. This step is executed in parallel on many cores using the BoT library. Next, it collects the results and produces well-positioned, fuzzy and UNR maps.

The well-positioned map for BY is collected in the result directory and is called **BY_wp.tab**. It is composed of the following columns:

 - chr, the number of the chromosome
 - lower_bound, the lower bound of the nucleosome
 - upper_bound, the upper bound of the nucleosome
 - cur_index, the index of the CUR
 - index_nuc, the index of the nucleosome in the CUR
 - wp, 1 if it is a well-positioned nucleosome, 0 otherwise
 - nb_reads, the number of reads that support this nucleosome
 - nb_nucs, the number of TemplateFilter nucleosomes across replicates (= the number of replicates if it is a well-positioned nucleosome)
 - llr_1, for a well-positioned nucleosome, the LLR1 between the first and the second TemplateFilter nucleosomes
 - llr_2, for a well-positioned nucleosome, the LLR1 between the second and the first TemplateFilter nucleosomes
 - wp_llr, for a well-positioned nucleosome, the LLR2 over all TemplateFilter nucleosomes
 - wp_pval, for a well-positioned nucleosome, the p-value of the chi-square test obtained with the LLR2 (``1 - pchisq(2 * LLR2, df = 4)``)
 - dyad_shift, for a well-positioned nucleosome, the shift between the two extreme TemplateFilter nucleosome dyad positions

The fuzzy map for BY is collected in the result directory and is called **BY_fuzzy.tab**. It is composed of the following columns:

 - chr, the number of the chromosome
 - lower_bound, the lower bound of the nucleosome
 - upper_bound, the upper bound of the nucleosome
 - cur_index, the index of the CUR

The common well-positioned map for the BY and RM strains is collected in the result directory and is called **BY_RM_common_wp.tab**. It is composed of the following columns:

 - cur_index, the index of the CUR
 - index_nuc_BY, the index of the BY nucleosome in the CUR
 - index_nuc_RM, the index of the RM nucleosome in the CUR
 - llr_score, the LLR3 score between the BY and RM nucleosomes
 - common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (``1 - pchisq(2 * LLR3, df = 2)``)
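The two p-value formulas quoted in the column lists above use R's `pchisq`. For an even number of degrees of freedom the chi-square survival function has a closed form, so the same values can be reproduced in plain Python. The sketch below is only a cross-check: the function names are hypothetical and the LLR values are whatever the R scripts compute.

.. code:: python

  import math


  def chi2_sf_even_df(x, df):
      """Return 1 - pchisq(x, df) for a positive even number of degrees of freedom."""
      if df <= 0 or df % 2:
          raise ValueError("df must be a positive even integer")
      if x <= 0:
          return 1.0
      half = x / 2.0
      # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
      term, total = 1.0, 1.0
      for i in range(1, df // 2):
          term *= half / i
          total += term
      return math.exp(-half) * total


  def wp_pval(llr2):
      """p-value of a well-positioned nucleosome: 1 - pchisq(2 * LLR2, df = 4)."""
      return chi2_sf_even_df(2.0 * llr2, df=4)


  def common_wp_pval(llr3):
      """p-value of a common well-positioned pair: 1 - pchisq(2 * LLR3, df = 2)."""
      return chi2_sf_even_df(2.0 * llr3, df=2)


  print(wp_pval(1.0), common_wp_pval(1.0))

With these degrees of freedom the formulas reduce to `exp(-LLR2) * (1 + LLR2)` and `exp(-LLR3)` respectively.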

    
The common UNR map for the BY and RM strains is collected in the result directory and is called **BY_RM_common_unr.tab**. It is composed of the following columns:

 - cur_index, the index of the CUR
 - index_nuc_BY, the index of the BY nucleosome in the CUR
 - index_nuc_RM, the index of the RM nucleosome in the CUR

To execute this script, run the following command line in your R console:

.. code:: r

  source("src/current/extract_maps.R")

    
The Script compare_common_wp.R
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This script is used to compare inter-strain distances between common well-positioned nucleosomes.

For example, it computes the file **BY_RM_common_wp_diff.tab**, which contains the dyad shifts between two well-positioned nucleosomes. It is composed of the following columns:

 - cur_index, the index of the CUR
 - index_nuc_BY, the index of the BY nucleosome in the CUR
 - index_nuc_RM, the index of the RM nucleosome in the CUR
 - llr_score, the LLR3 score between the BY and RM nucleosomes
 - common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (``1 - pchisq(2 * LLR3, df = 2)``)
 - diff, the dyad shift between the two well-positioned nucleosomes

It also translates well-positioned nucleosome maps from one strain to another and stores the result in a table.

For example, the file **results/2014-04/RM_wp_tr_2_BY.tab** contains the RM well-positioned nucleosomes translated into the BY genome referential. It is composed of the following columns:

 - strain_ref, the reference genome (in which positions are defined)
 - begin, the translated lower bound of the nucleosome
 - end, the translated upper bound of the nucleosome
 - chr, the chromosome number for the reference genome (in which positions are defined)
 - length, the length of the nucleosome (can be negative)
 - cur_index, the index of the CUR
 - index_nuc, the index of the nucleosome in the CUR

To execute this script, run the following command line from a shell at the root of your project:

.. code:: bash

  R CMD BATCH src/current/compare_common_wp.R
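As an illustration of what a dyad shift measures, the sketch below takes the dyad of a nucleosome to be the midpoint of its bounds and the shift to be the distance between the two extreme dyads. Both definitions are assumptions made for the example (the R scripts may use a different sign or convention), and the coordinates are invented.

.. code:: python

  def dyad(lower_bound, upper_bound):
      """Return the dyad position, taken here as the midpoint of the bounds."""
      return (lower_bound + upper_bound) / 2.0


  def dyad_shift(nucleosomes):
      """Return the distance between the two extreme dyads of (lower, upper) pairs."""
      dyads = [dyad(lower, upper) for lower, upper in nucleosomes]
      return max(dyads) - min(dyads)


  # Three hypothetical nucleosomes, expressed in the same genome referential.
  print(dyad_shift([(100, 247), (104, 251), (98, 245)]))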

    
Split and Compress Samples According to CURs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/current/split_samples.R

Count Reads for Each Nucleosome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/current/count_reads.R

Get Size Factors Using DESeq
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/current/get_size_factors.R

Performing DESeq Analysis
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/current/launch_deseq.R

    
Results
-------

Output Files Organisation
^^^^^^^^^^^^^^^^^^^^^^^^^

The previous steps produce the following 45 files.
Each filename has the form:

.. code:: bash

  results/current/[combi]_[marker]_[form]_snep.tab

where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination, marker is
in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each post-translational
histone modification and form is in {wp, fuzzy, wpfuzzy}, considering
well-positioned nucleosomes, fuzzy nucleosomes or both for SNEP computation.
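The figure of 45 files follows from the naming scheme: 3 strain combinations, 5 markers and 3 forms. The short sketch below enumerates the expected names; it only builds strings and does not check that the files exist.

.. code:: python

  from itertools import product

  COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
  MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
  FORMS = ["wp", "fuzzy", "wpfuzzy"]

  # One file per (combi, marker, form) triple.
  filenames = ["results/current/%s_%s_%s_snep.tab" % triple
               for triple in product(COMBIS, MARKERS, FORMS)]

  print(len(filenames))
  print(filenames[0])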

    
chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM lower_bound_RM upper_bound_RM index_nuc_RM roi_index form BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4 RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36
BY_H3K14ac_37 BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM

For each file, there is 1 line per nucleosome and each line is composed of many columns divided into 3 main topics:

  - nuc information
  - number of reads for each sample
  - DESeq analysis results.

For example, for the file *BY_RM_H3K14ac_wp_snep.tab* the nuc information is:

  - chr_BY, the BY chr involved
  - lower_bound_BY, the lower bound of the BY nuc
  - upper_bound_BY, the upper bound of the BY nuc
  - index_nuc_BY, the index of the nuc in the entire list of BY nucs
  - chr_RM, lower_bound_RM, upper_bound_RM, index_nuc_RM
    are the same information for the RM strain
  - roi_index, the index of the region of interest involved.

The next columns concern indicators for each sample. They are labeled [strain]_[marker]_[sample_id] and each value represents the number of reads for the current nuc for the sample *sample_id*.
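Because the marker part of a label may itself contain underscores (for instance `Mnase_Seq`), a label is best split from both ends. The helper below is a hypothetical sketch of such a parser, based only on the labelling convention described above.

.. code:: python

  def parse_sample_label(label):
      """Split '[strain]_[marker]_[sample_id]' into its three parts."""
      strain, rest = label.split("_", 1)       # the strain is the first field
      marker, sample_id = rest.rsplit("_", 1)  # the sample id is the last field
      return strain, marker, int(sample_id)


  print(parse_sample_label("BY_Mnase_Seq_1"))
  print(parse_sample_label("RM_H3K14ac_38"))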

    
The 5 final columns concern the DESeq analysis:

  - manip[a_manip] strain[a_strain] manip[a_strain]:strain[a_strain], the manip (marker) effect, the strain effect and the snep effect.
  - pvalsGLM, the p-value resulting from the comparison of the GLM models with and without the interaction term *marker:strain*
  - snep_index, a boolean set to TRUE if the *pvalsGLM* value is under the threshold computed with the FDR function with a rate set to 0.01%.
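The text does not say which FDR procedure is used to compute the threshold. Purely as an illustration, the sketch below assumes a Benjamini-Hochberg step-up threshold at a rate of 0.01% (0.0001); the function names and the p-values are hypothetical and the actual R code may differ.

.. code:: python

  def bh_threshold(pvalues, rate):
      """Return the largest sorted p-value p_(k) with p_(k) <= rate * k / m (0.0 if none)."""
      ranked = sorted(pvalues)
      m = len(ranked)
      threshold = 0.0
      for k, p in enumerate(ranked, start=1):
          if p <= rate * k / m:
              threshold = p
      return threshold


  def snep_flags(pvalues, rate=0.0001):
      """Flag the p-values that fall at or under the assumed FDR threshold."""
      threshold = bh_threshold(pvalues, rate)
      return [p <= threshold for p in pvalues]


  # Invented p-values, one per nucleosome.
  print(snep_flags([1e-8, 2e-5, 0.03, 0.5]))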

    
It also produces a file that gives the size factor of each sample involved in the different strain combinations and nucleosomal region types:

TODO: include this file... /home/filleton/analyses/snepcatalog/data/2013-10-09/current/README.txt

.. code:: bash

  results/current/size_factors.tab

    
Number of SNEPs
^^^^^^^^^^^^^^^

Here is the number of SNEPs computed for each form.

.. code:: bash

  [1] "wp"
         #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
  BY-RM  30234     520     798     83    3566      26
  BY-YJM 31298     303     619    102     103     128
  RM-YJM 29863     129     340     46    3177      18
  [1] "fuzzy"
         #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
  BY-RM  10748     294     308    101    1681      42
  BY-YJM 10669     122     176    124      93      87
  RM-YJM 11478      54     112     41    1389      20
  [1] "wpfuzzy"
         #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
  BY-RM  40982     770    1136    183    5404      73
  BY-YJM 41967     439     804    214     198     199
  RM-YJM 41341     184     468     87    4687      37
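As a quick consistency check on the figures above, the `#nucs` column of the `wpfuzzy` form equals the sum of the `wp` and `fuzzy` ones for every strain combination. The SNEP counts themselves do not add up in the same way (for BY-RM and H3K4me1, 520 + 294 is not 770). The sketch below only re-uses numbers copied from the output.

.. code:: python

  # Number of nucleosomes (#nucs column) copied from the output above.
  NUCS = {
      "wp":      {"BY-RM": 30234, "BY-YJM": 31298, "RM-YJM": 29863},
      "fuzzy":   {"BY-RM": 10748, "BY-YJM": 10669, "RM-YJM": 11478},
      "wpfuzzy": {"BY-RM": 40982, "BY-YJM": 41967, "RM-YJM": 41341},
  }

  checks = {}
  for combi in sorted(NUCS["wp"]):
      total = NUCS["wp"][combi] + NUCS["fuzzy"][combi]
      checks[combi] = (total == NUCS["wpfuzzy"][combi])
      print(combi, total, checks[combi])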

    
TODO:
  - Print/study intra/inter strain LODs.
  - Check the normality of samples using Shapiro–Wilk (hypothesis for computing LODs)
  	
537