Tutorial
========

This tutorial describes the steps required to perform a quantitative analysis of
the nucleosomal epigenome. We assume that files are organised according to a
given hierarchy and that all command lines are launched from the project's root.

This tutorial is divided into two main parts. The first one is the Python script
`wf.py`, which aligns and converts Illumina reads. The second one is the R script
`main.r`, which extracts information (nucleosome positions and indicators) from
the dataset.


Dataset and Configuration File
------------------------------

We want to compare the nucleosomes of 3 yeast strains:

- BY
- RM
- YJM

For each strain we perform MNase-Seq and ChIP-Seq, the latter using the
following 5 markers:

- H3K4me1
- H3K4me3
- H3K9ac
- H3K14ac
- H4K12ac

To simplify the experimental design, we consider MNase as a marker. For each
`(strain, marker)` pair we perform 3 replicates, so in theory we should have
`3 * (1 + 5) * 3 = 54` samples. In practice we only obtained 2 replicates for
`(YJM, H3K4me1)`. Each of the resulting 53 samples is identified by a unique
identifier. The file `CSV_SAMPLE_FILE` sums up this information.

.. autodata:: configurator.CSV_SAMPLE_FILE
    :noindex:

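The sample count above can be checked with a few lines of Python. This is only a
sketch: the marker label `Mnase` and the `missing` mapping are hypothetical names
chosen for illustration, not values read from `CSV_SAMPLE_FILE`.

.. code:: python

  from itertools import product

  STRAINS = ["BY", "RM", "YJM"]
  # MNase is considered as a marker; "Mnase" is a hypothetical label.
  MARKERS = ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
  REPLICATES = 3


  def design(strains, markers, replicates, missing=None):
      """Return the (strain, marker, replicate) triples of the experiment.

      `missing` maps a (strain, marker) pair to the number of replicates
      that could not be obtained.
      """
      missing = missing or {}
      triples = []
      for strain, marker in product(strains, markers):
          n = replicates - missing.get((strain, marker), 0)
          triples.extend((strain, marker, rep) for rep in range(1, n + 1))
      return triples


  theoretical = design(STRAINS, MARKERS, REPLICATES)
  actual = design(STRAINS, MARKERS, REPLICATES, missing={("YJM", "H3K4me1"): 1})
  print(len(theoretical), len(actual))  # 54 53
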
We use a convention to link samples to Illumina fastq outputs: the Illumina
output files of the sample `ID` are stored in the directory
`ILLUMINA_OUTPUTFILE_PREFIX` + `ID`. For example, the outputs of sample 41 are
stored in the directory `data/2012-09-05/FASTQ/Sample_Yvert_Bq41/`.

.. autodata:: configurator.ILLUMINA_OUTPUTFILE_PREFIX
    :noindex:

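This naming convention boils down to a string concatenation. In the sketch below,
the prefix value is inferred from the sample 41 example above and the
`*.fastq.gz` file pattern is an assumption; the real values come from
`configurator.ILLUMINA_OUTPUTFILE_PREFIX` and from the Illumina outputs.

.. code:: python

  import glob

  # Assumption: value inferred from the sample 41 example.
  ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"


  def illumina_dir(sample_id, prefix=ILLUMINA_OUTPUTFILE_PREFIX):
      """Directory holding the Illumina outputs of a sample: prefix + ID."""
      return prefix + str(sample_id) + "/"


  def fastq_files(sample_id, pattern="*.fastq.gz"):
      """Sorted fastq files of a sample (the file pattern is an assumption)."""
      return sorted(glob.glob(illumina_dir(sample_id) + pattern))


  print(illumina_dir(41))  # data/2012-09-05/FASTQ/Sample_Yvert_Bq41/
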
For BY (resp. RM and YJM) we use the reference genome
`saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta`
(resp. `saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta` and
`saccharomyces_cerevisiae_YJM_789_screencontig.fasta`).
The index `FASTA_REFERENCE_GENOME_FILES` stores this information.

.. autodata:: configurator.FASTA_REFERENCE_GENOME_FILES
    :noindex:

Each chromosome/contig is identified in the fasta file by an obscure identifier.
For example, BY chromosome I is identified by `gi|144228165|ref|NC_001133.7|`,
whereas TemplateFilter expects an integer. So we translate it. The index
`FASTA_INDEXES` stores this translation.

.. autodata:: configurator.FASTA_INDEXES
    :noindex:

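A minimal sketch of such a translation, assuming that chromosomes are numbered
from 1 in the order in which they appear in the fasta file and that the
identifier is the first word of each `>` header line. The actual content of
`FASTA_INDEXES` may follow another rule; the second fasta entry below is
hypothetical.

.. code:: python

  def fasta_identifiers(lines):
      """Yield the identifier (first word after '>') of each fasta header line."""
      for line in lines:
          if line.startswith(">"):
              yield line[1:].split()[0]


  def build_fasta_index(identifiers):
      """Map each chromosome/contig identifier to an integer, from 1 in file order."""
      return {ident: number for number, ident in enumerate(identifiers, start=1)}


  toy_fasta = [
      ">gi|144228165|ref|NC_001133.7|\n",
      "ACGTACGT\n",
      ">chrII_hypothetical\n",  # hypothetical second entry
      "ACGTACGT\n",
  ]
  index = build_fasta_index(fasta_identifiers(toy_fasta))
  print(index["gi|144228165|ref|NC_001133.7|"])  # 1

With a real file, an open file handle can be passed instead of the toy list.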
From a pragmatic point of view, we discard some parts of the genome (repeated
sequences, etc.). The list of blacklisted areas is explicitly detailed in
`AREA_BLACK_LIST`.

.. autodata:: configurator.AREA_BLACK_LIST
    :noindex:

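Discarding reads that fall in a blacklisted area can be sketched as follows. The
structure used here (a chromosome number mapped to inclusive `(start, end)`
intervals) and the toy values are assumptions; the real `AREA_BLACK_LIST` may be
organised differently.

.. code:: python

  def in_black_list(chrom, pos, black_list):
      """True if position `pos` of chromosome `chrom` is in a blacklisted area."""
      return any(start <= pos <= end for start, end in black_list.get(chrom, []))


  def filter_reads(reads, black_list):
      """Keep the (chrom, pos, strand) reads lying outside every blacklisted area."""
      return [read for read in reads if not in_black_list(read[0], read[1], black_list)]


  black_list = {1: [(100, 200)]}  # toy values
  reads = [(1, 50, "F"), (1, 150, "R"), (2, 150, "F")]
  print(filter_reads(reads, black_list))  # [(1, 50, 'F'), (2, 150, 'F')]

With many areas per chromosome, sorting the intervals and searching them with
`bisect` would avoid the linear scan of `in_black_list`.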
For BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment, we use the
previously computed .c2c file `data/2012-03_primarydata/BY_RM_gxcomp.c2c` (resp.
`BY_YJM_GComp_All.c2c` and `RM_YJM_gxcomp.c2c`). For more information about
.c2c files, please read section 5 of the manual of `NucleoMiner`, the old
version of `NucleoMiner2`
(http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).

.. autodata:: configurator.C2C_FILES
    :noindex:

`nucleominer` uses specific directories to work in; they are described in
`INDEX_DIR`, `ALIGN_DIR` and `LOG_DIR`.

Finally, `nucleominer` uses external resources; the paths to these resources
are described in `BOWTIE_BUILD_BIN`, `BOWTIE2_BIN`, `SAMTOOLS_BIN`,
`BEDTOOLS_BIN`, `TF_BIN` and `TF_TEMPLATES_FILE`.

All paths, prefixes and indexes can be changed in the
`src/nucleo_miner/nucleo_miner_config.json` file.

.. autodata:: wf.json_conf_file
    :noindex:


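Before launching `wf.py`, it can be useful to check that the external binaries
are reachable. The sketch below assumes that the JSON file stores these paths as
top-level keys named like the variables above (e.g. `BOWTIE2_BIN`); the actual
layout of `nucleo_miner_config.json` may differ, and the helper names are ours.

.. code:: python

  import json
  import shutil

  BIN_KEYS = ["BOWTIE_BUILD_BIN", "BOWTIE2_BIN", "SAMTOOLS_BIN", "BEDTOOLS_BIN", "TF_BIN"]


  def load_config(path):
      """Read the JSON configuration file into a dict."""
      with open(path) as handle:
          return json.load(handle)


  def missing_binaries(config, keys=BIN_KEYS):
      """Return the keys whose binary is unset or not found (on PATH or as a path)."""
      missing = []
      for key in keys:
          path = config.get(key)
          if not path or shutil.which(path) is None:
              missing.append(key)
      return missing


  # Usage (hypothetical key layout):
  # missing_binaries(load_config("src/nucleo_miner/nucleo_miner_config.json"))
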
Preprocessing Illumina Fastq Reads for Each Sample
--------------------------------------------------

This preprocessing step consists of the 4 main steps embedded in `wf.py` and
described below. As a preamble, this script computes `samples`,
`samples_mnase` and `strains`, which are used throughout the 4 steps.

.. autodata:: wf.samples
    :noindex:

.. autodata:: wf.samples_mnase
    :noindex:

.. autodata:: wf.strains
    :noindex:


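To give an idea of what these three variables contain, here is a sketch that
derives them from a sample sheet. The column names `id`, `strain` and `marker`,
and the `Mnase` label, are assumptions; the real `CSV_SAMPLE_FILE` and `wf.py`
may use other names.

.. code:: python

  import csv
  import io


  def read_samples(handle):
      """Parse a CSV sample sheet (with a header line) into a list of dicts."""
      return list(csv.DictReader(handle))


  def split_samples(samples, mnase_label="Mnase"):
      """Return the MNase samples and the sorted list of strains."""
      samples_mnase = [s for s in samples if s["marker"] == mnase_label]
      strains = sorted({s["strain"] for s in samples})
      return samples_mnase, strains


  sheet = io.StringIO("id,strain,marker\n1,BY,Mnase\n4,RM,Mnase\n36,BY,H3K14ac\n")
  samples = read_samples(sheet)
  samples_mnase, strains = split_samples(samples)
  print([s["id"] for s in samples_mnase], strains)  # ['1', '4'] ['BY', 'RM']
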
Creating Bowtie Index from each Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each strain, we need to create a bowtie index. The bowtie index of a strain
is an index of the reference genome of this strain (based on the
Burrows-Wheeler transform) that bowtie uses to align reads. This step is
performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/nucleo_miner/wf.py
   :start-after: # _STARTOF_ step_1
   :end-before: # _ENDOF_ step_1
   :language: python

The following table sums up the file sizes and process durations involved in
this step.

======  ======================  ======================  ================
strain  fasta genome file size  bowtie index file size  process duration
======  ======================  ======================  ================
BY      12 MB                   25 MB                   11 s
RM      12 MB                   24 MB                   9 s
YJM     12 MB                   25 MB                   11 s
======  ======================  ======================  ================


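Outside `wf.py`, such an index can be built by calling the bowtie index builder
directly: both `bowtie-build` and `bowtie2-build` take the reference fasta file
and the index base name as positional arguments (the builder must match the
aligner used afterwards). The function names and the `index/BY` base name below
are illustrative, not those of `wf.py`.

.. code:: python

  import subprocess


  def bowtie_build_cmd(bowtie_build_bin, fasta_file, index_base):
      """Argument list: <builder> <reference.fasta> <index base name>."""
      return [bowtie_build_bin, fasta_file, index_base]


  def build_index(bowtie_build_bin, fasta_file, index_base):
      """Run the builder; check=True raises CalledProcessError on failure."""
      subprocess.run(bowtie_build_cmd(bowtie_build_bin, fasta_file, index_base), check=True)


  print(bowtie_build_cmd("bowtie2-build",
                         "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
                         "index/BY"))
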
Aligning Reads to Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, we launch bowtie to align reads to the reference genome. It produces a
`.sam` file that we convert into a `.bed` file. The `bowtie`, `samtools` and
`bedtools` binaries are wrapped using the Python `subprocess` module. This step
is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/nucleo_miner/wf.py
   :start-after: # _STARTOF_ step_2
   :end-before: # _ENDOF_ step_2
   :language: python

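The chain of conversions can be sketched with `subprocess` as below. The options
used (`bowtie2 -p/-x/-U/-S`, `samtools view -b -o`, `bedtools bamtobed -i`) are
standard ones, but they are not necessarily those chosen in `wf.py`; in
particular, the read filtering reported in the table below is not reproduced.
Function and argument names are ours.

.. code:: python

  import subprocess


  def alignment_cmds(bowtie2_bin, samtools_bin, bedtools_bin,
                     index_base, fastq_file, out_prefix, threads=3):
      """Return the command lines fastq -> .sam -> .bam -> .bed and the .bed path."""
      sam, bam, bed = out_prefix + ".sam", out_prefix + ".bam", out_prefix + ".bed"
      cmds = [
          # Align single-end reads against the strain index, on `threads` cores.
          [bowtie2_bin, "-p", str(threads), "-x", index_base, "-U", fastq_file, "-S", sam],
          # Convert SAM into BAM (-b: BAM output; -S is only needed by old samtools).
          [samtools_bin, "view", "-bS", "-o", bam, sam],
          # Convert BAM into BED; bedtools writes to standard output.
          [bedtools_bin, "bamtobed", "-i", bam],
      ]
      return cmds, bed


  def run_alignment(*args, **kwargs):
      """Run the three commands, redirecting the bedtools output into the .bed file."""
      (align, to_bam, to_bed), bed = alignment_cmds(*args, **kwargs)
      subprocess.run(align, check=True)
      subprocess.run(to_bam, check=True)
      with open(bed, "w") as out:
          subprocess.run(to_bed, stdout=out, check=True)
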
Convert Aligned Reads for TemplateFilter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TemplateFilter uses a particular input format for reads, so we convert the
`.bed` file. TemplateFilter expects reads in the form `chr coord strand #reads`
where:

- `chr` is the number of the chromosome;
- `coord` is the coordinate of the read;
- `strand` is `F` for forward and `R` for reverse;
- `#reads` is the number of reads at this position.

Each entry is *tab*-separated.

**WARNING** For the reverse strand, bowtie returns the position of the leftmost
nucleotide, whereas TemplateFilter expects the rightmost one. This step takes
this into account and adds the read length (50 in our case) to the coordinates
of reverse reads.

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/nucleo_miner/wf.py
   :start-after: # _STARTOF_ step_3
   :end-before: # _ENDOF_ step_3
   :language: python

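The conversion itself can be sketched in a few lines. This sketch assumes
six-column BED lines as written by `bedtools bamtobed` (chromosome, start, end,
name, score, strand) and a `chrom_index` dictionary playing the role of
`FASTA_INDEXES`; it keeps the BED start unchanged for forward reads, and how
`wf.py` handles the 0-based BED coordinates is not described here. The `chrI`
identifier is a toy value.

.. code:: python

  from collections import Counter

  READ_LENGTH = 50  # read length of our dataset


  def bed_to_tf(bed_lines, chrom_index, read_length=READ_LENGTH):
      """Convert BED lines into TemplateFilter lines 'chr coord strand #reads'."""
      counts = Counter()
      for line in bed_lines:
          fields = line.rstrip("\n").split("\t")
          chrom, start, strand = fields[0], int(fields[1]), fields[5]
          if strand == "+":
              key = (chrom_index[chrom], start, "F")
          else:
              # Reverse read: move from the leftmost to the rightmost nucleotide.
              key = (chrom_index[chrom], start + read_length, "R")
          counts[key] += 1
      # One tab-separated line per (chromosome, coordinate, strand).
      return ["%d\t%d\t%s\t%d" % (chrom, coord, strand, n)
              for (chrom, coord, strand), n in sorted(counts.items())]


  bed = [
      "chrI\t100\t150\tread1\t42\t+\n",
      "chrI\t100\t150\tread2\t42\t+\n",
      "chrI\t200\t250\tread3\t42\t-\n",
  ]
  for tf_line in bed_to_tf(bed, {"chrI": 1}):
      print(tf_line)
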
The following table sums up the number of reads, the file sizes and the process
durations involved in the last two steps. In our case, the alignment process was
multithreaded over 3 cores.

==  ==============  ==========================  ======  ================  ==================  ================
id  Illumina reads  aligned and filtered reads  ratio   `.bed` file size  TF input file size  process duration
==  ==============  ==========================  ======  ================  ==================  ================
1   16436138        10199695                    62.06%  1064 MB           60 MB               383 s
2   16911132        12512727                    73.99%  1298 MB           64 MB               437 s
3   15946902        12340426                    77.38%  1280 MB           65 MB               423 s
4   13765584        10381903                    75.42%  931 MB            59 MB               352 s
5   15168268        11502855                    75.83%  1031 MB           64 MB               386 s
6   18850820        14024905                    74.40%  1254 MB           69 MB               482 s
7   15591124        12126623                    77.78%  1163 MB           72 MB               405 s
8   15659905        12475664                    79.67%  1194 MB           71 MB               416 s
9   14668641        10960565                    74.72%  1052 MB           70 MB               375 s
10  14339179        10454451                    72.91%  1049 MB           51 MB               363 s
11  18019895        13688774                    75.96%  1378 MB           59 MB               474 s
12  13746796        10810022                    78.64%  1084 MB           54 MB               360 s
13  15205065        11766016                    77.38%  990 MB            54 MB               381 s
14  17803097        13838883                    77.73%  1154 MB           60 MB               452 s
15  15434564        12307878                    79.74%  1032 MB           57 MB               394 s
16  16802587        12725665                    75.74%  1221 MB           48 MB               438 s
17  16058417        12513734                    77.93%  1192 MB           63 MB               422 s
18  16154482        13204331                    81.74%  1277 MB           52 MB               430 s
19  21013924        17102120                    81.38%  1646 MB           59 MB               555 s
20  17213114        14433357                    83.85%  1389 MB           53 MB               459 s
21  17360907        14733001                    84.86%  1203 MB           55 MB               450 s
22  18136816        15389581                    84.85%  1257 MB           53 MB               469 s
23  14763678        12173025                    82.45%  1140 MB           56 MB               393 s
24  15541709        12890345                    82.94%  1057 MB           48 MB               398 s
25  16433215        13094314                    79.68%  1241 MB           57 MB               433 s
26  17370850        14264136                    82.12%  1347 MB           51 MB               466 s
27  14613512        8654495                     59.22%  887 MB            56 MB               339 s
28  15248545        11367589                    74.55%  1166 MB           67 MB               405 s
29  14316809        10767926                    75.21%  1103 MB           63 MB               379 s
30  15178058        12265794                    80.81%  1030 MB           66 MB               390 s
31  14968579        11876186                    79.34%  1009 MB           63 MB               387 s
32  16912705        13550508                    80.12%  1143 MB           70 MB               442 s
33  16782154        12755111                    76.00%  1227 MB           65 MB               438 s
34  16741443        13168071                    78.66%  1260 MB           71 MB               442 s
35  13096171        10367041                    79.16%  992 MB            62 MB               350 s
36  17715118        14092985                    79.55%  1404 MB           68 MB               483 s
37  17288466        7402082                     42.82%  741 MB            48 MB               339 s
38  16116394        13178457                    81.77%  1101 MB           63 MB               420 s
39  14241106        10537228                    73.99%  880 MB            57 MB               348 s
40  13784738        10598464                    76.89%  1005 MB           64 MB               358 s
41  12438007        9620975                     77.35%  911 MB            60 MB               326 s
42  13853959        11031238                    79.63%  1045 MB           64 MB               365 s
43  12036162        6654780                     55.29%  684 MB            46 MB               268 s
44  13873129        10251074                    73.89%  1048 MB           61 MB               365 s
45  19817751        14904502                    75.21%  1520 MB           72 MB               528 s
46  13368959        10818619                    80.92%  912 MB            63 MB               350 s
47  7566467         6139001                     81.13%  520 MB            44 MB               201 s
48  32586928        21191363                    65.03%  1816 MB           82 MB               766 s
49  30733184        18791373                    61.14%  1801 MB           89 MB               721 s
50  41287616        30383875                    73.59%  2911 MB           112 MB              1065 s
51  40439965        31177914                    77.10%  2981 MB           117 MB              1070 s
53  40876476        33780065                    82.64%  3316 MB           103 MB              1165 s
55  52424414        47117107                    89.88%  3811 MB           119 MB              1477 s
==  ==============  ==========================  ======  ================  ==================  ================

For experimental reasons (manipulation efficiency, e.g. PCR...), we remove
samples 33, 45, 48 and 55.


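The `ratio` column of the table above is the share of Illumina reads kept after
alignment and filtering. The sketch below recomputes it for three rows copied
from the table; the 60% threshold used to list low-ratio samples is an arbitrary
illustration, not the criterion behind the removal of the samples listed above.

.. code:: python

  def alignment_ratio(illumina_reads, aligned_reads):
      """Percentage of Illumina reads kept after alignment and filtering."""
      return round(100.0 * aligned_reads / illumina_reads, 2)


  # (id, Illumina reads, aligned and filtered reads), copied from the table above.
  ROWS = [(1, 16436138, 10199695), (27, 14613512, 8654495), (37, 17288466, 7402082)]


  def low_ratio_samples(rows, threshold=60.0):
      """Ids of the samples whose ratio is below `threshold` percent."""
      return [sid for sid, total, aligned in rows
              if alignment_ratio(total, aligned) < threshold]


  print(alignment_ratio(16436138, 10199695))  # 62.06
  print(low_ratio_samples(ROWS))  # [27, 37]
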
Run TemplateFilter on MNase Samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, for each MNase sample we perform the TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each nucleosome is
defined by its center and its width. An odd width leads us to consider
non-integer lower and upper bounds.

**WARNING** TemplateFilter is not designed to deal with replicates. So we choose
to keep as many nucleosomes as possible and to filter them afterwards, taking
advantage of the replicates. To do so, we set a low correlation threshold
parameter (`0.5`) and a particularly high overlap value (`300%`).

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/nucleo_miner/wf.py
   :start-after: # _STARTOF_ step_4
   :end-before: # _ENDOF_ step_4
   :language: python

==  ======  ==========  =============  ================
id  strain  found nucs  nuc file size  process duration
==  ======  ==========  =============  ================
1   BY      96214       68 MB          1022 s
2   BY      91694       65 MB          1038 s
3   BY      91205       65 MB          1036 s
4   RM      88076       62 MB          984 s
5   RM      90141       64 MB          967 s
6   RM      87517       62 MB          980 s
7   YJM     88945       64 MB          566 s
8   YJM     88689       64 MB          570 s
9   YJM     88128       63 MB          565 s
==  ======  ==========  =============  ================


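The first warning above can be illustrated as follows, assuming that a
nucleosome spans `width / 2` on each side of its center. This is a sketch, not
TemplateFilter's code.

.. code:: python

  def nucleosome_bounds(center, width):
      """Lower and upper bounds of a nucleosome defined by its center and width."""
      half = width / 2  # true division: x.5 when the width is odd
      return center - half, center + half


  print(nucleosome_bounds(1000, 146))  # (927.0, 1073.0)
  print(nucleosome_bounds(1000, 147))  # (926.5, 1073.5)
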
Inferring Nucleosome Position and Extracting Read Counts
--------------------------------------------------------

The second part of the tutorial uses `R` (http://www.r-project.org). It consists
of 5 main steps, each corresponding to an R script:

  - compute_rois.R
  - extract_maps.R
  - count_reads.R
  - get_size_factors.R
  - launch_deseq.R

Computing Common Genome Regions Between Strains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/nucleo_miner/compute_rois.R


Extracting Maps for Well Positioned and Fuzzy Nucleosomes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/nucleo_miner/extract_maps.R


Count Reads for Each Nucleosome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/nucleo_miner/count_reads.R


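A possible way to count the reads of each nucleosome is sketched below in Python
(the tutorial does it in R). It assumes that a read is counted for a nucleosome
when its TemplateFilter coordinate lies within the nucleosome bounds, whatever
its strand; `count_reads.R` may use another rule. Function names and values are
illustrative.

.. code:: python

  from bisect import bisect_left, bisect_right


  def index_reads(reads):
      """Group (chr, coord, #reads) records per chromosome: sorted coords and prefix sums."""
      per_chr = {}
      for chrom, coord, n in sorted(reads):
          coords, sums = per_chr.setdefault(chrom, ([], [0]))
          coords.append(coord)
          sums.append(sums[-1] + n)
      return per_chr


  def count_reads(per_chr, chrom, lower, upper):
      """Number of reads whose coordinate lies in [lower, upper] on `chrom`."""
      coords, sums = per_chr.get(chrom, ([], [0]))
      return sums[bisect_right(coords, upper)] - sums[bisect_left(coords, lower)]


  reads = [(1, 100, 2), (1, 150, 1), (1, 300, 4), (2, 120, 5)]  # toy values
  per_chr = index_reads(reads)
  print(count_reads(per_chr, 1, 26.5, 173.5))  # 3

The prefix sums make each count a pair of binary searches, which scales to the
tens of thousands of nucleosomes per strain reported above.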
Get Size Factors Using DESeq
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/nucleo_miner/get_size_factors.R


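For reference, DESeq estimates size factors with the median-of-ratios method:
each count is divided by the geometric mean of its row across samples, and the
size factor of a sample is the median of these ratios. The Python transcription
below is a sketch under our understanding of that method, assuming a
nucleosome-by-sample count matrix; it is not the code of `get_size_factors.R`.

.. code:: python

  import math
  from statistics import median


  def size_factors(counts):
      """Median-of-ratios size factors for a nucleosome-by-sample count matrix.

      Rows containing a zero count are skipped, since their geometric mean is 0.
      """
      n_samples = len(counts[0])
      log_rows = [[math.log(c) for c in row] for row in counts if all(c > 0 for c in row)]
      factors = []
      for j in range(n_samples):
          # log(count / geometric mean of the row) = log(count) - mean of the row's logs
          log_ratios = [row[j] - sum(row) / n_samples for row in log_rows]
          factors.append(math.exp(median(log_ratios)))
      return factors


  counts = [[10, 20], [30, 60], [5, 10]]  # toy counts: sample 2 has twice the reads
  print([round(f, 4) for f in size_factors(counts)])  # [0.7071, 1.4142]
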
Performing DESeq Analysis
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

  R CMD BATCH src/nucleo_miner/launch_deseq.R


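The `pvalsGLM` column described in the Results section compares two GLMs, with
and without the marker:strain interaction term. Assuming, as in a likelihood
ratio test, that the deviance difference follows a chi-square distribution with
1 degree of freedom (two strains and two markers give a single interaction
coefficient), the p-value can be computed as below. This sketch covers the test
statistic only; fitting the negative binomial GLMs is left to DESeq.

.. code:: python

  import math


  def lrt_pvalue(deviance_reduced, deviance_full):
      """p-value of a 1-degree-of-freedom likelihood ratio test.

      The survival function of a chi-square variable with 1 degree of
      freedom is erfc(sqrt(x / 2)).
      """
      stat = max(deviance_reduced - deviance_full, 0.0)  # guard against rounding
      return math.erfc(math.sqrt(stat / 2.0))


  print(round(lrt_pvalue(12.0, 8.158541), 3))  # 0.05
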
Results
-------

Output Files Organisation
^^^^^^^^^^^^^^^^^^^^^^^^^

The previous steps produce the following 45 files. Each filename has the form

.. code:: bash

  results/nucleo_miner/[combi]_[marker]_[form]_snep.tab

where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination, marker is
in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each post-translational
histone modification, and form is in {wp, fuzzy, wpfuzzy}, considering well
positioned nucleosomes, fuzzy nucleosomes or both for SNEP computation.

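The 45 files are the `3 * 5 * 3` combinations of combi, marker and form. The
list of expected paths can be generated with `itertools.product`, for example to
check that every file was produced; the helper names are ours.

.. code:: python

  import os
  from itertools import product

  COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
  MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
  FORMS = ["wp", "fuzzy", "wpfuzzy"]


  def snep_files(root="results/nucleo_miner"):
      """Every [root]/[combi]_[marker]_[form]_snep.tab path."""
      return ["%s/%s_%s_%s_snep.tab" % (root, combi, marker, form)
              for combi, marker, form in product(COMBIS, MARKERS, FORMS)]


  def missing_files(root="results/nucleo_miner"):
      """Expected files that do not exist (yet)."""
      return [path for path in snep_files(root) if not os.path.exists(path)]


  print(len(snep_files()))  # 45
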
As an example, here are the column names of a BY-RM file for the H3K14ac marker:

.. code:: text

  chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM lower_bound_RM upper_bound_RM index_nuc_RM roi_index form BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4 RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36 BY_H3K14ac_37 BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM

Each file contains one line per nucleosome, and each line is composed of many
columns divided into 3 main topics:

  - nucleosome information;
  - number of reads for each sample;
  - DESeq analysis results.

For example, in the file *BY_RM_H3K14ac_wp_snep.tab* the nucleosome information
columns are:

  - chr_BY, the BY chromosome involved;
  - lower_bound_BY, the lower bound of the BY nucleosome;
  - upper_bound_BY, the upper bound of the BY nucleosome;
  - index_nuc_BY, the index of the nucleosome in the entire list of BY nucleosomes;
  - chr_RM, lower_bound_RM, upper_bound_RM and index_nuc_RM, the same information
    for the RM strain;
  - roi_index, the index of the region of interest involved.

The next columns concern indicators for each sample. They are labelled
[strain]_[marker]_[sample_id], and each value is the number of reads of the
current nucleosome for the sample *sample_id*.

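Since a marker may itself contain underscores (e.g. `Mnase_Seq` in the column
names above), a sample column label is best split from both ends. The helpers
below are a sketch; their names and the default strain list are ours.

.. code:: python

  def parse_sample_column(label):
      """Split a [strain]_[marker]_[sample_id] label into (strain, marker, id)."""
      strain, rest = label.split("_", 1)        # strain: before the first "_"
      marker, sample_id = rest.rsplit("_", 1)   # sample id: after the last "_"
      return strain, marker, int(sample_id)


  def sample_columns(header, strains=("BY", "RM", "YJM")):
      """Keep the header fields that look like sample count columns."""
      return [h for h in header
              if h.split("_", 1)[0] in strains and h.rsplit("_", 1)[-1].isdigit()]


  print(parse_sample_column("BY_Mnase_Seq_1"))  # ('BY', 'Mnase_Seq', 1)
  print(parse_sample_column("RM_H3K14ac_38"))   # ('RM', 'H3K14ac', 38)
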
The 5 final columns concern the DESeq analysis:

  - manip[a_manip], strain[a_strain] and manip[a_strain]:strain[a_strain], the
    manip (marker) effect, the strain effect and the SNEP effect;
  - pvalsGLM, the p-value resulting from the comparison of the GLM models with
    and without the interaction term *marker:strain*;
  - snep_index, a boolean set to TRUE if the *pvalsGLM* value is under the
    threshold computed with an FDR function with a rate set to 0.01%.

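The FDR procedure is not named here. Assuming it is the Benjamini-Hochberg
procedure (as computed, for instance, by R's `p.adjust(method = "BH")`), the
threshold and the `snep_index` flags can be sketched as follows, with the 0.01%
rate written as `0.0001`.

.. code:: python

  def bh_threshold(pvalues, rate=0.0001):
      """Largest p-value p_(k) such that p_(k) <= k * rate / m, or None if none qualifies."""
      m = len(pvalues)
      threshold = None
      for k, p in enumerate(sorted(pvalues), start=1):
          if p <= k * rate / m:
              threshold = p
      return threshold


  def snep_index(pvalues, rate=0.0001):
      """One flag per nucleosome: is its p-value under the BH threshold?"""
      threshold = bh_threshold(pvalues, rate)
      return [threshold is not None and p <= threshold for p in pvalues]


  print(snep_index([1e-6, 2e-5, 0.5, 0.9]))  # [True, True, False, False]
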
It also produces the following file, which gives the size factor of each
involved sample for the different strain combinations and nucleosomal region
types:

.. code:: bash

  results/nucleo_miner/size_factors.tab

TODO: include this file... /home/filleton/analyses/snepcatalog/data/2013-10-09/nucleo_miner/README.txt


Number of SNEPs
^^^^^^^^^^^^^^^

Here are the numbers of SNEPs computed for each form (the `#nucs` column gives
the number of nucleosomes).

.. code:: text

  [1] "wp"
         #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
  BY-RM  30234     520     798     83    3566      26
  BY-YJM 31298     303     619    102     103     128
  RM-YJM 29863     129     340     46    3177      18
  [1] "fuzzy"
         #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
  BY-RM  10748     294     308    101    1681      42
  BY-YJM 10669     122     176    124      93      87
  RM-YJM 11478      54     112     41    1389      20
  [1] "wpfuzzy"
         #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
  BY-RM  40982     770    1136    183    5404      73
  BY-YJM 41967     439     804    214     198     199
  RM-YJM 41341     184     468     87    4687      37


TODO:

  - Print/study intra/inter strain LODs.
  - Check the normality of samples using the Shapiro–Wilk test (hypothesis for
    computing LODs).