Tutorial
========

This tutorial describes the steps needed to perform a quantitative analysis of
the nucleosomal epigenome. We assume that files are organised according to a
given hierarchy and that all command lines are launched from the project's
root.

This tutorial is divided into two main parts. The first one is the Python
script `wf.py`, which aligns and converts Illumina reads. The second one is the
R script `main.r`, which extracts information (nucleosome positions and
indicators) from the dataset.

Dataset and Configuration File
------------------------------

We want to compare the nucleosomes of 3 yeast strains:

- BY
- RM
- YJM

For each strain we perform Mnase-Seq and ChIP-Seq using the following 5
markers:

- H3K4me1
- H3K4me3
- H3K9ac
- H3K14ac
- H4K12ac

In order to simplify the experimental design, we consider Mnase as a marker.
For each couple `(strain, marker)` we perform 3 replicates. So, theoretically,
we should have `3 * (1 + 5) * 3 = 54` samples. In practice we only obtained 2
replicates for `(YJM, H3K4me1)`. Each of the 53 samples is identified by a
unique identifier. The file `CSV_SAMPLE_FILE` sums up this information.

.. autodata:: configurator.CSV_SAMPLE_FILE
   :noindex:

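As a quick check of the arithmetic above, the following Python sketch
enumerates the experimental design. The names `STRAINS`, `MARKERS`,
`REPLICATES`, `OBTAINED` and `expected_design` are hypothetical: they are not
read from `CSV_SAMPLE_FILE` or from `wf.py`.

.. code:: python

   from itertools import product

   # Hypothetical constants mirroring the design described above.
   STRAINS = ["BY", "RM", "YJM"]
   # Mnase is treated as a marker, next to the 5 ChIP-Seq markers.
   MARKERS = ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
   REPLICATES = 3
   # Couples for which fewer than REPLICATES replicates were obtained.
   OBTAINED = {("YJM", "H3K4me1"): 2}


   def expected_design():
       """Return the (strain, marker, replicate) triplets actually obtained."""
       design = []
       for strain, marker in product(STRAINS, MARKERS):
           n_replicates = OBTAINED.get((strain, marker), REPLICATES)
           for replicate in range(1, n_replicates + 1):
               design.append((strain, marker, replicate))
       return design


   print(len(STRAINS) * len(MARKERS) * REPLICATES)  # 54 (theoretical)
   print(len(expected_design()))  # 53 (obtained)
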
We use a convention to link samples and Illumina fastq outputs. Illumina output
files of the sample `ID` are stored in the directory
`ILLUMINA_OUTPUTFILE_PREFIX` + `ID`. For example, the outputs of sample 41 are
stored in the directory `data/2012-09-05/FASTQ/Sample_Yvert_Bq41/`.

.. autodata:: configurator.ILLUMINA_OUTPUTFILE_PREFIX
   :noindex:

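A minimal sketch of this naming convention is shown below. It assumes that
`ILLUMINA_OUTPUTFILE_PREFIX` is `data/2012-09-05/FASTQ/Sample_Yvert_Bq`, a value
inferred from the sample 41 example rather than read from the configuration.
The helper `illumina_output_dir` is hypothetical.

.. code:: python

   # Assumed value, inferred from the sample 41 example above; the real value
   # is defined in the configuration file.
   ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"


   def illumina_output_dir(sample_id, prefix=ILLUMINA_OUTPUTFILE_PREFIX):
       """Hypothetical helper: directory holding the fastq files of a sample."""
       return prefix + str(sample_id) + "/"


   print(illumina_output_dir(41))  # data/2012-09-05/FASTQ/Sample_Yvert_Bq41/
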
For BY (resp. RM and YJM) we use the reference genome
`saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta`
(resp. `saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta` and
`saccharomyces_cerevisiae_YJM_789_screencontig.fasta`).
The index `FASTA_REFERENCE_GENOME_FILES` stores this information.

.. autodata:: configurator.FASTA_REFERENCE_GENOME_FILES
   :noindex:

Each chromosome/contig is identified in the fasta file by an obscure identifier.
For example, BY chromosome I is identified by `gi|144228165|ref|NC_001133.7|`,
whereas TemplateFilter expects an integer, so we translate it. The index
`FASTA_INDEXES` stores this translation.

.. autodata:: configurator.FASTA_INDEXES
   :noindex:

From a pragmatic point of view, we discard some parts of the genome (repeated
sequences, etc.). The list of blacklisted areas is explicitly detailed in
`AREA_BLACK_LIST`.

.. autodata:: configurator.AREA_BLACK_LIST
   :noindex:

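The following Python sketch shows how such a blacklist can be applied to read
coordinates. It assumes, hypothetically, that each blacklisted area is a
`(chromosome, start, end)` triplet with inclusive bounds. The actual structure
of `AREA_BLACK_LIST` may differ, and the helper names are illustrative only.

.. code:: python

   from collections import defaultdict


   def build_black_list(areas):
       """Hypothetical helper: group (chromosome, start, end) areas by chromosome."""
       by_chr = defaultdict(list)
       for chrom, start, end in areas:
           by_chr[chrom].append((start, end))
       return by_chr


   def is_black_listed(by_chr, chrom, coord):
       """Return True if coord falls inside a blacklisted area of chrom."""
       return any(start <= coord <= end for start, end in by_chr.get(chrom, []))


   # Hypothetical areas, for illustration only.
   black_list = build_black_list([(1, 100, 200), (3, 5000, 5200)])
   print(is_black_listed(black_list, 1, 150))  # True
   print(is_black_listed(black_list, 2, 150))  # False
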
For the BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment, we use the
previously computed .c2c file `data/2012-03_primarydata/BY_RM_gxcomp.c2c` (resp.
`BY_YJM_GComp_All.c2c` and `RM_YJM_gxcomp.c2c`). For more information about
.c2c files, please read section 5 of the manual of `NucleoMiner`, the old
version of `NucleoMiner2`
(http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).

.. autodata:: configurator.C2C_FILES
   :noindex:

`nucleominer` works in specific directories, which are described in
`INDEX_DIR`, `ALIGN_DIR` and `LOG_DIR`.

Finally, `nucleominer` uses external resources. The paths to these resources
are described in `BOWTIE_BUILD_BIN`, `BOWTIE2_BIN`, `SAMTOOLS_BIN`,
`BEDTOOLS_BIN`, `TF_BIN` and `TF_TEMPLATES_FILE`.

All paths, prefixes and indexes can be changed in the
`src/current/nucleominer_config.json` file.

.. autodata:: wf.json_conf_file
   :noindex:

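The configuration file can be inspected with Python's standard `json` module.
The sketch below assumes that the file is a JSON object whose top-level keys
are the names listed above (for example `BOWTIE2_BIN` or `LOG_DIR`). The actual
layout of `nucleominer_config.json` may differ, and `load_config` is a
hypothetical helper.

.. code:: python

   import json

   # Assumption: these names are top-level keys of the JSON configuration.
   EXPECTED_KEYS = [
       "INDEX_DIR", "ALIGN_DIR", "LOG_DIR",
       "BOWTIE_BUILD_BIN", "BOWTIE2_BIN", "SAMTOOLS_BIN",
       "BEDTOOLS_BIN", "TF_BIN", "TF_TEMPLATES_FILE",
   ]


   def load_config(path="src/current/nucleominer_config.json"):
       """Hypothetical helper: load the configuration and list missing keys."""
       with open(path) as handle:
           config = json.load(handle)
       missing = [key for key in EXPECTED_KEYS if key not in config]
       return config, missing

When run from the project's root, `load_config()` returns the parsed
configuration together with the expected keys it does not define.
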

Preprocessing Illumina Fastq Reads for Each Sample
--------------------------------------------------

This preprocessing consists of the 4 main steps embedded in the `wf.py` script
and described below. As a preamble, this script computes `samples`,
`samples_mnase` and `strains`, which are used throughout the 4 steps.

.. autodata:: wf.samples
   :noindex:

.. autodata:: wf.samples_mnase
   :noindex:

.. autodata:: wf.strains
   :noindex:


Creating Bowtie Index from Each Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each strain, we need to create a bowtie index. The bowtie index of a strain
is a compact, searchable representation of the reference genome of this strain,
based on the Burrows-Wheeler transform. Bowtie uses it to align reads. This
step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_1
   :end-before: # _ENDOF_ step_1
   :language: python

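To reproduce this step by hand, the indexer can be wrapped with `subprocess`, as
in the sketch below. It assumes that `BOWTIE_BUILD_BIN` points to
`bowtie-build` or `bowtie2-build`, both of which take `<reference_in>` and
`<index_base>` as positional arguments. It also assumes that the index base is
named after the strain inside `INDEX_DIR`. This is not the code included above,
and the helper names are hypothetical.

.. code:: python

   import os
   import subprocess


   def bowtie_build_command(bowtie_build_bin, fasta_file, index_dir, strain):
       """Hypothetical helper: '<bin> <reference_in> <index_base>' as a list."""
       index_base = os.path.join(index_dir, strain)
       return [bowtie_build_bin, fasta_file, index_base]


   def build_index(bowtie_build_bin, fasta_file, index_dir, strain):
       """Run the indexer; raise CalledProcessError if it exits with an error."""
       os.makedirs(index_dir, exist_ok=True)
       cmd = bowtie_build_command(bowtie_build_bin, fasta_file, index_dir, strain)
       subprocess.run(cmd, check=True)
       return cmd[-1]


   # Hypothetical values: the binary name and index directory are assumptions.
   cmd = bowtie_build_command(
       "bowtie2-build",
       "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
       "index",
       "BY",
   )
   print(" ".join(cmd))
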
The following table sums up the file sizes and process durations involved in
this step.

====== ====================== ====================== ================
strain fasta genome file size bowtie index file size process duration
====== ====================== ====================== ================
BY     12 MB                  25 MB                  11 s
RM     12 MB                  24 MB                  9 s
YJM    12 MB                  25 MB                  11 s
====== ====================== ====================== ================


Aligning Reads to Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, we launch bowtie to align reads to the reference genome. It produces a
`.sam` file that we convert into a `.bed` file. The `bowtie`, `samtools` and
`bedtools` binaries are wrapped using the Python `subprocess` module. This step
is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_2
   :end-before: # _ENDOF_ step_2
   :language: python

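The conversion chain can be sketched as three command lines. The sketch assumes
the following tools and options, which may differ from what `wf.py` uses:

- `bowtie2`, with `-x` for the index base, `-U` for unpaired reads, `-S` for
  the SAM output and `-p` for the number of threads;
- `samtools view -b` to convert SAM to BAM;
- `bedtools bamtobed` to convert BAM to BED.

File names and helper names are hypothetical.

.. code:: python

   import subprocess


   def alignment_commands(sample_id, index_base, fastq_file, out_dir, threads=3):
       """Hypothetical helper: (command, stdout file) pairs for one sample."""
       sam = f"{out_dir}/{sample_id}.sam"
       bam = f"{out_dir}/{sample_id}.bam"
       bed = f"{out_dir}/{sample_id}.bed"
       return [
           # Align single-end reads; -p sets the number of threads.
           (["bowtie2", "-p", str(threads), "-x", index_base,
             "-U", fastq_file, "-S", sam], None),
           # Convert SAM to BAM; the BAM stream goes to stdout.
           (["samtools", "view", "-b", "-S", sam], bam),
           # Convert BAM to BED; the BED lines go to stdout.
           (["bedtools", "bamtobed", "-i", bam], bed),
       ]


   def run_alignment(commands):
       """Run each command, redirecting stdout to a file when one is given."""
       for cmd, stdout_path in commands:
           if stdout_path is None:
               subprocess.run(cmd, check=True)
           else:
               with open(stdout_path, "wb") as out:
                   subprocess.run(cmd, stdout=out, check=True)
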
Convert Aligned Reads for TemplateFilter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TemplateFilter uses a particular input format for reads, so we convert the
`.bed` file. TemplateFilter expects reads in the form `chr coord strand #reads`
where:

- chr is the number of the chromosome;
- coord is the coordinate of the reads;
- strand is `F` for forward and `R` for reverse;
- #reads is the number of reads at this position.

Each entry is *tab*-separated.

**WARNING** For reverse-strand reads, bowtie returns the position of the
leftmost nucleotide, whereas TemplateFilter expects the rightmost one. This step
takes it into account and adds the read length (50 in our case) to the
coordinate of reverse reads.

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_3
   :end-before: # _ENDOF_ step_3
   :language: python

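The conversion can be sketched in Python as follows. The sketch makes three
assumptions:

- the input has the standard six BED columns (chromosome, start, end, name,
  score, strand);
- a hypothetical `chr_index` dictionary plays the role of `FASTA_INDEXES`;
- the read length is 50.

It applies the rule stated above (add the read length to reverse-read
coordinates) and counts reads per position. It is an illustration, not the
code of `wf.py`.

.. code:: python

   from collections import Counter

   READ_LENGTH = 50  # read length in our dataset


   def bed_to_tf(bed_lines, chr_index, read_length=READ_LENGTH):
       """Hypothetical helper: BED lines -> sorted 'chr coord strand #reads' lines."""
       counts = Counter()
       for line in bed_lines:
           fields = line.rstrip("\n").split("\t")
           chrom, start, strand = fields[0], int(fields[1]), fields[5]
           if strand == "-":
               # Shift reverse reads by the read length, as described above.
               counts[(chr_index[chrom], start + read_length, "R")] += 1
           else:
               counts[(chr_index[chrom], start, "F")] += 1
       return [
           f"{chrom}\t{coord}\t{strand}\t{n}"
           for (chrom, coord, strand), n in sorted(counts.items())
       ]


   # Hypothetical translation table in the spirit of FASTA_INDEXES.
   chr_index = {"gi|144228165|ref|NC_001133.7|": 1}
   bed = [
       "gi|144228165|ref|NC_001133.7|\t1000\t1050\tr1\t42\t+",
       "gi|144228165|ref|NC_001133.7|\t1000\t1050\tr2\t42\t+",
       "gi|144228165|ref|NC_001133.7|\t1200\t1250\tr3\t42\t-",
   ]
   for tf_line in bed_to_tf(bed, chr_index):
       print(tf_line)

Here the two forward reads starting at 1000 are merged into one entry with a
count of 2, and the reverse read starting at 1200 is reported at 1250.
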
The following table sums up the number of reads, the file sizes and the
process durations of the last two steps. In our case, the alignment process was
multithreaded over 3 cores.

== ============== ========================== ====== ================ ================== ================
id Illumina reads aligned and filtered reads ratio  `.bed` file size TF input file size process duration
== ============== ========================== ====== ================ ================== ================
1  16436138       10199695                   62.06% 1064 MB          60 MB              383 s
2  16911132       12512727                   73.99% 1298 MB          64 MB              437 s
3  15946902       12340426                   77.38% 1280 MB          65 MB              423 s
4  13765584       10381903                   75.42% 931 MB           59 MB              352 s
5  15168268       11502855                   75.83% 1031 MB          64 MB              386 s
6  18850820       14024905                   74.40% 1254 MB          69 MB              482 s
7  15591124       12126623                   77.78% 1163 MB          72 MB              405 s
8  15659905       12475664                   79.67% 1194 MB          71 MB              416 s
9  14668641       10960565                   74.72% 1052 MB          70 MB              375 s
10 14339179       10454451                   72.91% 1049 MB          51 MB              363 s
11 18019895       13688774                   75.96% 1378 MB          59 MB              474 s
12 13746796       10810022                   78.64% 1084 MB          54 MB              360 s
13 15205065       11766016                   77.38% 990 MB           54 MB              381 s
14 17803097       13838883                   77.73% 1154 MB          60 MB              452 s
15 15434564       12307878                   79.74% 1032 MB          57 MB              394 s
16 16802587       12725665                   75.74% 1221 MB          48 MB              438 s
17 16058417       12513734                   77.93% 1192 MB          63 MB              422 s
18 16154482       13204331                   81.74% 1277 MB          52 MB              430 s
19 21013924       17102120                   81.38% 1646 MB          59 MB              555 s
20 17213114       14433357                   83.85% 1389 MB          53 MB              459 s
21 17360907       14733001                   84.86% 1203 MB          55 MB              450 s
22 18136816       15389581                   84.85% 1257 MB          53 MB              469 s
23 14763678       12173025                   82.45% 1140 MB          56 MB              393 s
24 15541709       12890345                   82.94% 1057 MB          48 MB              398 s
25 16433215       13094314                   79.68% 1241 MB          57 MB              433 s
26 17370850       14264136                   82.12% 1347 MB          51 MB              466 s
27 14613512       8654495                    59.22% 887 MB           56 MB              339 s
28 15248545       11367589                   74.55% 1166 MB          67 MB              405 s
29 14316809       10767926                   75.21% 1103 MB          63 MB              379 s
30 15178058       12265794                   80.81% 1030 MB          66 MB              390 s
31 14968579       11876186                   79.34% 1009 MB          63 MB              387 s
32 16912705       13550508                   80.12% 1143 MB          70 MB              442 s
33 16782154       12755111                   76.00% 1227 MB          65 MB              438 s
34 16741443       13168071                   78.66% 1260 MB          71 MB              442 s
35 13096171       10367041                   79.16% 992 MB           62 MB              350 s
36 17715118       14092985                   79.55% 1404 MB          68 MB              483 s
37 17288466       7402082                    42.82% 741 MB           48 MB              339 s
38 16116394       13178457                   81.77% 1101 MB          63 MB              420 s
39 14241106       10537228                   73.99% 880 MB           57 MB              348 s
40 13784738       10598464                   76.89% 1005 MB          64 MB              358 s
41 12438007       9620975                    77.35% 911 MB           60 MB              326 s
42 13853959       11031238                   79.63% 1045 MB          64 MB              365 s
43 12036162       6654780                    55.29% 684 MB           46 MB              268 s
44 13873129       10251074                   73.89% 1048 MB          61 MB              365 s
45 19817751       14904502                   75.21% 1520 MB          72 MB              528 s
46 13368959       10818619                   80.92% 912 MB           63 MB              350 s
47 7566467        6139001                    81.13% 520 MB           44 MB              201 s
48 32586928       21191363                   65.03% 1816 MB          82 MB              766 s
49 30733184       18791373                   61.14% 1801 MB          89 MB              721 s
50 41287616       30383875                   73.59% 2911 MB          112 MB             1065 s
51 40439965       31177914                   77.10% 2981 MB          117 MB             1070 s
53 40876476       33780065                   82.64% 3316 MB          103 MB             1165 s
55 52424414       47117107                   89.88% 3811 MB          119 MB             1477 s
== ============== ========================== ====== ================ ================== ================

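The `ratio` column is the number of aligned and filtered reads divided by the
number of Illumina reads. For example, sample 1 gives 10199695 / 16436138,
about 62.06%. A one-line Python check follows; the helper name
`alignment_ratio` is hypothetical.

.. code:: python

   def alignment_ratio(illumina_reads, aligned_reads):
       """Hypothetical helper: percentage of reads kept after alignment/filtering."""
       return 100.0 * aligned_reads / illumina_reads


   print(f"{alignment_ratio(16436138, 10199695):.2f}%")  # 62.06%
   print(f"{alignment_ratio(52424414, 47117107):.2f}%")  # 89.88%
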
For experimental reasons (manipulation efficiency, e.g. PCR), we remove samples
33, 45, 48 and 55.


Run TemplateFilter on Mnase Samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, for each Mnase sample we perform a TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each nucleosome is
defined by its center and its width. An odd width leads to non-integer lower and
upper bounds.

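To illustrate this warning, here is a small Python sketch that derives bounds
from a center and a width. It assumes that the bounds are `center - width / 2`
and `center + width / 2`. The helper name `nucleosome_bounds` is hypothetical.

.. code:: python

   def nucleosome_bounds(center, width):
       """Hypothetical helper: (lower, upper) bounds from center and width."""
       half = width / 2
       return center - half, center + half


   print(nucleosome_bounds(1000, 146))  # (927.0, 1073.0): even width
   print(nucleosome_bounds(1000, 147))  # (926.5, 1073.5): odd width
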
**WARNING** TemplateFilter is not designed to deal with replicates. We
therefore keep as many nucleosomes as possible and filter them afterwards,
taking advantage of the replicates. To do so, we set a low correlation
threshold (`0.5`) and a particularly high overlap value (`300%`).

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_4
   :end-before: # _ENDOF_ step_4
   :language: python

The following table sums up, for each Mnase sample, the number of nucleosomes
found, the size of the nucleosome file and the process duration.

== ====== ========== ============= ================
id strain found nucs nuc file size process duration
== ====== ========== ============= ================
1  BY     96214      68 MB         1022 s
2  BY     91694      65 MB         1038 s
3  BY     91205      65 MB         1036 s
4  RM     88076      62 MB         984 s
5  RM     90141      64 MB         967 s
6  RM     87517      62 MB         980 s
7  YJM    88945      64 MB         566 s
8  YJM    88689      64 MB         570 s
9  YJM    88128      63 MB         565 s
== ====== ========== ============= ================


Inferring Nucleosome Position and Extracting Read Counts
--------------------------------------------------------

The second part of the tutorial uses `R` (http://www.r-project.org). It
consists of the following main steps:

- compute_rois.R
- extract_maps.R
- compare_common_wp.R
- split_samples.R
- count_reads.R
- get_size_factors.R
- launch_deseq.R

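The scripts are meant to be run in the order listed above, from the project's
root. `R CMD BATCH` writes the console output of each script to a `.Rout` file.
The small Python driver below chains the steps. It assumes that `R` is on the
`PATH`, that the scripts live in `src/current/` as in the commands below, and
that `R CMD BATCH` exits with a non-zero status when a script fails. The names
`R_SCRIPTS`, `r_batch_commands` and `run_r_pipeline` are hypothetical.

.. code:: python

   import subprocess

   # Hypothetical list: order of the R steps described in this tutorial.
   R_SCRIPTS = [
       "compute_rois.R",
       "extract_maps.R",
       "compare_common_wp.R",
       "split_samples.R",
       "count_reads.R",
       "get_size_factors.R",
       "launch_deseq.R",
   ]


   def r_batch_commands(script_dir="src/current"):
       """Return one 'R CMD BATCH <script>' command per step, in order."""
       return [["R", "CMD", "BATCH", f"{script_dir}/{name}"] for name in R_SCRIPTS]


   def run_r_pipeline(script_dir="src/current"):
       """Run every step; stop at the first one exiting with a non-zero status."""
       for cmd in r_batch_commands(script_dir):
           subprocess.run(cmd, check=True)
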
Computing Common Genome Regions Between Strains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

   R CMD BATCH src/current/compute_rois.R


Extracting Maps for Well Positioned and Fuzzy Nucleosomes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

   R CMD BATCH src/current/extract_maps.R


Compute Distance Between Well Positioned Nucleosomes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

   R CMD BATCH src/current/compare_common_wp.R


Split and Compress Samples According to CURs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

   R CMD BATCH src/current/split_samples.R


Count Reads for Each Nucleosome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

   R CMD BATCH src/current/count_reads.R

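Counting reads for a nucleosome amounts to counting the reads whose
coordinates fall within its bounds. The Python sketch below does this with a
binary search on sorted coordinates, with one entry per read. It is not a
transcription of `count_reads.R`. The inclusive-bounds convention and the
helper name `count_reads_in` are assumptions.

.. code:: python

   from bisect import bisect_left, bisect_right


   def count_reads_in(sorted_coords, lower, upper):
       """Hypothetical helper: count c with lower <= c <= upper (sorted input)."""
       return bisect_right(sorted_coords, upper) - bisect_left(sorted_coords, lower)


   coords = sorted([1010, 980, 1073, 1200, 926, 1074])
   print(count_reads_in(coords, 926.5, 1073.5))  # 3 (980, 1010 and 1073)
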
Get Size Factors Using DESeq
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

   R CMD BATCH src/current/get_size_factors.R

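DESeq estimates size factors with the median-of-ratios method. For each sample,
the size factor is the median of the ratios between the sample's count and the
geometric mean of the counts across samples. The median is taken over the
features (here nucleosomes) that have no zero count. The Python sketch below
re-implements this formula for illustration only: `get_size_factors.R`
presumably relies on DESeq's own `estimateSizeFactors`, and the function name
`size_factors` is hypothetical.

.. code:: python

   import math
   from statistics import median


   def size_factors(counts):
       """Hypothetical helper: median-of-ratios size factors.

       counts: one row per feature, one column per sample. Rows containing a
       zero are skipped, since their geometric mean is 0.
       """
       n_samples = len(counts[0])
       usable = [row for row in counts if all(c > 0 for c in row)]
       # Log of the geometric mean of each usable row.
       log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
       factors = []
       for j in range(n_samples):
           log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
           factors.append(math.exp(median(log_ratios)))
       return factors


   # Sample 2 has exactly twice the coverage of sample 1; the last row is skipped.
   print([round(f, 4) for f in size_factors([[10, 20], [30, 60], [5, 10], [0, 7]])])
   # [0.7071, 1.4142]
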
Performing DESeq Analysis
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

   R CMD BATCH src/current/launch_deseq.R

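The `pvalsGLM` column of the result files comes from comparing the GLM fits
with and without the interaction term. This sketch makes two assumptions:

- the comparison is a likelihood-ratio (deviance) test, as in DESeq's
  `nbinomTestGLM`;
- the interaction term has a single degree of freedom (two strains, two
  manips).

Under these assumptions, the p-value is the upper tail of a chi-square
distribution with 1 degree of freedom. The Python sketch computes it from the
two deviances; the function name is hypothetical.

.. code:: python

   import math


   def glm_interaction_pvalue(deviance_without, deviance_with):
       """Hypothetical helper: p-value of a 1-df likelihood-ratio test.

       deviance_without: deviance of the model without the interaction term.
       deviance_with: deviance of the model with the interaction term.
       For a chi-square variable X with 1 degree of freedom,
       P(X > d) = erfc(sqrt(d / 2)).
       """
       statistic = max(deviance_without - deviance_with, 0.0)
       return math.erfc(math.sqrt(statistic / 2))


   print(round(glm_interaction_pvalue(103.841458820694, 100.0), 4))  # 0.05
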
Results
-------

Output Files Organisation
^^^^^^^^^^^^^^^^^^^^^^^^^

The previous steps produce the following 45 files. Each filename has the form:

.. code:: bash

   results/current/[combi]_[marker]_[form]_snep.tab

where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination, marker
is in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each post-translational
histone modification, and form is in {wp, fuzzy, wpfuzzy}, considering well
positioned nucleosomes, fuzzy nucleosomes or both for SNEP computation.

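The 45 files are the Cartesian product of 3 strain combinations, 5 markers and
3 forms (3 × 5 × 3 = 45). The Python sketch below enumerates the expected paths
so that one can check whether the pipeline produced all of them. The helper
name `expected_result_files` is hypothetical.

.. code:: python

   import os
   from itertools import product

   # Values listed in the paragraph above.
   COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
   MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
   FORMS = ["wp", "fuzzy", "wpfuzzy"]


   def expected_result_files(result_dir="results/current"):
       """Hypothetical helper: list the expected SNEP result files."""
       return [
           f"{result_dir}/{combi}_{marker}_{form}_snep.tab"
           for combi, marker, form in product(COMBIS, MARKERS, FORMS)
       ]


   print(len(expected_result_files()))  # 45
   missing = [path for path in expected_result_files() if not os.path.exists(path)]
   print(len(missing), "file(s) missing")
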

chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM lower_bound_RM upper_bound_RM index_nuc_RM roi_index form BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4 RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36
BY_H3K14ac_37 BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM

Each file contains one line per nucleosome. Each line is composed of many
columns divided into 3 main topics:

- nucleosome information;
- number of reads for each sample;
- DESeq analysis results.

For example, the file *BY_RM_H3K14ac_wp_snep.tab* contains the following
columns:

- chr_BY, the BY chromosome involved;
- lower_bound_BY, the lower bound of the BY nucleosome;
- upper_bound_BY, the upper bound of the BY nucleosome;
- index_nuc_BY, the index of the nucleosome in the entire list of BY nucleosomes;
- chr_RM, lower_bound_RM, upper_bound_RM, index_nuc_RM, the same information
  for the RM strain;
- roi_index, the index of the region of interest involved.

The next columns are the indicators for each sample. They are labeled
[strain]_[marker]_[sample_id], and each value is the number of reads for the
current nucleosome in the sample *sample_id*.

The 5 final columns concern the DESeq analysis:

- manip[a_manip] strain[a_strain] manip[a_strain]:strain[a_strain], the manip
  (marker) effect, the strain effect and the SNEP effect;
- pvalsGLM, the p-value resulting from the comparison of the GLM models with and
  without the interaction term *marker:strain*;
- snep_index, a boolean set to TRUE if the *pvalsGLM* value is under the
  threshold computed with the FDR function with a rate set to 0.01%.

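The tutorial does not state which FDR procedure is used. This sketch assumes it
is the Benjamini-Hochberg procedure:

1. sort the m p-values;
2. find the largest rank k such that p_(k) <= k / m * rate;
3. flag every p-value at or below p_(k).

A rate of 0.01% corresponds to `rate=0.0001`. The helper names `bh_threshold`
and `snep_flags` are hypothetical.

.. code:: python

   def bh_threshold(pvalues, rate):
       """Hypothetical helper: Benjamini-Hochberg threshold, or None if none pass."""
       m = len(pvalues)
       threshold = None
       for k, p in enumerate(sorted(pvalues), start=1):
           if p <= k / m * rate:
               threshold = p
       return threshold


   def snep_flags(pvalues, rate=0.0001):
       """Flag p-values at or below the Benjamini-Hochberg threshold."""
       threshold = bh_threshold(pvalues, rate)
       if threshold is None:
           return [False] * len(pvalues)
       return [p <= threshold for p in pvalues]


   print(snep_flags([0.00001, 0.5, 0.00002, 0.9]))  # [True, False, True, False]
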
It also produces a file that gives the size factor of each involved sample, for
the different strain combinations and nucleosomal region types:

.. code:: bash

   results/current/size_factors.tab

TODO: include this file... /home/filleton/analyses/snepcatalog/data/2013-10-09/current/README.txt


Number of SNEPs
^^^^^^^^^^^^^^^

Here are the numbers of computed SNEPs for each form.

.. code:: bash

   [1] "wp"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  30234     520     798     83    3566      26
   BY-YJM 31298     303     619    102     103     128
   RM-YJM 29863     129     340     46    3177      18
   [1] "fuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  10748     294     308    101    1681      42
   BY-YJM 10669     122     176    124      93      87
   RM-YJM 11478      54     112     41    1389      20
   [1] "wpfuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  40982     770    1136    183    5404      73
   BY-YJM 41967     439     804    214     198     199
   RM-YJM 41341     184     468     87    4687      37


TODO:

- Print/study intra/inter strain LODs.
- Check the normality of samples using the Shapiro–Wilk test (hypothesis for
  computing LODs).