Tutorial
********

This tutorial describes the steps required to perform a quantitative
analysis of the nucleosomal epigenome. We assume that files are
organised according to a given hierarchy and that all command lines
are launched from the project's root.

This tutorial is divided into two main parts. The first one relies on
the Python script *wf.py*, which aligns and converts Illumina reads.
The second one relies on the R script *main.r*, which extracts
information (nucleosome positions and indicators) from the dataset.


Dataset and Configuration File
==============================

We want to compare the nucleosomes of 3 yeast strains:

* BY

* RM

* YJM

For each strain we perform MNase-Seq and ChIP-Seq using the 5
following markers:

* H3K4me1

* H3K4me3

* H3K9ac

* H3K14ac

* H4K12ac

In order to simplify the design of the experiment, we consider MNase
as a marker. For each pair *(strain, marker)* we perform 3
replicates. So, theoretically, we should have *3 * (1 + 5) * 3 = 54*
samples. In practice we only obtained 2 replicates for *(YJM,
H3K4me1)*. Each of the 53 samples is identified by a unique
identifier. The file *CSV_SAMPLE_FILE* sums up this information.

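The sample count can be checked with a short Python sketch. It
enumerates the theoretical design and removes one sample. Which
replicate of *(YJM, H3K4me1)* is missing is not stated in this
tutorial, so dropping replicate 3 is an assumption made only for
illustration.

```python
from itertools import product

strains = ["BY", "RM", "YJM"]
# Mnase is treated as a marker alongside the 5 histone marks.
markers = ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
replicates = [1, 2, 3]

# Theoretical design: every (strain, marker, replicate) combination.
design = list(product(strains, markers, replicates))

# Assumption: the missing sample is replicate 3 of (YJM, H3K4me1).
obtained = [s for s in design if s != ("YJM", "H3K4me1", 3)]

print(len(design), len(obtained))  # 54 53
```
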
We use a convention to link each sample to its Illumina fastq
outputs. The Illumina output files of the sample *ID* are stored in
the directory *ILLUMINA_OUTPUTFILE_PREFIX* + *ID*. For example, the
outputs of sample 41 are stored in the directory
*data/2012-09-05/FASTQ/Sample_Yvert_Bq41/*.

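As a minimal sketch, this convention could be written as follows in
Python. The prefix value is inferred from the example directory of
sample 41 and the helper name *illumina_output_dir* is hypothetical.

```python
# Assumed value, inferred from the example directory of sample 41.
ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"


def illumina_output_dir(sample_id):
    """Return the directory holding the fastq outputs of a sample."""
    return "{}{}/".format(ILLUMINA_OUTPUTFILE_PREFIX, sample_id)


print(illumina_output_dir(41))
```
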
For BY (resp. RM and YJM) we use the following reference genome:
*saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta* (resp.
*saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta* and
*saccharomyces_cerevisiae_YJM_789_screencontig.fasta*). The index
*FASTA_REFERENCE_GENOME_FILES* stores this information.

Each chromosome/contig is identified in the fasta file by an obscure
identifier. For example, BY chromosome I is identified by
*gi|144228165|ref|NC_001133.7|*, whereas TemplateFilter expects an
integer. So we translate it. The index *FASTA_INDEXES* stores this
translation.

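The translation can be pictured as a simple lookup. The layout below
(one dictionary per strain) and the helper *translate_chr* are
hypothetical; only the BY chromosome I identifier comes from this
tutorial, and mapping it to 1 is an assumption.

```python
# Hypothetical layout: strain -> {fasta identifier -> integer}.
FASTA_INDEXES = {
    "BY": {"gi|144228165|ref|NC_001133.7|": 1},  # assumed: chromosome I -> 1
}


def translate_chr(strain, fasta_id):
    """Translate a fasta sequence identifier into an integer."""
    return FASTA_INDEXES[strain][fasta_id]


print(translate_chr("BY", "gi|144228165|ref|NC_001133.7|"))
```
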
From a pragmatic point of view, we discard some parts of the genome
(repeated sequences, etc.). The list of blacklisted areas is
explicitly detailed in *AREA_BLACK_LIST*.

For the BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we
use the previously computed .c2c file
*data/2012-03_primarydata/BY_RM_gxcomp.c2c* (resp.
*BY_YJM_GComp_All.c2c* and *RM_YJM_gxcomp.c2c*). For more information
about .c2c files, please read section 5 of the manual of
*NucleoMiner*, the old version of *NucleoMiner2*
(http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).

*nucleominer* works in specific directories, which are described in
*INDEX_DIR*, *ALIGN_DIR* and *LOG_DIR*.

Finally, *nucleominer* uses external resources. The paths to these
resources are described in *BOWTIE_BUILD_BIN*, *BOWTIE2_BIN*,
*SAMTOOLS_BIN*, *BEDTOOLS_BIN*, *TF_BIN* and *TF_TEMPLATES_FILE*.

All paths, prefixes and indexes can be changed in the
*src/current/nucleominer_config.json* file.

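A configuration file of this kind can be read with the standard
*json* module. The sketch below assumes a flat JSON object. The key
names are those listed in this section, while the values are
placeholders and the helper *load_config* is hypothetical.

```python
import json


def load_config(path="src/current/nucleominer_config.json"):
    """Load the configuration file and return it as a dictionary."""
    with open(path) as config_file:
        return json.load(config_file)


# Illustrative content only: placeholder values, not the real settings.
example = json.loads("""
{
  "INDEX_DIR": "data/index",
  "ALIGN_DIR": "data/align",
  "LOG_DIR": "data/log",
  "BOWTIE2_BIN": "bowtie2"
}
""")

print(sorted(example))
```
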

Preprocessing Illumina Fastq Reads for Each Sample
==================================================

This preprocessing step consists of the 4 main steps embedded in the
*wf.py* script and described below. As a preamble, this script
computes *samples*, *samples_mnase* and *strains*, which are used
throughout the 4 steps.


Creating Bowtie Index from each Reference Genome
------------------------------------------------

For each strain, we need to create a bowtie index. The bowtie index of
a strain is a compact, searchable representation of the reference
genome for this strain. It is used by bowtie to align reads. This
step is performed by a dedicated part of the *wf.py* script.

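The corresponding listing of *wf.py* is not reproduced in this text
version. As a minimal sketch, assuming that *BOWTIE_BUILD_BIN* points
to the *bowtie2-build* binary, the index creation could be wrapped as
follows. The function names and the index prefix are hypothetical.

```python
import subprocess


def bowtie_build_command(fasta_file, index_prefix,
                         bowtie_build_bin="bowtie2-build"):
    """Build the argument list that creates an index from a fasta file."""
    return [bowtie_build_bin, fasta_file, index_prefix]


def create_bowtie_index(fasta_file, index_prefix,
                        bowtie_build_bin="bowtie2-build"):
    """Run the indexer and raise CalledProcessError if it fails."""
    subprocess.check_call(
        bowtie_build_command(fasta_file, index_prefix, bowtie_build_bin))


# Hypothetical index prefix; the fasta file name comes from this tutorial.
print(bowtie_build_command(
    "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta", "index/BY"))
```

Building the argument list separately from running it makes it easy to
log or inspect the command before launching it.
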
The following table sums up the file sizes and process durations
involved in this step.

+--------+------------------------+------------------------+------------------+
| strain | fasta genome file size | bowtie index file size | process duration |
+========+========================+========================+==================+
| BY     | 12 MB                  | 25 MB                  | 11 s.            |
+--------+------------------------+------------------------+------------------+
| RM     | 12 MB                  | 24 MB                  | 9 s.             |
+--------+------------------------+------------------------+------------------+
| YJM    | 12 MB                  | 25 MB                  | 11 s.            |
+--------+------------------------+------------------------+------------------+


Aligning Reads to Reference Genome
----------------------------------

Next, we launch bowtie to align reads to the reference genome. It
produces a *.sam* file that we convert into a *.bed* file. Binaries
for *bowtie*, *samtools* and *bedtools* are wrapped using the Python
*subprocess* module. This step is performed by a dedicated part of
the *wf.py* script.

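The corresponding listing of *wf.py* is not reproduced in this text
version. The sketch below shows one possible way to chain the three
binaries with *subprocess*. It assumes single-end reads, an
intermediate *.bam* file and no read filtering, none of which is
stated in this tutorial. The function names are hypothetical.

```python
import subprocess


def alignment_commands(index_prefix, fastq_file, out_prefix, threads=3,
                       bowtie2_bin="bowtie2", samtools_bin="samtools",
                       bedtools_bin="bedtools"):
    """Return (command, stdout file) pairs turning fastq reads into .bed."""
    sam, bam, bed = (out_prefix + ext for ext in (".sam", ".bam", ".bed"))
    return [
        # Align single-end reads and write a .sam file.
        ([bowtie2_bin, "-p", str(threads), "-x", index_prefix,
          "-U", fastq_file, "-S", sam], None),
        # Convert .sam to .bam, the input format of bedtools bamtobed.
        ([samtools_bin, "view", "-b", "-S", "-o", bam, sam], None),
        # Convert .bam to .bed; bedtools writes to standard output.
        ([bedtools_bin, "bamtobed", "-i", bam], bed),
    ]


def run_alignment(*args, **kwargs):
    """Run each command in turn, stopping at the first failure."""
    for command, stdout_file in alignment_commands(*args, **kwargs):
        if stdout_file is None:
            subprocess.check_call(command)
        else:
            with open(stdout_file, "w") as out:
                subprocess.check_call(command, stdout=out)
```

For instance, *alignment_commands("index/BY", "reads.fastq", "align/s1")*
returns the three commands without running them.
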

Convert Aligned Reads for TemplateFilter
----------------------------------------

TemplateFilter uses a particular input format for reads, so we
convert the *.bed* file. TemplateFilter expects reads as follows:
*chr coord strand #reads* where:

* chr is the number of the chromosome;

* coord is the coordinate of the read;

* strand is *F* for forward and *R* for reverse;

* #reads is the number of reads for this position.

Fields are *tab*-separated.

**WARNING** For the reverse strand, bowtie returns the position of
the leftmost nucleotide of the read, whereas TemplateFilter expects
the rightmost one. This step takes it into account and adds the
read length (in our case 50) to the coordinate of reverse reads.

This step is performed by a dedicated part of the *wf.py* script.

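The corresponding listing of *wf.py* is not reproduced in this text
version. As a minimal sketch of the conversion described above, the
function below reads 6-column BED records, shifts reverse reads by
the read length and counts reads per position. It assumes that the
BED start column is used directly as the coordinate; whether the real
script applies a further offset is not stated here. The function name
and the *chr_index* argument are hypothetical.

```python
from collections import Counter

READ_LENGTH = 50  # read length stated in this tutorial


def bed_to_tf(bed_lines, chr_index, read_length=READ_LENGTH):
    """Convert BED records into 'chr coord strand #reads' lines.

    chr_index maps fasta identifiers to integers (cf. FASTA_INDEXES).
    """
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, strand = fields[0], int(fields[1]), fields[5]
        if strand == "+":
            key = (chr_index[chrom], start, "F")
        else:
            # Reverse reads: shift from the leftmost to the rightmost end.
            key = (chr_index[chrom], start + read_length, "R")
        counts[key] += 1
    return ["\t".join(map(str, key + (n,)))
            for key, n in sorted(counts.items())]


bed = [
    "chrI\t100\t150\tread1\t42\t+",
    "chrI\t100\t150\tread2\t42\t+",
    "chrI\t100\t150\tread3\t42\t-",
]
for tf_line in bed_to_tf(bed, {"chrI": 1}):
    print(tf_line)
```

In this example the two forward reads are merged into a single entry
at coordinate 100 with a count of 2, and the reverse read is reported
at coordinate 150.
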
The following table sums up the number of reads, the file sizes and
the process durations involved in the last two steps. In our case,
the alignment process was multithreaded over 3 cores.

+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| id | Illumina reads | aligned & filtered reads  | ratio  | *.bed* file size | TF input file size | process duration |
+====+================+===========================+========+==================+====================+==================+
| 1  | 16436138       | 10199695                  | 62.06% | 1064 MB          | 60  MB             | 383   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 2  | 16911132       | 12512727                  | 73.99% | 1298 MB          | 64  MB             | 437   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 3  | 15946902       | 12340426                  | 77.38% | 1280 MB          | 65  MB             | 423   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 4  | 13765584       | 10381903                  | 75.42% | 931  MB          | 59  MB             | 352   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 5  | 15168268       | 11502855                  | 75.83% | 1031 MB          | 64  MB             | 386   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 6  | 18850820       | 14024905                  | 74.40% | 1254 MB          | 69  MB             | 482   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 7  | 15591124       | 12126623                  | 77.78% | 1163 MB          | 72  MB             | 405   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 8  | 15659905       | 12475664                  | 79.67% | 1194 MB          | 71  MB             | 416   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 9  | 14668641       | 10960565                  | 74.72% | 1052 MB          | 70  MB             | 375   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 10 | 14339179       | 10454451                  | 72.91% | 1049 MB          | 51  MB             | 363   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 11 | 18019895       | 13688774                  | 75.96% | 1378 MB          | 59  MB             | 474   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 12 | 13746796       | 10810022                  | 78.64% | 1084 MB          | 54  MB             | 360   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 13 | 15205065       | 11766016                  | 77.38% | 990  MB          | 54  MB             | 381   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 14 | 17803097       | 13838883                  | 77.73% | 1154 MB          | 60  MB             | 452   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 15 | 15434564       | 12307878                  | 79.74% | 1032 MB          | 57  MB             | 394   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 16 | 16802587       | 12725665                  | 75.74% | 1221 MB          | 48  MB             | 438   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 17 | 16058417       | 12513734                  | 77.93% | 1192 MB          | 63  MB             | 422   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 18 | 16154482       | 13204331                  | 81.74% | 1277 MB          | 52  MB             | 430   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 19 | 21013924       | 17102120                  | 81.38% | 1646 MB          | 59  MB             | 555   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 20 | 17213114       | 14433357                  | 83.85% | 1389 MB          | 53  MB             | 459   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 21 | 17360907       | 14733001                  | 84.86% | 1203 MB          | 55  MB             | 450   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 22 | 18136816       | 15389581                  | 84.85% | 1257 MB          | 53  MB             | 469   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 23 | 14763678       | 12173025                  | 82.45% | 1140 MB          | 56  MB             | 393   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 24 | 15541709       | 12890345                  | 82.94% | 1057 MB          | 48  MB             | 398   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 25 | 16433215       | 13094314                  | 79.68% | 1241 MB          | 57  MB             | 433   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 26 | 17370850       | 14264136                  | 82.12% | 1347 MB          | 51  MB             | 466   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 27 | 14613512       | 8654495                   | 59.22% | 887  MB          | 56  MB             | 339   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 28 | 15248545       | 11367589                  | 74.55% | 1166 MB          | 67  MB             | 405   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 29 | 14316809       | 10767926                  | 75.21% | 1103 MB          | 63  MB             | 379   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 30 | 15178058       | 12265794                  | 80.81% | 1030 MB          | 66  MB             | 390   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 31 | 14968579       | 11876186                  | 79.34% | 1009 MB          | 63  MB             | 387   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 32 | 16912705       | 13550508                  | 80.12% | 1143 MB          | 70  MB             | 442   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 33 | 16782154       | 12755111                  | 76.00% | 1227 MB          | 65  MB             | 438   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 34 | 16741443       | 13168071                  | 78.66% | 1260 MB          | 71  MB             | 442   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 35 | 13096171       | 10367041                  | 79.16% | 992  MB          | 62  MB             | 350   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 36 | 17715118       | 14092985                  | 79.55% | 1404 MB          | 68  MB             | 483   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 37 | 17288466       | 7402082                   | 42.82% | 741  MB          | 48  MB             | 339   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 38 | 16116394       | 13178457                  | 81.77% | 1101 MB          | 63  MB             | 420   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 39 | 14241106       | 10537228                  | 73.99% | 880  MB          | 57  MB             | 348   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 40 | 13784738       | 10598464                  | 76.89% | 1005 MB          | 64  MB             | 358   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 41 | 12438007       | 9620975                   | 77.35% | 911  MB          | 60  MB             | 326   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 42 | 13853959       | 11031238                  | 79.63% | 1045 MB          | 64  MB             | 365   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 43 | 12036162       | 6654780                   | 55.29% | 684  MB          | 46  MB             | 268   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 44 | 13873129       | 10251074                  | 73.89% | 1048 MB          | 61  MB             | 365   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 45 | 19817751       | 14904502                  | 75.21% | 1520 MB          | 72  MB             | 528   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 46 | 13368959       | 10818619                  | 80.92% | 912  MB          | 63  MB             | 350   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 47 | 7566467        | 6139001                   | 81.13% | 520  MB          | 44  MB             | 201   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 48 | 32586928       | 21191363                  | 65.03% | 1816 MB          | 82  MB             | 766   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 49 | 30733184       | 18791373                  | 61.14% | 1801 MB          | 89  MB             | 721   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 50 | 41287616       | 30383875                  | 73.59% | 2911 MB          | 112 MB             | 1065  s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 51 | 40439965       | 31177914                  | 77.10% | 2981 MB          | 117 MB             | 1070  s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 53 | 40876476       | 33780065                  | 82.64% | 3316 MB          | 103 MB             | 1165  s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 55 | 52424414       | 47117107                  | 89.88% | 3811 MB          | 119 MB             | 1477  s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+

For experimental reasons (e.g. PCR efficiency), we remove samples
33, 45, 48 and 55.


Run TemplateFilter on MNase Samples
-----------------------------------

Finally, for each MNase sample we perform the TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each
nucleosome is defined by its center and its width. An odd width leads
us to consider non-integer lower and upper bounds.

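A short worked example illustrates this point. It assumes that the
bounds are taken symmetrically around the center, i.e. center minus
and plus half the width; the function name is hypothetical.

```python
def nucleosome_bounds(center, width):
    """Return the (lower, upper) bounds of a nucleosome."""
    half = width / 2.0
    return center - half, center + half


print(nucleosome_bounds(1000, 147))  # (926.5, 1073.5)
print(nucleosome_bounds(1000, 146))  # (927.0, 1073.0)
```

With an odd width such as 147, both bounds fall on half positions.
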
**WARNING** TemplateFilter is not designed to deal with replicates.
So we choose to keep as many nucleosomes as possible and to filter
them in a second step, taking advantage of the replicates. To do
that we set a low correlation threshold parameter (*0.5*) and a
particularly high overlap value (*300%*).

This step is performed by a dedicated part of the *wf.py* script.
The following table sums up the number of nucleosomes found, the
output file size and the process duration for each sample.

+----+--------+------------+---------------+------------------+
| id | strain | found nucs | nuc file size | process duration |
+====+========+============+===============+==================+
| 1  | BY     | 96214      | 68 MB         | 1022 s.          |
+----+--------+------------+---------------+------------------+
| 2  | BY     | 91694      | 65 MB         | 1038 s.          |
+----+--------+------------+---------------+------------------+
| 3  | BY     | 91205      | 65 MB         | 1036 s.          |
+----+--------+------------+---------------+------------------+
| 4  | RM     | 88076      | 62 MB         | 984 s.           |
+----+--------+------------+---------------+------------------+
| 5  | RM     | 90141      | 64 MB         | 967 s.           |
+----+--------+------------+---------------+------------------+
| 6  | RM     | 87517      | 62 MB         | 980 s.           |
+----+--------+------------+---------------+------------------+
| 7  | YJM    | 88945      | 64 MB         | 566 s.           |
+----+--------+------------+---------------+------------------+
| 8  | YJM    | 88689      | 64 MB         | 570 s.           |
+----+--------+------------+---------------+------------------+
| 9  | YJM    | 88128      | 63 MB         | 565 s.           |
+----+--------+------------+---------------+------------------+


Inferring Nucleosome Position and Extracting Read Counts
========================================================

The second part of the tutorial uses *R*
(http://www.r-project.org). It consists of the following main
steps:

   * compute_rois.R

   * extract_maps.R

   * compare_common_wp.R

   * split_samples.R

   * count_reads.R

   * get_size_factors.R

   * launch_deseq.R

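These scripts are launched one after the other with *R CMD BATCH*,
as detailed in the following subsections. As a convenience, the whole
sequence could be driven from Python. The sketch below is not part of
*nucleominer*; the function names are hypothetical and the script
directory is the one used in this tutorial.

```python
import subprocess

# Script names, in the order in which this tutorial runs them.
R_SCRIPTS = [
    "compute_rois.R",
    "extract_maps.R",
    "compare_common_wp.R",
    "split_samples.R",
    "count_reads.R",
    "get_size_factors.R",
    "launch_deseq.R",
]


def r_batch_commands(script_dir="src/current"):
    """Return one 'R CMD BATCH' argument list per script."""
    return [["R", "CMD", "BATCH", "{}/{}".format(script_dir, name)]
            for name in R_SCRIPTS]


def run_r_pipeline(script_dir="src/current"):
    """Run the scripts in order, stopping at the first failure."""
    for command in r_batch_commands(script_dir):
        subprocess.check_call(command)
```
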
337 935a568c Florent Chuffart
338 935a568c Florent Chuffart
Computing Common Genome Region Between Strains
339 935a568c Florent Chuffart
----------------------------------------------
340 935a568c Florent Chuffart
341 8e9facd8 Florent Chuffart
   R CMD BATCH src/current/compute_rois.R
342 935a568c Florent Chuffart
343 935a568c Florent Chuffart
344 935a568c Florent Chuffart
Extracting Maps for Well Positionned and Fuzzy Nucleosomes
345 935a568c Florent Chuffart
----------------------------------------------------------
346 935a568c Florent Chuffart
347 8e9facd8 Florent Chuffart
   R CMD BATCH src/current/extract_maps.R
348 935a568c Florent Chuffart
349 935a568c Florent Chuffart
350 b20637ed Florent Chuffart
Compute Distance Between Well Positionned Nucleosomes
351 b20637ed Florent Chuffart
-----------------------------------------------------
352 b20637ed Florent Chuffart
353 b20637ed Florent Chuffart
   R CMD BATCH src/current/compare_common_wp.R
354 b20637ed Florent Chuffart
355 b20637ed Florent Chuffart
356 b20637ed Florent Chuffart
Split and Compress Samples According CURs
357 b20637ed Florent Chuffart
-----------------------------------------
358 b20637ed Florent Chuffart
359 b20637ed Florent Chuffart
   R CMD BATCH src/current/split_samples.R
360 b20637ed Florent Chuffart
361 b20637ed Florent Chuffart
362 935a568c Florent Chuffart
Count Reads for Each Nucleosome
363 935a568c Florent Chuffart
-------------------------------
364 935a568c Florent Chuffart
365 8e9facd8 Florent Chuffart
   R CMD BATCH src/current/count_reads.R
366 935a568c Florent Chuffart
367 935a568c Florent Chuffart
368 935a568c Florent Chuffart
Get Size Factors Using DESeq
369 935a568c Florent Chuffart
----------------------------
370 935a568c Florent Chuffart
371 8e9facd8 Florent Chuffart
   R CMD BATCH src/current/get_size_factors.R
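
The tutorial does not show what *get_size_factors.R* does internally.
As background only, DESeq estimates size factors with a
median-of-ratios rule: each count is compared to the geometric mean of
its row, and the median ratio per sample is kept. The sketch below
reproduces that rule in plain Python, assuming the script relies on
it; the function name *size_factors* and the input layout (one row per
nucleosome, one column per sample) are illustrative and not part of
the project.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per nucleosome), each row being a list
    of read counts (one per sample). This layout is an assumption.
    """
    n_samples = len(counts[0])
    # Rows containing a zero have a geometric mean of zero and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable row.
    log_geomeans = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the row-wise geometric mean.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geomeans)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

With this rule, a sample sequenced twice as deeply as another gets a
size factor twice as large, so dividing counts by the size factor puts
the samples on a comparable scale.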


Performing DESeq Analysis
-------------------------

   R CMD BATCH src/current/launch_deseq.R
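
The seven commands above are meant to be run in order from the
project's root. A minimal sketch of a driver that chains them is given
below. The helper names *build_commands* and *run_all* are
hypothetical and do not exist in the project; only the script names
and the *R CMD BATCH* invocation come from this tutorial.

```python
import shlex
import subprocess

# Script names, in the order listed in this tutorial.
R_STEPS = [
    "compute_rois.R",
    "extract_maps.R",
    "compare_common_wp.R",
    "split_samples.R",
    "count_reads.R",
    "get_size_factors.R",
    "launch_deseq.R",
]


def build_commands(script_dir="src/current"):
    """Return one `R CMD BATCH` argument list per step, in order."""
    return [["R", "CMD", "BATCH", f"{script_dir}/{name}"] for name in R_STEPS]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling *run_all()* only prints the commands; *run_all(dry_run=False)*
would execute them and requires R to be installed.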


Results
=======


Output Files Organisation
-------------------------

The previous steps produce the following 45 files. Each filename has
the form

   results/current/[combi]_[marker]_[form]_snep.tab

where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination,
marker is in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each
post-translational histone modification, and form is in {wp, fuzzy,
wpfuzzy}, considering well positioned nucleosomes, fuzzy nucleosomes
or both for SNEP computation.
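
The count of 45 files follows from 3 strain combinations x 5 markers
x 3 forms. The short sketch below enumerates the expected filenames
from the naming scheme above, which can help check that a run is
complete. The function name *expected_result_files* is illustrative.

```python
from itertools import product

# Values taken from the naming scheme described in this tutorial.
COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
FORMS = ["wp", "fuzzy", "wpfuzzy"]


def expected_result_files(results_dir="results/current"):
    """List every [combi]_[marker]_[form]_snep.tab path."""
    return [
        f"{results_dir}/{combi}_{marker}_{form}_snep.tab"
        for combi, marker, form in product(COMBIS, MARKERS, FORMS)
    ]
```

Comparing this list with the actual content of *results/current*
shows which strain combination, marker or form is missing, if any.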

chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM
lower_bound_RM upper_bound_RM index_nuc_RM roi_index form
BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4
RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36 BY_H3K14ac_37
BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM

For each file, there is 1 line per nucleosome, and each line is
composed of many columns divided into 3 main topics:
   * nucleosome (nuc) information

   * number of reads for each sample

   * DESeq analysis results.

For example, for the file *BY_RM_H3K14ac_wp_snep.tab*, the nucleosome
information columns are:
   * chr_BY, the BY chromosome involved

   * lower_bound_BY, the lower bound of the BY nuc

   * upper_bound_BY, the upper bound of the BY nuc

   * index_nuc_BY, the index of the nuc in the entire list of BY
     nucs

   * chr_RM, lower_bound_RM, upper_bound_RM, index_nuc_RM, the same
     information for the RM strain

   * roi_index, the index of the region of interest involved.

The next columns hold indicators for each sample. They are labeled
[strain]_[marker]_[sample_id] and each value represents the number of
reads for the current nuc in the sample *sample_id*.

The 5 final columns concern the DESeq analysis:
   * manip[a_manip], strain[a_strain] and
     manip[a_strain]:strain[a_strain], the manip (marker) effect, the
     strain effect and the SNEP effect.

   * pvalsGLM, the p-value resulting from the comparison of the GLM
     models with and without the interaction term *marker:strain*

   * snep_index, a boolean set to TRUE if the *pvalsGLM* value is
     under the threshold computed with the FDR function with a rate
     set to 0.01%.
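
The tutorial does not say which FDR procedure computes the threshold.
As an illustration only, the sketch below assumes a
Benjamini-Hochberg style rule: the threshold is the largest sorted
p-value p_(k) such that p_(k) <= k / m * rate, where m is the number
of nucleosomes. The function names are hypothetical, and the rate is
passed as a fraction (a rate of 0.01% would be written 0.0001).

```python
def bh_threshold(pvalues, rate):
    """Benjamini-Hochberg threshold (assumed rule, see text).

    Returns the largest p-value p_(k) with p_(k) <= k / m * rate,
    or None when no p-value satisfies the condition.
    """
    ordered = sorted(pvalues)
    m = len(ordered)
    threshold = None
    for k, p in enumerate(ordered, start=1):
        if p <= k / m * rate:
            threshold = p
    return threshold


def flag_sneps(pvalues, rate):
    """Return one boolean per p-value, True when it passes the threshold."""
    threshold = bh_threshold(pvalues, rate)
    if threshold is None:
        return [False] * len(pvalues)
    return [p <= threshold for p in pvalues]
```

For instance, with the p-values 0.001, 0.008, 0.039, 0.041 and 0.9
and a rate of 0.05, the threshold is 0.008 and only the first two
values are flagged.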

The pipeline also produces a file that lists the size factor of each
sample involved, for each strain combination and nucleosomal region
type:

TODO: include this file...
/home/filleton/analyses/snepcatalog/data/2013-10-09/current/README.txt

   results/current/size_factors.tab


Number of SNEPs
---------------

Here are the numbers of SNEPs computed for each form.

   [1] "wp"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  30234     520     798     83    3566      26
   BY-YJM 31298     303     619    102     103     128
   RM-YJM 29863     129     340     46    3177      18
   [1] "fuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  10748     294     308    101    1681      42
   BY-YJM 10669     122     176    124      93      87
   RM-YJM 11478      54     112     41    1389      20
   [1] "wpfuzzy"
          #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac
   BY-RM  40982     770    1136    183    5404      73
   BY-YJM 41967     439     804    214     198     199
   RM-YJM 41341     184     468     87    4687      37
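
Counts like those in the tables above could be recomputed from a
single result file by counting its lines and its *snep_index* flags.
The sketch below assumes that each *_snep.tab* file is tab-separated,
starts with a header line containing a *snep_index* column, and stores
booleans as TRUE/FALSE (the usual R convention). None of these points
is confirmed by this tutorial, and the function name *count_sneps* is
illustrative.

```python
import csv


def count_sneps(handle):
    """Return (number of nucleosomes, number flagged as SNEP).

    handle: an open text file or any iterable of lines.
    Assumes a tab-separated file with a header and a snep_index column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    n_nucs = 0
    n_sneps = 0
    for row in reader:
        n_nucs += 1
        # R writes logical values as TRUE / FALSE.
        if row["snep_index"].strip().upper() == "TRUE":
            n_sneps += 1
    return n_nucs, n_sneps
```

A typical call would open
*results/current/BY_RM_H3K14ac_wp_snep.tab* and pass the file handle
to *count_sneps*.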

TODO:
   * Print/study intra/inter strain LODs.

   * Check the normality of samples using Shapiro–Wilk (hypothesis
     for computing LODs)