1 | 935a568c | Florent Chuffart | |
---|---|---|---|
2 | 935a568c | Florent Chuffart | Tutorial |
3 | 935a568c | Florent Chuffart | ******** |
4 | 935a568c | Florent Chuffart | |
5 | 935a568c | Florent Chuffart | This tutorial describes the steps for performing a quantitative analysis |
6 | 935a568c | Florent Chuffart | of the nucleosomal epigenome. We assume that files are organised in a |
7 | 935a568c | Florent Chuffart | given hierarchy and that all command lines are launched from the |
8 | 935a568c | Florent Chuffart | project's root. |
9 | 935a568c | Florent Chuffart | |
10 | 935a568c | Florent Chuffart | This tutorial is divided into two main parts. The first one is |
11 | 935a568c | Florent Chuffart | the Python script *wf.py*, which aligns and converts Illumina reads. |
12 | 935a568c | Florent Chuffart | The second one is the R script *main.r*, which extracts information |
13 | 935a568c | Florent Chuffart | (nucleosome positions and indicators) from the dataset. |
14 | 935a568c | Florent Chuffart | |
15 | 935a568c | Florent Chuffart | |
16 | 935a568c | Florent Chuffart | Dataset and Configuration File |
17 | 935a568c | Florent Chuffart | ============================== |
18 | 935a568c | Florent Chuffart | |
19 | 935a568c | Florent Chuffart | We want to compare nucleosomes of 3 yeast strains: |
20 | 935a568c | Florent Chuffart | |
21 | 935a568c | Florent Chuffart | * BY |
22 | 935a568c | Florent Chuffart | |
23 | 935a568c | Florent Chuffart | * RM |
24 | 935a568c | Florent Chuffart | |
25 | 935a568c | Florent Chuffart | * YJM |
26 | 935a568c | Florent Chuffart | |
27 | 935a568c | Florent Chuffart | For each strain we perform MNase-Seq and ChIP-Seq using the 5 |
28 | 935a568c | Florent Chuffart | following markers: |
29 | 935a568c | Florent Chuffart | |
30 | 935a568c | Florent Chuffart | * H3K4me1 |
31 | 935a568c | Florent Chuffart | |
32 | 935a568c | Florent Chuffart | * H3K4me3 |
33 | 935a568c | Florent Chuffart | |
34 | 935a568c | Florent Chuffart | * H3K9ac |
35 | 935a568c | Florent Chuffart | |
36 | 935a568c | Florent Chuffart | * H3K14ac |
37 | 935a568c | Florent Chuffart | |
38 | 935a568c | Florent Chuffart | * H4K12ac |
39 | 935a568c | Florent Chuffart | |
40 | 935a568c | Florent Chuffart | To simplify the experimental design, we consider MNase as a |
41 | 935a568c | Florent Chuffart | marker. For each pair *(strain, marker)* we perform 3 replicates. |
42 | 935a568c | Florent Chuffart | So, theoretically, we should have *3 * (1 + 5) * 3 = 54* samples. In |
43 | 935a568c | Florent Chuffart | practice we only obtained 2 replicates for *(YJM, H3K4me1)*. Each of |
44 | 935a568c | Florent Chuffart | the 53 samples is identified by a unique identifier. The file |
45 | 935a568c | Florent Chuffart | *CSV_SAMPLE_FILE* sums up this information. |
46 | 935a568c | Florent Chuffart | |
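
The sample count above can be checked with a short Python sketch. The marker labels and the choice of which *(YJM, H3K4me1)* replicate is missing are assumptions made for illustration; the real identifiers live in *CSV_SAMPLE_FILE*.

```python
from itertools import product

strains = ["BY", "RM", "YJM"]
# MNase is treated as a marker alongside the 5 ChIP-Seq markers.
markers = ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
replicates = [1, 2, 3]

# Full theoretical design: 3 strains * (1 + 5) markers * 3 replicates.
design = list(product(strains, markers, replicates))

# Assumption: the missing replicate of (YJM, H3K4me1) is arbitrarily taken
# to be replicate 3; the tutorial does not say which one is missing.
obtained = [s for s in design if s != ("YJM", "H3K4me1", 3)]

print(len(design), len(obtained))  # 54 53
```
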
47 | 935a568c | Florent Chuffart | We use a convention to link each sample to its Illumina fastq outputs. |
48 | 935a568c | Florent Chuffart | Illumina output files of the sample *ID* are stored in the |
49 | 935a568c | Florent Chuffart | directory *ILLUMINA_OUTPUTFILE_PREFIX* + *ID*. For example, sample 41 |
50 | 935a568c | Florent Chuffart | outputs are stored in the directory |
51 | 935a568c | Florent Chuffart | *data/2012-09-05/FASTQ/Sample_Yvert_Bq41/*. |
52 | 935a568c | Florent Chuffart | |
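
A minimal sketch of this naming convention follows. The helper name `sample_dir` is hypothetical, and the prefix value is inferred from the sample 41 example rather than read from the configuration file.

```python
def sample_dir(prefix, sample_id):
    """Return the Illumina output directory for a sample: prefix + ID."""
    return "{}{}/".format(prefix, sample_id)

# Assumed prefix value, inferred from the sample 41 example in the text.
ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"

print(sample_dir(ILLUMINA_OUTPUTFILE_PREFIX, 41))
# data/2012-09-05/FASTQ/Sample_Yvert_Bq41/
```
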
53 | 935a568c | Florent Chuffart | For BY (resp. RM and YJM) we use the following reference genome |
54 | 935a568c | Florent Chuffart | *saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta* (resp. |
55 | 935a568c | Florent Chuffart | *saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta* and |
56 | 935a568c | Florent Chuffart | *saccharomyces_cerevisiae_YJM_789_screencontig.fasta*). The index |
57 | 935a568c | Florent Chuffart | *FASTA_REFERENCE_GENOME_FILES* stores this information. |
58 | 935a568c | Florent Chuffart | |
59 | 935a568c | Florent Chuffart | Each chromosome/contig is identified in the fasta file by an obscure |
60 | 935a568c | Florent Chuffart | identifier. For example, BY chromosome I is identified by |
61 | 935a568c | Florent Chuffart | *gi|144228165|ref|NC_001133.7|* whereas TemplateFilter expects an |
62 | 935a568c | Florent Chuffart | integer. So, we translate it. The index *FASTA_INDEXES* stores this |
63 | 935a568c | Florent Chuffart | translation. |
64 | 935a568c | Florent Chuffart | |
65 | 935a568c | Florent Chuffart | For practical reasons, we discard some parts of the genome |
66 | 935a568c | Florent Chuffart | (repeated sequences, etc.). The list of blacklisted areas is |
67 | 935a568c | Florent Chuffart | explicitly detailed in *AREA_BLACK_LIST*. |
68 | 935a568c | Florent Chuffart | |
69 | 935a568c | Florent Chuffart | For BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we use |
70 | 935a568c | Florent Chuffart | the previously computed .c2c file |
71 | 935a568c | Florent Chuffart | *data/2012-03_primarydata/BY_RM_gxcomp.c2c* (resp. |
72 | 935a568c | Florent Chuffart | *BY_YJM_GComp_All.c2c* and *RM_YJM_gxcomp.c2c*). For more information |
73 | 935a568c | Florent Chuffart | about .c2c files, please read section 5 of the manual of |
74 | 935a568c | Florent Chuffart | *NucleoMiner*, the previous version of *NucleoMiner2* |
75 | 935a568c | Florent Chuffart | (http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf). |
76 | 935a568c | Florent Chuffart | |
77 | 935a568c | Florent Chuffart | *nucleominer* works in specific directories; these are defined |
78 | 935a568c | Florent Chuffart | by *INDEX_DIR*, *ALIGN_DIR* and *LOG_DIR*. |
79 | 935a568c | Florent Chuffart | |
80 | 935a568c | Florent Chuffart | Finally, *nucleominer* uses external resources; the paths to these |
81 | 935a568c | Florent Chuffart | resources are defined in *BOWTIE_BUILD_BIN*, *BOWTIE2_BIN*, |
82 | 935a568c | Florent Chuffart | *SAMTOOLS_BIN*, *BEDTOOLS_BIN*, *TF_BIN* and *TF_TEMPLATES_FILE*. |
83 | 935a568c | Florent Chuffart | |
84 | 935a568c | Florent Chuffart | All paths, prefixes and indexes can be changed in the |
85 | 8e9facd8 | Florent Chuffart | *src/current/nucleominer_config.json* file. |
86 | 935a568c | Florent Chuffart | |
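
As a rough sketch, the configuration file could be loaded and checked as follows. The key names are those listed in this section; the assumption that they all sit at the top level of the JSON document, and the helper names `missing_keys` and `load_config`, are illustrative only.

```python
import json

# Keys named in this tutorial; the nesting of the real file may differ.
EXPECTED_KEYS = [
    "CSV_SAMPLE_FILE", "ILLUMINA_OUTPUTFILE_PREFIX",
    "FASTA_REFERENCE_GENOME_FILES", "FASTA_INDEXES", "AREA_BLACK_LIST",
    "INDEX_DIR", "ALIGN_DIR", "LOG_DIR",
    "BOWTIE_BUILD_BIN", "BOWTIE2_BIN", "SAMTOOLS_BIN", "BEDTOOLS_BIN",
    "TF_BIN", "TF_TEMPLATES_FILE",
]

def missing_keys(config):
    """Return the expected keys that are absent from a config dict."""
    return [key for key in EXPECTED_KEYS if key not in config]

def load_config(path="src/current/nucleominer_config.json"):
    """Load the JSON configuration and fail early if a key is missing."""
    with open(path) as handle:
        config = json.load(handle)
    missing = missing_keys(config)
    if missing:
        raise KeyError("missing configuration keys: " + ", ".join(missing))
    return config
```

Checking the keys up front avoids discovering a missing path halfway through a long alignment run.
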
87 | 935a568c | Florent Chuffart | |
88 | 935a568c | Florent Chuffart | Preprocessing Illumina Fastq Reads for Each Sample |
89 | 935a568c | Florent Chuffart | ================================================== |
90 | 935a568c | Florent Chuffart | |
91 | 935a568c | Florent Chuffart | This preprocessing consists of 4 main steps embedded in the |
92 | 935a568c | Florent Chuffart | *wf.py* script and described below. As a preamble, this script computes |
93 | 935a568c | Florent Chuffart | *samples*, *samples_mnase* and *strains*, which are used throughout the 4 |
94 | 935a568c | Florent Chuffart | steps. |
95 | 935a568c | Florent Chuffart | |
96 | 935a568c | Florent Chuffart | |
97 | 935a568c | Florent Chuffart | Creating Bowtie Index from each Reference Genome |
98 | 935a568c | Florent Chuffart | ------------------------------------------------ |
99 | 935a568c | Florent Chuffart | |
100 | 935a568c | Florent Chuffart | For each strain, we need to create a bowtie index. The bowtie index of a |
101 | 935a568c | Florent Chuffart | strain is a compressed, searchable representation of its reference genome. It |
102 | 935a568c | Florent Chuffart | is used by bowtie to align reads. This step is performed by the |
103 | 935a568c | Florent Chuffart | following part of the *wf.py* script: |
104 | 935a568c | Florent Chuffart | |
105 | 935a568c | Florent Chuffart | The following table sums up the file sizes (in Mo, i.e. megabytes) and |
106 | 935a568c | Florent Chuffart | process durations for this step. |
107 | 935a568c | Florent Chuffart | |
108 | 935a568c | Florent Chuffart | +--------+------------------------+------------------------+------------------+ |
109 | 935a568c | Florent Chuffart | | strain | fasta genome file size | bowtie index file size | process duration | |
110 | 935a568c | Florent Chuffart | +========+========================+========================+==================+ |
111 | 935a568c | Florent Chuffart | | BY | 12 Mo | 25 Mo | 11 s. | |
112 | 935a568c | Florent Chuffart | +--------+------------------------+------------------------+------------------+ |
113 | 935a568c | Florent Chuffart | | RM | 12 Mo | 24 Mo | 9 s. | |
114 | 935a568c | Florent Chuffart | +--------+------------------------+------------------------+------------------+ |
115 | 935a568c | Florent Chuffart | | YJM | 12 Mo | 25 Mo | 11 s. | |
116 | 935a568c | Florent Chuffart | +--------+------------------------+------------------------+------------------+ |
117 | 935a568c | Florent Chuffart | |
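
For illustration only (this is not the *wf.py* code), the index creation could be wrapped as below. The sketch assumes that *BOWTIE_BUILD_BIN* points to `bowtie2-build`, whose basic usage is `bowtie2-build <reference fasta> <index basename>`; the function names and the `index` directory are hypothetical.

```python
import os
import subprocess

def bowtie_build_cmd(bowtie_build_bin, fasta_file, index_dir, strain):
    """Argument list: <bowtie-build> <reference fasta> <index basename>."""
    return [bowtie_build_bin, fasta_file, os.path.join(index_dir, strain)]

def build_index(bowtie_build_bin, fasta_file, index_dir, strain):
    """Run the index builder; raises CalledProcessError on failure."""
    cmd = bowtie_build_cmd(bowtie_build_bin, fasta_file, index_dir, strain)
    subprocess.check_call(cmd)

# Build (but do not run) the command for the BY strain.
cmd = bowtie_build_cmd(
    "bowtie2-build",
    "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
    "index",
    "BY",
)
print(" ".join(cmd))
```
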
118 | 935a568c | Florent Chuffart | |
119 | 935a568c | Florent Chuffart | Aligning Reads to Reference Genome |
120 | 935a568c | Florent Chuffart | ---------------------------------- |
121 | 935a568c | Florent Chuffart | |
122 | 935a568c | Florent Chuffart | Next, we launch bowtie to align reads to the reference genome. It |
123 | 935a568c | Florent Chuffart | produces a *.sam* file that we convert into a *.bed* file. Binaries |
124 | 935a568c | Florent Chuffart | for *bowtie*, *samtools* and *bedtools* are wrapped using the Python |
125 | 935a568c | Florent Chuffart | *subprocess* module. This step is performed by the following part of |
126 | 935a568c | Florent Chuffart | the *wf.py* script: |
127 | 935a568c | Florent Chuffart | |
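
As an illustration only (this is not the *wf.py* code), the three binaries could be chained with *subprocess* as sketched here. It assumes unpaired reads, 3 threads as mentioned later in this tutorial, and hypothetical file names; any read filtering done by the real script is not shown.

```python
import subprocess

def alignment_cmds(bowtie2_bin, samtools_bin, bedtools_bin,
                   index, fastq, out_prefix, threads=3):
    """Return the three commands: align, SAM to BAM, BAM to BED."""
    sam = out_prefix + ".sam"
    bam = out_prefix + ".bam"
    return [
        # Align unpaired reads against the index and write a SAM file.
        [bowtie2_bin, "-p", str(threads), "-x", index, "-U", fastq, "-S", sam],
        # Convert the SAM file to BAM.
        [samtools_bin, "view", "-b", "-S", "-o", bam, sam],
        # Convert BAM to BED (bedtools writes to standard output).
        [bedtools_bin, "bamtobed", "-i", bam],
    ]

def run_alignment(cmds, bed_path):
    """Run the first two commands, then redirect the last one to bed_path."""
    for cmd in cmds[:-1]:
        subprocess.check_call(cmd)
    with open(bed_path, "w") as bed:
        subprocess.check_call(cmds[-1], stdout=bed)

cmds = alignment_cmds("bowtie2", "samtools", "bedtools",
                      "index/BY", "reads.fastq", "align/sample_1")
```

Keeping command construction separate from execution makes the commands easy to log in *LOG_DIR* or to inspect without running them.
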
128 | 935a568c | Florent Chuffart | |
129 | 935a568c | Florent Chuffart | Convert Aligned Reads for TemplateFilter |
130 | 935a568c | Florent Chuffart | ---------------------------------------- |
131 | 935a568c | Florent Chuffart | |
132 | 935a568c | Florent Chuffart | TemplateFilter uses a particular input format for reads, so we convert the |
133 | 935a568c | Florent Chuffart | *.bed* file. TemplateFilter expects reads as follows: *chr coord |
134 | 935a568c | Florent Chuffart | strand #read* where: |
135 | 935a568c | Florent Chuffart | |
136 | 935a568c | Florent Chuffart | * chr is the number of the chromosome; |
137 | 935a568c | Florent Chuffart | |
138 | 935a568c | Florent Chuffart | * coord is the coordinate of the read; |
139 | 935a568c | Florent Chuffart | |
140 | 935a568c | Florent Chuffart | * strand is *F* for forward and *R* for reverse; |
141 | 935a568c | Florent Chuffart | |
142 | 935a568c | Florent Chuffart | * #read is the number of reads at this position. |
143 | 935a568c | Florent Chuffart | |
144 | 935a568c | Florent Chuffart | Each entry is *tab*-separated. |
145 | 935a568c | Florent Chuffart | |
146 | 935a568c | Florent Chuffart | **WARNING** for the reverse strand, bowtie returns the position of the leftmost |
147 | 935a568c | Florent Chuffart | nucleotide whereas TemplateFilter expects the rightmost one. So this |
148 | 935a568c | Florent Chuffart | step takes it into account and adds the read length (in our case 50) to |
149 | 935a568c | Florent Chuffart | the coordinate of reverse reads. |
150 | 935a568c | Florent Chuffart | |
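
The conversion described above can be sketched as follows; this is not the *wf.py* code. It assumes the standard 6-column BED layout produced by `bedtools bamtobed` (chromosome, start, end, name, score, strand) and a `chrom_index` dictionary playing the role of *FASTA_INDEXES*. The mapping of the BY chromosome I identifier to `1` is an assumption, and any further coordinate adjustment made by the real script is not shown.

```python
from collections import Counter

def bed_to_tf(bed_lines, chrom_index, read_length=50):
    """Convert BED records to tab-separated 'chr coord strand #read' lines."""
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, strand = fields[0], int(fields[1]), fields[5]
        if strand == "+":
            key = (chrom_index[chrom], start, "F")
        else:
            # Reverse read: shift from the leftmost to the rightmost position.
            key = (chrom_index[chrom], start + read_length, "R")
        counts[key] += 1
    return ["{}\t{}\t{}\t{}".format(c, pos, s, n)
            for (c, pos, s), n in sorted(counts.items())]

# Assumed translation of the BY chromosome I identifier to an integer.
chrom_index = {"gi|144228165|ref|NC_001133.7|": 1}
name = "gi|144228165|ref|NC_001133.7|"
bed = [
    name + "\t100\t150\tr1\t42\t+",
    name + "\t100\t150\tr2\t42\t+",
    name + "\t200\t250\tr3\t42\t-",
]
tf_lines = bed_to_tf(bed, chrom_index)
print("\n".join(tf_lines))
```

Here the two forward reads at position 100 collapse into one line with a count of 2, and the reverse read is reported at 200 + 50 = 250.
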
151 | 935a568c | Florent Chuffart | This step is performed by the following part of the *wf.py* script: |
152 | 935a568c | Florent Chuffart | |
153 | 935a568c | Florent Chuffart | The following table sums up the number of reads, file sizes (in Mo, i.e. megabytes) and |
154 | 935a568c | Florent Chuffart | process durations for the last two steps. In our case, the alignment |
155 | 935a568c | Florent Chuffart | process was multithreaded over 3 cores. |
156 | 935a568c | Florent Chuffart | |
157 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
158 | 935a568c | Florent Chuffart | | id | Illumina reads | aligned and filtered reads | ratio | *.bed* file size | TF input file size | process duration | |
159 | 935a568c | Florent Chuffart | +====+================+===========================+========+==================+====================+==================+ |
160 | 935a568c | Florent Chuffart | | 1 | 16436138 | 10199695 | 62,06% | 1064 Mo | 60 Mo | 383 s. | |
161 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
162 | 935a568c | Florent Chuffart | | 2 | 16911132 | 12512727 | 73,99% | 1298 Mo | 64 Mo | 437 s. | |
163 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
164 | 935a568c | Florent Chuffart | | 3 | 15946902 | 12340426 | 77,38% | 1280 Mo | 65 Mo | 423 s. | |
165 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
166 | 935a568c | Florent Chuffart | | 4 | 13765584 | 10381903 | 75,42% | 931 Mo | 59 Mo | 352 s. | |
167 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
168 | 935a568c | Florent Chuffart | | 5 | 15168268 | 11502855 | 75,83% | 1031 Mo | 64 Mo | 386 s. | |
169 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
170 | 935a568c | Florent Chuffart | | 6 | 18850820 | 14024905 | 74,40% | 1254 Mo | 69 Mo | 482 s. | |
171 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
172 | 935a568c | Florent Chuffart | | 7 | 15591124 | 12126623 | 77,78% | 1163 Mo | 72 Mo | 405 s. | |
173 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
174 | 935a568c | Florent Chuffart | | 8 | 15659905 | 12475664 | 79,67% | 1194 Mo | 71 Mo | 416 s. | |
175 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
176 | 935a568c | Florent Chuffart | | 9 | 14668641 | 10960565 | 74,72% | 1052 Mo | 70 Mo | 375 s. | |
177 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
178 | 935a568c | Florent Chuffart | | 10 | 14339179 | 10454451 | 72,91% | 1049 Mo | 51 Mo | 363 s. | |
179 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
180 | 935a568c | Florent Chuffart | | 11 | 18019895 | 13688774 | 75,96% | 1378 Mo | 59 Mo | 474 s. | |
181 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
182 | 935a568c | Florent Chuffart | | 12 | 13746796 | 10810022 | 78,64% | 1084 Mo | 54 Mo | 360 s. | |
183 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
184 | 935a568c | Florent Chuffart | | 13 | 15205065 | 11766016 | 77,38% | 990 Mo | 54 Mo | 381 s. | |
185 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
186 | 935a568c | Florent Chuffart | | 14 | 17803097 | 13838883 | 77,73% | 1154 Mo | 60 Mo | 452 s. | |
187 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
188 | 935a568c | Florent Chuffart | | 15 | 15434564 | 12307878 | 79,74% | 1032 Mo | 57 Mo | 394 s. | |
189 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
190 | 935a568c | Florent Chuffart | | 16 | 16802587 | 12725665 | 75,74% | 1221 Mo | 48 Mo | 438 s. | |
191 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
192 | 935a568c | Florent Chuffart | | 17 | 16058417 | 12513734 | 77,93% | 1192 Mo | 63 Mo | 422 s. | |
193 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
194 | 935a568c | Florent Chuffart | | 18 | 16154482 | 13204331 | 81,74% | 1277 Mo | 52 Mo | 430 s. | |
195 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
196 | 935a568c | Florent Chuffart | | 19 | 21013924 | 17102120 | 81,38% | 1646 Mo | 59 Mo | 555 s. | |
197 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
198 | 935a568c | Florent Chuffart | | 20 | 17213114 | 14433357 | 83,85% | 1389 Mo | 53 Mo | 459 s. | |
199 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
200 | 935a568c | Florent Chuffart | | 21 | 17360907 | 14733001 | 84,86% | 1203 Mo | 55 Mo | 450 s. | |
201 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
202 | 935a568c | Florent Chuffart | | 22 | 18136816 | 15389581 | 84,85% | 1257 Mo | 53 Mo | 469 s. | |
203 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
204 | 935a568c | Florent Chuffart | | 23 | 14763678 | 12173025 | 82,45% | 1140 Mo | 56 Mo | 393 s. | |
205 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
206 | 935a568c | Florent Chuffart | | 24 | 15541709 | 12890345 | 82,94% | 1057 Mo | 48 Mo | 398 s. | |
207 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
208 | 935a568c | Florent Chuffart | | 25 | 16433215 | 13094314 | 79,68% | 1241 Mo | 57 Mo | 433 s. | |
209 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
210 | 935a568c | Florent Chuffart | | 26 | 17370850 | 14264136 | 82,12% | 1347 Mo | 51 Mo | 466 s. | |
211 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
212 | 935a568c | Florent Chuffart | | 27 | 14613512 | 8654495 | 59,22% | 887 Mo | 56 Mo | 339 s. | |
213 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
214 | 935a568c | Florent Chuffart | | 28 | 15248545 | 11367589 | 74,55% | 1166 Mo | 67 Mo | 405 s. | |
215 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
216 | 935a568c | Florent Chuffart | | 29 | 14316809 | 10767926 | 75,21% | 1103 Mo | 63 Mo | 379 s. | |
217 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
218 | 935a568c | Florent Chuffart | | 30 | 15178058 | 12265794 | 80,81% | 1030 Mo | 66 Mo | 390 s. | |
219 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
220 | 935a568c | Florent Chuffart | | 31 | 14968579 | 11876186 | 79,34% | 1009 Mo | 63 Mo | 387 s. | |
221 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
222 | 935a568c | Florent Chuffart | | 32 | 16912705 | 13550508 | 80,12% | 1143 Mo | 70 Mo | 442 s. | |
223 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
224 | 935a568c | Florent Chuffart | | 33 | 16782154 | 12755111 | 76,00% | 1227 Mo | 65 Mo | 438 s. | |
225 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
226 | 935a568c | Florent Chuffart | | 34 | 16741443 | 13168071 | 78,66% | 1260 Mo | 71 Mo | 442 s. | |
227 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
228 | 935a568c | Florent Chuffart | | 35 | 13096171 | 10367041 | 79,16% | 992 Mo | 62 Mo | 350 s. | |
229 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
230 | 935a568c | Florent Chuffart | | 36 | 17715118 | 14092985 | 79,55% | 1404 Mo | 68 Mo | 483 s. | |
231 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
232 | 935a568c | Florent Chuffart | | 37 | 17288466 | 7402082 | 42,82% | 741 Mo | 48 Mo | 339 s. | |
233 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
234 | 935a568c | Florent Chuffart | | 38 | 16116394 | 13178457 | 81,77% | 1101 Mo | 63 Mo | 420 s. | |
235 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
236 | 935a568c | Florent Chuffart | | 39 | 14241106 | 10537228 | 73,99% | 880 Mo | 57 Mo | 348 s. | |
237 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
238 | 935a568c | Florent Chuffart | | 40 | 13784738 | 10598464 | 76,89% | 1005 Mo | 64 Mo | 358 s. | |
239 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
240 | 935a568c | Florent Chuffart | | 41 | 12438007 | 9620975 | 77,35% | 911 Mo | 60 Mo | 326 s. | |
241 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
242 | 935a568c | Florent Chuffart | | 42 | 13853959 | 11031238 | 79,63% | 1045 Mo | 64 Mo | 365 s. | |
243 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
244 | 935a568c | Florent Chuffart | | 43 | 12036162 | 6654780 | 55,29% | 684 Mo | 46 Mo | 268 s. | |
245 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
246 | 935a568c | Florent Chuffart | | 44 | 13873129 | 10251074 | 73,89% | 1048 Mo | 61 Mo | 365 s. | |
247 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
248 | 935a568c | Florent Chuffart | | 45 | 19817751 | 14904502 | 75,21% | 1520 Mo | 72 Mo | 528 s. | |
249 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
250 | 935a568c | Florent Chuffart | | 46 | 13368959 | 10818619 | 80,92% | 912 Mo | 63 Mo | 350 s. | |
251 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
252 | 935a568c | Florent Chuffart | | 47 | 7566467 | 6139001 | 81,13% | 520 Mo | 44 Mo | 201 s. | |
253 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
254 | 935a568c | Florent Chuffart | | 48 | 32586928 | 21191363 | 65,03% | 1816 Mo | 82 Mo | 766 s. | |
255 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
256 | 935a568c | Florent Chuffart | | 49 | 30733184 | 18791373 | 61,14% | 1801 Mo | 89 Mo | 721 s. | |
257 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
258 | 935a568c | Florent Chuffart | | 50 | 41287616 | 30383875 | 73,59% | 2911 Mo | 112 Mo | 1065 s. | |
259 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
260 | 935a568c | Florent Chuffart | | 51 | 40439965 | 31177914 | 77,10% | 2981 Mo | 117 Mo | 1070 s. | |
261 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
262 | 935a568c | Florent Chuffart | | 53 | 40876476 | 33780065 | 82,64% | 3316 Mo | 103 Mo | 1165 s. | |
263 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
264 | 935a568c | Florent Chuffart | | 55 | 52424414 | 47117107 | 89,88% | 3811 Mo | 119 Mo | 1477 s. | |
265 | 935a568c | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
266 | 935a568c | Florent Chuffart | |
267 | 935a568c | Florent Chuffart | For experimental reasons (handling efficiency, e.g. PCR), we remove |
268 | 935a568c | Florent Chuffart | samples 33, 45, 48 and 55. |
269 | 935a568c | Florent Chuffart | |
270 | 935a568c | Florent Chuffart | |
271 | 935a568c | Florent Chuffart | Run TemplateFilter on MNase Samples |
272 | 935a568c | Florent Chuffart | ----------------------------------- |
273 | 935a568c | Florent Chuffart | |
274 | 935a568c | Florent Chuffart | Finally, for each MNase sample we perform the TemplateFilter analysis. |
275 | 935a568c | Florent Chuffart | |
276 | 935a568c | Florent Chuffart | **WARNING** TemplateFilter returns a list of nucleosomes. Each |
277 | 935a568c | Florent Chuffart | nucleosome is defined by its center and its width. An odd width leads |
278 | 935a568c | Florent Chuffart | us to consider non-integer lower and upper bounds. |
279 | 935a568c | Florent Chuffart | |
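
The effect of an odd width on the bounds can be seen in a small sketch. It assumes an integer center and bounds computed as center minus and plus half the width; the function name is hypothetical.

```python
def nucleosome_bounds(center, width):
    """Return (lower, upper) bounds of a nucleosome from center and width."""
    half = width / 2.0
    return center - half, center + half

print(nucleosome_bounds(1000, 146))  # (927.0, 1073.0)
print(nucleosome_bounds(1000, 147))  # (926.5, 1073.5)
```
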
280 | 935a568c | Florent Chuffart | **WARNING** TemplateFilter is not designed to deal with replicates. So we |
281 | 935a568c | Florent Chuffart | choose to keep as many nucleosomes as possible and to filter them afterwards, |
282 | 935a568c | Florent Chuffart | taking advantage of the replicates. To do that we set a low correlation |
283 | 935a568c | Florent Chuffart | threshold parameter (*0.5*) and a particularly high overlap |
284 | 935a568c | Florent Chuffart | value (*300%*). |
285 | 935a568c | Florent Chuffart | |
286 | 935a568c | Florent Chuffart | This step is performed by the following part of the *wf.py* script: |
287 | 935a568c | Florent Chuffart | |
288 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
289 | 935a568c | Florent Chuffart | | id | strain | found nucs | nuc file size | process duration | |
290 | 935a568c | Florent Chuffart | +====+========+============+===============+==================+ |
291 | 935a568c | Florent Chuffart | | 1 | BY | 96214 | 68 Mo | 1022 s. | |
292 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
293 | 935a568c | Florent Chuffart | | 2 | BY | 91694 | 65 Mo | 1038 s. | |
294 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
295 | 935a568c | Florent Chuffart | | 3 | BY | 91205 | 65 Mo | 1036 s. | |
296 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
297 | 935a568c | Florent Chuffart | | 4 | RM | 88076 | 62 Mo | 984 s. | |
298 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
299 | 935a568c | Florent Chuffart | | 5 | RM | 90141 | 64 Mo | 967 s. | |
300 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
301 | 935a568c | Florent Chuffart | | 6 | RM | 87517 | 62 Mo | 980 s. | |
302 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
303 | 935a568c | Florent Chuffart | | 7 | YJM | 88945 | 64 Mo | 566 s. | |
304 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
305 | 935a568c | Florent Chuffart | | 8 | YJM | 88689 | 64 Mo | 570 s. | |
306 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
307 | 935a568c | Florent Chuffart | | 9 | YJM | 88128 | 63 Mo | 565 s. | |
308 | 935a568c | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
309 | 935a568c | Florent Chuffart | |
310 | 935a568c | Florent Chuffart | |
311 | 935a568c | Florent Chuffart | Inferring Nucleosome Position and Extracting Read Counts |
312 | 935a568c | Florent Chuffart | ======================================================== |
313 | 935a568c | Florent Chuffart | |
319 | 935a568c | Florent Chuffart | The second part of the tutorial uses *R* |
320 | b20637ed | Florent Chuffart | (http://www.r-project.org). It consists of the following main |
321 | b20637ed | Florent Chuffart | steps: |
322 | 935a568c | Florent Chuffart | |
323 | 935a568c | Florent Chuffart | * compute_rois.R |
324 | 935a568c | Florent Chuffart | |
325 | 935a568c | Florent Chuffart | * extract_maps.R |
326 | 935a568c | Florent Chuffart | |
327 | b20637ed | Florent Chuffart | * compare_common_wp.R |
328 | b20637ed | Florent Chuffart | |
329 | b20637ed | Florent Chuffart | * split_samples.R |
330 | b20637ed | Florent Chuffart | |
331 | 935a568c | Florent Chuffart | * count_reads.R |
332 | 935a568c | Florent Chuffart | |
333 | 935a568c | Florent Chuffart | * get_size_factors.R |
334 | 935a568c | Florent Chuffart | |
335 | 935a568c | Florent Chuffart | * launch_deseq.R |
336 | 935a568c | Florent Chuffart | |
337 | 935a568c | Florent Chuffart | |
338 | 935a568c | Florent Chuffart | Computing Common Genome Region Between Strains |
339 | 935a568c | Florent Chuffart | ---------------------------------------------- |
340 | 935a568c | Florent Chuffart | |
341 | 8e9facd8 | Florent Chuffart | R CMD BATCH src/current/compute_rois.R |
342 | 935a568c | Florent Chuffart | |
343 | 935a568c | Florent Chuffart | |
344 | 935a568c | Florent Chuffart | Extracting Maps for Well Positioned and Fuzzy Nucleosomes |
345 | 935a568c | Florent Chuffart | --------------------------------------------------------- |
346 | 935a568c | Florent Chuffart | |
347 | 8e9facd8 | Florent Chuffart | R CMD BATCH src/current/extract_maps.R |
348 | 935a568c | Florent Chuffart | |
349 | 935a568c | Florent Chuffart | |
350 | b20637ed | Florent Chuffart | Compute Distance Between Well Positioned Nucleosomes |
351 | b20637ed | Florent Chuffart | ---------------------------------------------------- |
352 | b20637ed | Florent Chuffart | |
353 | b20637ed | Florent Chuffart | R CMD BATCH src/current/compare_common_wp.R |
354 | b20637ed | Florent Chuffart | |
355 | b20637ed | Florent Chuffart | |
356 | b20637ed | Florent Chuffart | Split and Compress Samples According to CURs |
357 | b20637ed | Florent Chuffart | -------------------------------------------- |
358 | b20637ed | Florent Chuffart | |
359 | b20637ed | Florent Chuffart | R CMD BATCH src/current/split_samples.R |
360 | b20637ed | Florent Chuffart | |
361 | b20637ed | Florent Chuffart | |
362 | 935a568c | Florent Chuffart | Count Reads for Each Nucleosome |
363 | 935a568c | Florent Chuffart | ------------------------------- |
364 | 935a568c | Florent Chuffart | |
365 | 8e9facd8 | Florent Chuffart | R CMD BATCH src/current/count_reads.R |
366 | 935a568c | Florent Chuffart | |
367 | 935a568c | Florent Chuffart | |
368 | 935a568c | Florent Chuffart | Get Size Factors Using DESeq |
369 | 935a568c | Florent Chuffart | ---------------------------- |
370 | 935a568c | Florent Chuffart | |
371 | 8e9facd8 | Florent Chuffart | R CMD BATCH src/current/get_size_factors.R |
372 | 935a568c | Florent Chuffart | |
373 | 935a568c | Florent Chuffart | |
374 | 935a568c | Florent Chuffart | Performing DESeq Analysis |
375 | 935a568c | Florent Chuffart | ------------------------- |
376 | 935a568c | Florent Chuffart | |
377 | 8e9facd8 | Florent Chuffart | R CMD BATCH src/current/launch_deseq.R |
378 | 935a568c | Florent Chuffart | |
379 | 935a568c | Florent Chuffart | |
380 | 935a568c | Florent Chuffart | Results |
381 | 935a568c | Florent Chuffart | ======= |
382 | 935a568c | Florent Chuffart | |
383 | 935a568c | Florent Chuffart | |
384 | 935a568c | Florent Chuffart | Output Files Organisation |
385 | 935a568c | Florent Chuffart | ------------------------- |
386 | 935a568c | Florent Chuffart | |
387 | 935a568c | Florent Chuffart | The previous steps produce the following 45 files. Each filename has |
388 | 935a568c | Florent Chuffart | the form |
389 | 935a568c | Florent Chuffart | |
390 | 8e9facd8 | Florent Chuffart | results/current/[combi]_[marker]_[form]_snep.tab |
391 | 935a568c | Florent Chuffart | |
392 | 935a568c | Florent Chuffart | where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination, |
393 | 935a568c | Florent Chuffart | marker is in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each |
394 | 935a568c | Florent Chuffart | post-translational histone modification, and form is in {wp, fuzzy, |
395 | 935a568c | Florent Chuffart | wpfuzzy}, depending on whether well positioned nucleosomes, fuzzy |
396 | 935a568c | Florent Chuffart | nucleosomes or both are considered for SNEP computation. |
397 | 935a568c | Florent Chuffart | |
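The naming scheme above yields 3 x 5 x 3 = 45 filenames. The sketch below enumerates them. The function name *snep_filenames* is an illustrative assumption; the sets of values and the filename pattern come from the text above.

```python
from itertools import product

# Values listed in the tutorial for each part of the filename.
COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
FORMS = ["wp", "fuzzy", "wpfuzzy"]


def snep_filenames(result_dir="results/current"):
    """Return every [combi]_[marker]_[form]_snep.tab path."""
    return [
        f"{result_dir}/{combi}_{marker}_{form}_snep.tab"
        for combi, marker, form in product(COMBIS, MARKERS, FORMS)
    ]


if __name__ == "__main__":
    for path in snep_filenames():
        print(path)
```
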
398 | 935a568c | Florent Chuffart | chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM |
399 | 935a568c | Florent Chuffart | lower_bound_RM upper_bound_RM index_nuc_RM roi_index form |
400 | 935a568c | Florent Chuffart | BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4 |
401 | 935a568c | Florent Chuffart | RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36 BY_H3K14ac_37 |
402 | 935a568c | Florent Chuffart | BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM |
403 | 935a568c | Florent Chuffart | |
404 | 935a568c | Florent Chuffart | For each file, there is one line per nucleosome, and each line is |
405 | 935a568c | Florent Chuffart | composed of columns divided into 3 main groups: |
406 | 935a568c | Florent Chuffart | * nuc information |
407 | 935a568c | Florent Chuffart | |
408 | 935a568c | Florent Chuffart | * number of reads for each sample |
409 | 935a568c | Florent Chuffart | |
410 | 935a568c | Florent Chuffart | * DESeq analysis results. |
411 | 935a568c | Florent Chuffart | |
412 | 935a568c | Florent Chuffart | For example, for the file *BY_RM_H3K14ac_wp_snep.tab* the columns are: |
413 | 935a568c | Florent Chuffart | * chr_BY, the BY chromosome involved |
414 | 935a568c | Florent Chuffart | |
415 | 935a568c | Florent Chuffart | * lower_bound_BY, the lower bound of the BY nuc |
416 | 935a568c | Florent Chuffart | |
417 | 935a568c | Florent Chuffart | * upper_bound_BY, the upper bound of the BY nuc |
418 | 935a568c | Florent Chuffart | |
419 | 8e9facd8 | Florent Chuffart | * index_nuc_BY, the index of the nuc in the entire list of BY |
420 | 8e9facd8 | Florent Chuffart | nucs |
421 | 935a568c | Florent Chuffart | |
422 | 935a568c | Florent Chuffart | * chr_RM, lower_bound_RM, upper_bound_RM, index_nuc_RM, the same |
424 | 935a568c | Florent Chuffart | information for the RM strain |
425 | 935a568c | Florent Chuffart | |
426 | 935a568c | Florent Chuffart | * roi_index, the index of the region of interest involved. |
427 | 935a568c | Florent Chuffart | |
428 | 935a568c | Florent Chuffart | The next columns hold the indicators for each sample. They are labeled |
429 | 935a568c | Florent Chuffart | [strain]_[marker]_[sample_id] and each value represents the number of |
430 | 935a568c | Florent Chuffart | reads for the current nuc for the sample *sample_id*. |
431 | 935a568c | Florent Chuffart | |
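To illustrate this column layout, the sketch below sorts the column names of a file header into nucleosome information, per-sample read counts and remaining columns. The header is the one shown earlier in this section. The function name *split_columns* and the two regular expressions are assumptions made for this example, not part of the project.

```python
import re

# Header copied from the example shown earlier in this section.
HEADER = (
    "chr_BY lower_bound_BY upper_bound_BY index_nuc_BY chr_RM "
    "lower_bound_RM upper_bound_RM index_nuc_RM roi_index form "
    "BY_Mnase_Seq_1 BY_Mnase_Seq_2 BY_Mnase_Seq_3 RM_Mnase_Seq_4 "
    "RM_Mnase_Seq_5 RM_Mnase_Seq_6 BY_H3K14ac_36 BY_H3K14ac_37 "
    "BY_H3K14ac_53 RM_H3K14ac_38 RM_H3K14ac_39 pvalsGLM"
).split()

# Sample columns are labeled [strain]_[marker]_[sample_id].
SAMPLE_RE = re.compile(r"^(?P<strain>[^_]+)_(?P<marker>.+)_(?P<sample_id>\d+)$")
# Nucleosome information columns, per strain or global.
NUC_INFO_RE = re.compile(
    r"^(chr|lower_bound|upper_bound|index_nuc)_[^_]+$|^(roi_index|form)$"
)


def split_columns(columns):
    """Group column names into nuc information, sample counts and the rest."""
    groups = {"nuc_info": [], "samples": [], "other": []}
    for name in columns:
        if NUC_INFO_RE.match(name):
            groups["nuc_info"].append(name)
        elif SAMPLE_RE.match(name):
            groups["samples"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    for group, names in split_columns(HEADER).items():
        print(group, len(names), names)
```
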
432 | 935a568c | Florent Chuffart | The 5 final columns concern the DESeq analysis: |
433 | 8e9facd8 | Florent Chuffart | * manip[a_manip], strain[a_strain] and |
434 | 8e9facd8 | Florent Chuffart | manip[a_manip]:strain[a_strain], the manip (marker) effect, the |
435 | 8e9facd8 | Florent Chuffart | strain effect and the snep effect. |
436 | 935a568c | Florent Chuffart | |
437 | 8e9facd8 | Florent Chuffart | * pvalsGLM, the p-value resulting from the comparison of the GLM |
438 | 8e9facd8 | Florent Chuffart | models with and without the interaction term *marker:strain* |
439 | 935a568c | Florent Chuffart | |
440 | 935a568c | Florent Chuffart | * snep_index, a boolean set to TRUE if the *pvalsGLM* value is |
441 | 935a568c | Florent Chuffart | under the threshold computed with an FDR function with a rate set to |
442 | 935a568c | Florent Chuffart | 0.01%. |
443 | 935a568c | Florent Chuffart | |
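As an illustration of how a snep_index flag can be derived from p-values, the sketch below applies the Benjamini-Hochberg procedure. This is an assumption: the tutorial only states that the threshold is computed with an FDR function, and the procedure actually used by the R scripts may differ. The names *bh_threshold* and *snep_flags* and the example p-values are hypothetical.

```python
def bh_threshold(pvalues, rate):
    """Return the largest sorted p-value p_(k) with p_(k) <= k * rate / m.

    m is the number of p-values. Returns None when no p-value qualifies.
    """
    m = len(pvalues)
    threshold = None
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k * rate / m:
            threshold = p
    return threshold


def snep_flags(pvalues, rate):
    """Return one boolean per p-value: True if it passes the BH threshold."""
    threshold = bh_threshold(pvalues, rate)
    if threshold is None:
        return [False] * len(pvalues)
    # Benjamini-Hochberg keeps p-values lower than or equal to the threshold.
    return [p <= threshold for p in pvalues]


if __name__ == "__main__":
    # Made-up p-values and rate, for illustration only.
    example = [0.041, 0.001, 0.039, 0.008, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(bh_threshold(example, rate=0.05))
    print(snep_flags(example, rate=0.05))
```

The threshold depends on the whole set of p-values in a file, so a flag cannot be recomputed from a single line in isolation.
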
444 | 935a568c | Florent Chuffart | The pipeline also produces a file that gives the size factor of each |
445 | 935a568c | Florent Chuffart | sample involved in the different strain combinations and nucleosomal region types: |
446 | 935a568c | Florent Chuffart | |
447 | 8e9facd8 | Florent Chuffart | TODO: include this file... |
448 | 8e9facd8 | Florent Chuffart | /home/filleton/analyses/snepcatalog/data/2013-10-09/current/README.txt |
449 | 935a568c | Florent Chuffart | |
450 | 8e9facd8 | Florent Chuffart | results/current/size_factors.tab |
451 | 935a568c | Florent Chuffart | |
452 | 935a568c | Florent Chuffart | |
453 | 935a568c | Florent Chuffart | Number of SNEPs |
454 | 935a568c | Florent Chuffart | --------------- |
455 | 935a568c | Florent Chuffart | |
456 | 935a568c | Florent Chuffart | Here is the number of SNEPs computed for each form. |
457 | 935a568c | Florent Chuffart | |
458 | 935a568c | Florent Chuffart | [1] "wp" |
459 | 935a568c | Florent Chuffart |        #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac |
460 | 935a568c | Florent Chuffart | BY-RM  30234     520     798     83    3566      26 |
461 | 935a568c | Florent Chuffart | BY-YJM 31298     303     619    102     103     128 |
462 | 935a568c | Florent Chuffart | RM-YJM 29863     129     340     46    3177      18 |
463 | 935a568c | Florent Chuffart | [1] "fuzzy" |
464 | 935a568c | Florent Chuffart |        #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac |
465 | 935a568c | Florent Chuffart | BY-RM  10748     294     308    101    1681      42 |
466 | 935a568c | Florent Chuffart | BY-YJM 10669     122     176    124      93      87 |
467 | 935a568c | Florent Chuffart | RM-YJM 11478      54     112     41    1389      20 |
468 | 935a568c | Florent Chuffart | [1] "wpfuzzy" |
469 | 935a568c | Florent Chuffart |        #nucs H3K4me1 H3K4me3 H3K9ac H3K14ac H4K12ac |
470 | 935a568c | Florent Chuffart | BY-RM  40982     770    1136    183    5404      73 |
471 | 935a568c | Florent Chuffart | BY-YJM 41967     439     804    214     198     199 |
472 | 935a568c | Florent Chuffart | RM-YJM 41341     184     468     87    4687      37 |
473 | 935a568c | Florent Chuffart | |
474 | 935a568c | Florent Chuffart | TODO: |
475 | 935a568c | Florent Chuffart | * Print/study intra/inter strain LODs. |
476 | 935a568c | Florent Chuffart | |
477 | 8e9facd8 | Florent Chuffart | * Check the normality of sample using Shapiro–Wilk (Hypothesis |
478 | 8e9facd8 | Florent Chuffart | for computing LODs) |