Tutorial
========

This tutorial describes the steps required to perform a quantitative analysis of epigenetic marks on individual nucleosomes. We assume that files are organised according to a given hierarchy and that all command lines are launched from the project's root directory.

This tutorial is divided into two main parts. The first part covers the Python script `wf.py`, which aligns and converts short sequence reads. The second part covers the R scripts that extract nucleosome-level information (nucleosome positions and indicators) from the dataset.


Experimental Dataset, Working Directory and Configuration File
--------------------------------------------------------------

Working Directory Organisation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After installing the NucleoMiner2 environment (see the previous section), go to the root working directory of the tutorial by typing the following command in a terminal:

.. code:: bash

    cd doc/Chuffart_NM2_workdir/


Retrieving Experimental Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The MNase-seq and MN-ChIP-seq raw data are available at ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) under accession number E-MTAB-2671.

$$$ TODO explain how to organise the experimental dataset into the `data` directory of the working directory.

In this tutorial, we want to compare the nucleosomes of two yeast strains: BY and RM. For each strain, MNase-seq was performed, as well as ChIP-seq using an antibody recognizing the H3K14ac epigenetic mark. Illumina sequencing was done in single-read mode with 50 bp reads.

The dataset is composed of 55 files organised as follows:

- 3 replicates for BY MNase-seq

  - sample 1 (5 fastq.gz files)
  - sample 2 (5 fastq.gz files)
  - sample 3 (4 fastq.gz files)

- 3 replicates for RM MNase-seq

  - sample 4 (4 fastq.gz files)
  - sample 5 (4 fastq.gz files)
  - sample 6 (5 fastq.gz files)

- 3 replicates for BY ChIP-seq H3K14ac

  - sample 36 (5 fastq.gz files)
  - sample 37 (5 fastq.gz files)
  - sample 53 (9 fastq.gz files)

- 2 replicates for RM ChIP-seq H3K14ac

  - sample 38 (5 fastq.gz files)
  - sample 39 (4 fastq.gz files)


Python and R Common Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First, we need to define the configuration variables that will be passed to the Python and R scripts. These variables are defined in the file `configurator.py`. Executing this Python script dumps the variables into the `nucleominer_config.json` file, which is then used by both the R and Python scripts.

If you need to adapt variable values (paths, default parameters...), edit `configurator.py`. Then go to the root directory of your project and run the following command to dump the configuration file:

.. code:: bash

    python src/current/configurator.py

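
The idea of dumping variables to a JSON file shared by Python and R can be illustrated as follows. This is only a minimal sketch: the keys `READ_LENGTH`, `STRAINS` and `RESULTS_DIR` and the helper functions are hypothetical and are not taken from `configurator.py`. The file is written to a temporary directory so that the example does not touch the project.

```python
import json
import os
import tempfile

# Hypothetical configuration values; the real ones are defined in configurator.py.
config = {
    "READ_LENGTH": 50,
    "STRAINS": ["BY", "RM"],
    "RESULTS_DIR": "results",
}


def dump_config(conf, path):
    """Write the configuration dictionary to a JSON file."""
    with open(path, "w") as handle:
        json.dump(conf, handle, indent=2, sort_keys=True)


def load_config(path):
    """Read the configuration back, as a downstream script would."""
    with open(path) as handle:
        return json.load(handle)


# Use a temporary directory so the sketch leaves the project untouched.
json_path = os.path.join(tempfile.mkdtemp(), "nucleominer_config.json")
dump_config(config, json_path)
reloaded = load_config(json_path)
print(reloaded["STRAINS"])
```

Because JSON is language-neutral, the same file can be parsed from R with any JSON reader, which is what makes a single configuration source practical for both sets of scripts.
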

Preprocessing Illumina Fastq Reads for Each Sample
--------------------------------------------------

Once the variables and the design have been specified, the script `wf.py` runs the whole analysis automatically, with no further intervention. To run the full analysis, run the following command:

.. code:: bash

    python src/current/wf.py

The steps performed by this script are detailed below. The preprocessing consists of 4 steps embedded in the `wf.py` script. As a preamble, the script computes `samples`, `samples_mnase` and `strains`, which are used throughout the 4 steps.

.. autodata:: wf.samples
    :noindex:

.. autodata:: wf.samples_mnase
    :noindex:

.. autodata:: wf.strains
    :noindex:


Creating Bowtie Index from each Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each strain, the script `wf.py` then creates a Bowtie index. The Bowtie index of a strain is a compressed, searchable representation of the genome of this strain (based on the Burrows-Wheeler transform). It is used by Bowtie to align reads. The part of the script performing this step is the following:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_1
    :end-before: # _ENDOF_ step_1
    :language: python

As an indication, the following table summarizes the file sizes and process durations that we observed when running this step on a Linux server.

====== ====================== ====================== ================
strain fasta genome file size bowtie index file size process duration
====== ====================== ====================== ================
BY     12 MB                  25 MB                  11 s
RM     12 MB                  24 MB                  9 s
====== ====================== ====================== ================


Aligning Reads to Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, the `wf.py` script launches Bowtie to align reads to the reference genome. It produces a `.sam` file that is converted into a `.bed` file. The `bowtie`, `samtools` and `bedtools` binaries are wrapped using the Python `subprocess` module. This step is performed by the following part of the script:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_2
    :end-before: # _ENDOF_ step_2
    :language: python

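
Wrapping an external binary with `subprocess` can be sketched as below. This is not the code of `wf.py`: the helper `run_command` is hypothetical, the sketch assumes Python 3, and the Python interpreter itself stands in for the external tool so that no `bowtie`, `samtools` or `bedtools` option has to be assumed.

```python
import subprocess
import sys


def run_command(args):
    """Run an external program, raise if it fails, and return its standard output."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Stand-in for an external tool: the Python interpreter itself.
output = run_command([sys.executable, "-c", "print('aligned')"])
print(output.strip())
```

Passing the command as a list of arguments avoids shell quoting issues, and `check=True` stops the workflow as soon as one tool fails instead of silently producing empty files.
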

Convert Aligned Reads into TemplateFilter Format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TemplateFilter uses a particular input format for reads, so it is necessary to convert the `.bed` files. TemplateFilter expects reads in the following format: `chr`, `coord`, `strand` and `#reads`, where:

- `chr` is the number of the chromosome;
- `coord` is the coordinate of the reads;
- `strand` is `F` for forward and `R` for reverse;
- `#reads` is the number of reads covering this position.

Each entry is *tab*-separated.

**WARNING** For reverse strands, Bowtie returns the position of the first nucleotide on the left-hand side, whereas TemplateFilter expects the first one on the right-hand side. NucleoMiner2 takes this into account by adding the read length (in our case 50) to the coordinates of the reverse reads.

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_3
    :end-before: # _ENDOF_ step_3
    :language: python

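
The conversion described above can be sketched as follows. This is a simplified illustration, not the code of `wf.py`: the function `bed_to_tf` is hypothetical, and the sketch assumes 6-column BED lines (chromosome, start, end, name, score, strand) whose chromosome field already holds the chromosome number.

```python
from collections import Counter

READ_LENGTH = 50  # read length used in this tutorial


def bed_to_tf(bed_lines, read_length=READ_LENGTH):
    """Convert 6-column BED lines into tab-separated chr/coord/strand/#reads rows."""
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, strand = fields[0], int(fields[1]), fields[5]
        if strand == "+":
            key = (chrom, start, "F")
        else:
            # Reverse read: shift the coordinate by the read length.
            key = (chrom, start + read_length, "R")
        counts[key] += 1  # reads sharing chr, coord and strand are pooled
    return [
        "\t".join([chrom, str(coord), strand, str(n)])
        for (chrom, coord, strand), n in sorted(counts.items())
    ]


bed = [
    "1\t100\t150\tread_a\t42\t+",
    "1\t100\t150\tread_b\t42\t+",
    "1\t300\t350\tread_c\t42\t-",
]
tf_rows = bed_to_tf(bed)
for row in tf_rows:
    print(row)
```

The two forward reads starting at position 100 collapse into a single row with a count of 2, while the reverse read is reported at 300 + 50 = 350.
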

The following table summarizes the number of reads, the file sizes involved and the process durations that we observed when running the last two steps. In our case, the alignment processes were multithreaded over 3 cores.

== ============== ========================== ====== ================ ================== ================
id Illumina reads aligned and filtered reads ratio  `.bed` file size TF input file size process duration
== ============== ========================== ====== ================ ================== ================
1  16436138       10199695                   62.06% 1064 MB          60 MB              383 s
2  16911132       12512727                   73.99% 1298 MB          64 MB              437 s
3  15946902       12340426                   77.38% 1280 MB          65 MB              423 s
4  13765584       10381903                   75.42% 931 MB           59 MB              352 s
5  15168268       11502855                   75.83% 1031 MB          64 MB              386 s
6  18850820       14024905                   74.40% 1254 MB          69 MB              482 s
36 17715118       14092985                   79.55% 1404 MB          68 MB              483 s
37 17288466       7402082                    42.82% 741 MB           48 MB              339 s
38 16116394       13178457                   81.77% 1101 MB          63 MB              420 s
39 14241106       10537228                   73.99% 880 MB           57 MB              348 s
53 40876476       33780065                   82.64% 3316 MB          103 MB             1165 s
== ============== ========================== ====== ================ ================== ================


Run TemplateFilter on MNase Samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, for each sample we perform the TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each nucleosome is defined by its center and its width. An odd width leads us to consider non-integer lower and upper bounds.

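
The effect of an odd width can be seen with a short sketch. It assumes that the bounds are taken symmetrically around the center, at center minus and plus half the width; the function `nucleosome_bounds` is hypothetical and is not part of NucleoMiner2.

```python
def nucleosome_bounds(center, width):
    """Return the (lower, upper) bounds of a nucleosome centred on `center`."""
    half = width / 2.0
    return center - half, center + half


even_bounds = nucleosome_bounds(1000, 146)  # even width: integer-valued bounds
odd_bounds = nucleosome_bounds(1000, 147)   # odd width: bounds end in .5
print(even_bounds)
print(odd_bounds)
```

Under this assumption, a width of 147 centred on 1000 gives the bounds 926.5 and 1073.5, so downstream code has to accept non-integer coordinates.
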

**WARNING** TemplateFilter was not designed to handle replicates. We therefore recommend keeping as many nucleosomes as possible and filtering out the aberrant ones afterwards, taking advantage of the replicates. To do this, we set a low correlation threshold parameter (0.5) and a particularly high overlap value (300%).

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
    :start-after: # _STARTOF_ step_4
    :end-before: # _ENDOF_ step_4
    :language: python

As an indication, the following table summarizes the number of nucleosomes found, the file sizes and the process durations that we observed for each MNase sample.

== ====== ========== ============= ================
id strain found nucs nuc file size process duration
== ====== ========== ============= ================
1  BY     96214      68 MB         1022 s
2  BY     91694      65 MB         1038 s
3  BY     91205      65 MB         1036 s
4  RM     88076      62 MB         984 s
5  RM     90141      64 MB         967 s
6  RM     87517      62 MB         980 s
== ====== ========== ============= ================


..
.. - libcoverage.py
.. - wf.py
..
.. In order to simplify the design of experiment, we consider MNase as a marker.
.. For each couple `(strain, marker)` we perform 3 replicates. So, theoretically
.. we should have `3 * (1 + 5) * 3 = 54` samples. In practice we only obtain 2
.. replicates for `(YJM, H3K4me1)`. Each one of the 53 samples is identified by a
.. unique identifier. The file `CSV_SAMPLE_FILE` sums up this information.
..
.. .. autodata:: configurator.CSV_SAMPLE_FILE
..     :noindex:
..
.. We use a convention to link sample and Illumina fastq outputs. Illumina output
.. files of the sample `ID` will be stored in the directory
.. `ILLUMINA_OUTPUTFILE_PREFIX` + `ID`. For example, sample 41 outputs will be
.. stored in the directory `data/2012-09-05/FASTQ/Sample_Yvert_Bq41/`.
..
.. .. autodata:: configurator.ILLUMINA_OUTPUTFILE_PREFIX
..     :noindex:
..
.. For BY (resp. RM and YJM) we use the following reference genome
.. `saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta`
.. (resp. `saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta` and
.. `saccharomyces_cerevisiae_YJM_789_screencontig.fasta`).
.. The index `FASTA_REFERENCE_GENOME_FILES` stores this information.
..
.. .. autodata:: configurator.FASTA_REFERENCE_GENOME_FILES
..     :noindex:
..
.. Each chromosome/contig is identified in the fasta file by an obscure identifier.
.. For example, BY chromosome I is identified by `gi|144228165|ref|NC_001133.7|` whereas
.. TemplateFilter expects an integer. So, we translate it. The index
.. `FASTA_INDEXES` stores this translation.
..
.. .. autodata:: configurator.FASTA_INDEXES
..     :noindex:
..
.. From a pragmatic point of view we discard some parts of the genome (repeated
.. sequences etc...). The list of the blacklisted areas is explicitly detailed in
.. `AREA_BLACK_LIST`.
..
.. .. autodata:: configurator.AREA_BLACK_LIST
..     :noindex:
..
.. For BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we use the previously
.. computed .c2c file `data/2012-03_primarydata/BY_RM_gxcomp.c2c` (resp.
.. `BY_YJM_GComp_All.c2c` and `RM_YJM_gxcomp.c2c`). For more information about
.. .c2c files, please read section 5 of the manual of `NucleoMiner`, the old
.. version of `NucleoMiner2`
.. (http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).
..
.. .. autodata:: configurator.C2C_FILES
..     :noindex:
..
.. `nucleominer` uses specific directories to work in; these are described in
.. `INDEX_DIR`, `ALIGN_DIR` and `LOG_DIR`.
..
.. Finally, `nucleominer` uses external resources; the paths to these resources
.. are described in `BOWTIE_BUILD_BIN`, `BOWTIE2_BIN`, `SAMTOOLS_BIN`,
.. `BEDTOOLS_BIN`, `TF_BIN` and `TF_TEMPLATES_FILE`.
..
.. All paths, prefixes and indexes can be changed in the
.. `src/current/nucleominer_config.json` file.
..
.. .. autodata:: wf.json_conf_file
..     :noindex:
..


Inferring Nucleosome Position and Extracting Read Counts
--------------------------------------------------------

The second part of the tutorial uses R (http://www.r-project.org). NucleoMiner2 contains a set of R scripts that are sourced in R from a console launched at the root of your project. These scripts are:

- headers.R
- extract_maps.R
- translate_common_wp.R
- split_samples.R
- count_reads.R
- get_size_factors
- launch_deseq.R


The Script headers.R
^^^^^^^^^^^^^^^^^^^^

The script `headers.R` is included in all other R scripts. It is in charge of:

- loading the libraries used in the scripts
- loading the configuration (design, strain, marker...)
- computing and caching Common Uninterrupted Regions (CURs). Caching means storing the information in the computer's memory.

Note that you can customize the function `translate`. This function allows you to use the alignments between genomes when performing various tasks.

- You may want to analyze data from a single strain (e.g. treatment/control, or only a few mutations). In this case, the genome is identical across all samples and you do not need to define particular CURs (the CURs are the chromosomes). Simply use the default `translate` function, which is neutral.

- If you are analyzing data from two or more strains (the case NucleoMiner2 was designed for), then you need to translate the coordinates of one genome into the coordinates of another one. You must do this by aligning the two genomes, which produces a .c2c file (see the appendix "Generate .c2c Files"). Then use it to produce the list of regions and customise `translate`.

In this tutorial, we are in the second case. To perform all these steps, run the following command in your R console:

.. code:: r

    source("src/current/headers.R")


The Script extract_maps.R
^^^^^^^^^^^^^^^^^^^^^^^^^

This script is in charge of extracting maps for well-positioned and sensitive nucleosomes. First, it computes intra- and inter-strain matches of nucleosome maps for each CUR. This step can be executed in parallel on many cores using the BoT library. Next, it collects the results and produces maps of well-positioned nucleosomes, sensitive nucleosomes and Unaligned Nucleosomal Regions (UNRs).

The map of well-positioned nucleosomes for BY is collected in the result directory and is called `BY_wp.tab`. It is composed of the following columns:

- chr, the number of the chromosome
- lower_bound, the lower bound of the nucleosome
- upper_bound, the upper bound of the nucleosome
- cur_index, the index of the CUR
- index_nuc, the index of the nucleosome in the CUR
- wp, 1 if it is a well-positioned nucleosome, 0 otherwise
- nb_reads, the number of reads that support this nucleosome
- nb_nucs, the number of TemplateFilter nucleosomes across replicates (= the number of replicates in which it is a well-positioned nucleosome)
- llr_1, for a well-positioned nucleosome, the LLR1 (log-likelihood ratio) between the first and the second TemplateFilter nucleosomes on the chain
- llr_2, for a well-positioned nucleosome, the LLR1 between the second and the third TemplateFilter nucleosomes on the chain
- wp_llr, for a well-positioned nucleosome, the LLR2 that compares the consistency of the positioning over all TemplateFilter nucleosomes
- wp_pval, for a well-positioned nucleosome, the p-value of the chi-square test obtained from LLR2 (`1 - pchisq(2 * LLR2, df=4)`)
- dyad_shift, for a well-positioned nucleosome, the shift between the two extreme TemplateFilter nucleosome dyad positions

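
The expression `1 - pchisq(2 * LLR2, df=4)` used for `wp_pval` can be reproduced without a statistics library, because the upper tail of a chi-square distribution has a closed form when the number of degrees of freedom is even. The sketch below is an illustration in Python rather than the R code of NucleoMiner2; the function names are hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable, for an even `df` only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for an even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def llr_pvalue(llr, df):
    """p-value associated with a log-likelihood ratio: 1 - pchisq(2 * llr, df)."""
    return chi2_upper_tail_even_df(2.0 * llr, df)


print(llr_pvalue(0.0, 4))
print(llr_pvalue(5.0, 4))
```

With 4 degrees of freedom the p-value reduces to `exp(-LLR) * (1 + LLR)`: a log-likelihood ratio of 0 gives a p-value of 1, and larger ratios give smaller p-values.
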
427 | 3961deb6 | Florent Chuffart | The sensitive map for BY is collected in the result directory and is called `BY_fuzzy.tab`. It is composed of following columns: |
428 | dadb6a4d | Florent Chuffart | |
429 | dadb6a4d | Florent Chuffart | - chr, the number of the chromosome |
430 | dadb6a4d | Florent Chuffart | - lower_bound, the lower bound of the nucleosome |
431 | dadb6a4d | Florent Chuffart | - upper_bound, the upper bound of the nucleosome |
432 | dadb6a4d | Florent Chuffart | - cur_index, index of the CUR |
433 | dadb6a4d | Florent Chuffart | |
The map of common well-positioned nucleosomes aligned between the BY and RM strains is stored in the result directory under the name `BY_RM_common_wp.tab`. It is composed of the following columns:

- cur_index, the index of the CUR
- index_nuc_BY, the index of the BY nucleosome in the CUR
- index_nuc_RM, the index of the RM nucleosome in the CUR
- llr_score, the LLR3 score that estimates the conservation between the positions in BY and RM
- common_wp_pval, the p-value of the chi-square test obtained from LLR3 (`1-pchisq(2*LLR3, df=2)`)
- diff, the dyad shift between the positions in the two strains (in bp)
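
As an illustration of how this table can be consumed, here is a small Python sketch. It assumes that the file is tab-separated with a header row holding the column names listed above; the two rows of `EXAMPLE` are made up, and `read_common_wp` is a hypothetical helper. For `df=2`, `1-pchisq(2*LLR3, df=2)` reduces to `exp(-LLR3)`, which the sketch uses to recompute the p-value from `llr_score`.

.. code:: python

    import csv
    import io
    import math

    # Made-up excerpt for illustration only; real values come from BY_RM_common_wp.tab.
    EXAMPLE = (
        "cur_index\tindex_nuc_BY\tindex_nuc_RM\tllr_score\tcommon_wp_pval\tdiff\n"
        "1\t1\t1\t0.5\t0.60653\t3\n"
        "1\t2\t2\t4.0\t0.01832\t12\n"
    )


    def read_common_wp(handle):
        """Read the table and recompute the p-value from llr_score."""
        rows = []
        for row in csv.DictReader(handle, delimiter="\t"):
            llr3 = float(row["llr_score"])
            rows.append({
                "cur_index": int(row["cur_index"]),
                "index_nuc_BY": int(row["index_nuc_BY"]),
                "index_nuc_RM": int(row["index_nuc_RM"]),
                "llr_score": llr3,
                # Chi-square survival function with df=2 evaluated at 2*LLR3.
                "recomputed_pval": math.exp(-llr3),
                "diff": float(row["diff"]),
            })
        return rows


    rows = read_common_wp(io.StringIO(EXAMPLE))

To read the real file, replace the `io.StringIO` object with an open file handle.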

The common UNR map for the BY and RM strains is stored in the result directory under the name `BY_RM_common_unr.tab`. It is composed of the following columns:

- cur_index, the index of the CUR
- index_nuc_BY, the index of the BY nucleosome in the CUR
- index_nuc_RM, the index of the RM nucleosome in the CUR

To execute this script, run the following command in your R console:

.. code:: r

    source("src/current/extract_maps.R")


The Script translate_common_wp.R
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This script translates the positions of common well-positioned nucleosomes from one strain to another and stores them in a table.

For example, the file `results/2014-04/RM_wp_tr_2_BY.tab` contains RM well-positioned nucleosomes translated into BY genome coordinates. It is composed of the following columns:

- strain_ref, the reference genome (in which positions are defined)
- begin, the translated lower bound of the nucleosome
- end, the translated upper bound of the nucleosome
- chr, the chromosome number in the reference genome (in which positions are defined)
- length, the length of the nucleosome (could be negative)
- cur_index, the index of the CUR
- index_nuc, the index of the nucleosome in the CUR

To execute this script, run the following command in your R console:

.. code:: r

    source("src/current/translate_common_wp.R")


The Script split_samples.R
^^^^^^^^^^^^^^^^^^^^^^^^^^

To optimize memory usage, we split and compress TemplateFilter input files according to their corresponding chromosome. For example, `sample_1_TF.tab` will be split into:

- sample_1_chr_1_splited_sample.tab.gz
- sample_1_chr_2_splited_sample.tab.gz
- ...
- sample_1_chr_17_splited_sample.tab.gz
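
The splitting step can be pictured with the following Python sketch. It is not the implementation of `split_samples.R`: it assumes a tab-separated input with a header row and a chromosome column named `chr` (both are assumptions), and `split_by_chromosome` is a hypothetical helper. Only the output naming pattern is taken from the list above.

.. code:: python

    import csv
    import gzip
    import os
    import tempfile


    def split_by_chromosome(in_path, out_dir, sample_name, chr_column="chr"):
        """Write one gzip-compressed table per chromosome and return their paths."""
        handles = {}
        writers = {}
        paths = {}
        with open(in_path, newline="") as src:
            reader = csv.DictReader(src, delimiter="\t")
            try:
                for row in reader:
                    chrom = row[chr_column]
                    if chrom not in writers:
                        name = "%s_chr_%s_splited_sample.tab.gz" % (sample_name, chrom)
                        paths[chrom] = os.path.join(out_dir, name)
                        handles[chrom] = gzip.open(paths[chrom], "wt", newline="")
                        writers[chrom] = csv.DictWriter(
                            handles[chrom], fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
                        writers[chrom].writeheader()
                    writers[chrom].writerow(row)
            finally:
                # Close every output file, even if a row could not be processed.
                for handle in handles.values():
                    handle.close()
        return paths


    # Tiny demonstration in a temporary directory, with made-up rows.
    demo_dir = tempfile.mkdtemp()
    demo_input = os.path.join(demo_dir, "sample_1_TF.tab")
    with open(demo_input, "w", newline="") as handle:
        handle.write("chr\tposition\n1\t10\n2\t20\n1\t30\n")
    demo_paths = split_by_chromosome(demo_input, demo_dir, "sample_1")

Keeping one open writer per chromosome means the input is read only once, whatever the order of its rows.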


To execute this script, run the following command in your R console:

.. code:: r

    source("src/current/split_samples.R")


The Script count_reads.R
^^^^^^^^^^^^^^^^^^^^^^^^

To associate a number of observations (reads) with each nucleosome, we run the script `count_reads.R`. It produces the files `BY_RM_H3K14ac_wp_and_nbreads.tab`, `BY_RM_H3K14ac_unr_and_nbreads.tab`, `BY_RM_Mnase_Seq_wp_and_nbreads.tab` and `BY_RM_Mnase_Seq_unr_and_nbreads.tab` for H3K14ac common well-positioned nucleosomes, H3K14ac UNRs, Mnase common well-positioned nucleosomes and Mnase UNRs respectively.

For example, the file `BY_RM_H3K14ac_unr_and_nbreads.tab` contains the read counts for UNRs under the ChIP H3K14ac experimental condition. It is composed of the following columns:

- chr_BY, the number of the chromosome for BY
- lower_bound_BY, the lower bound of the nucleosome for BY
- upper_bound_BY, the upper bound of the nucleosome for BY
- index_nuc_BY, the index of the BY nucleosome in the CUR for BY
- chr_RM, the number of the chromosome for RM
- lower_bound_RM, the lower bound of the nucleosome for RM
- upper_bound_RM, the upper bound of the nucleosome for RM
- index_nuc_RM, the index of the RM nucleosome in the CUR for RM
- cur_index, the index of the CUR
- BY_H3K14ac_36, the number of reads for the current nucleosome in sample 36
- BY_H3K14ac_37, the number of reads in sample 37
- BY_H3K14ac_53, the number of reads in sample 53
- RM_H3K14ac_38, the number of reads in sample 38
- RM_H3K14ac_39, the number of reads in sample 39

To execute this script, run the following command in your R console:

.. code:: r

    source("src/current/count_reads.R")


The Script get_size_factors.R
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


This script uses the DESeq function `estimateSizeFactors` to compute the size factor of each sample. Size factors normalise read counts from sample to sample, as determined by DESeq: when a sample has n reads for a nucleosome or a UNR, the normalised count is n/f, where f is the size factor of that sample.
The script dumps the computed size factors into the file `size_factors.tab`. This file has the form:

=========  =======  =======  =======
sample_id  wp       unr      wpunr
=========  =======  =======  =======
1          0.87396  0.88097  0.87584
2          1.07890  1.07440  1.07760
3          1.06400  1.05890  1.06250
4          0.85782  0.87948  0.86305
5          0.97577  0.96590  0.97307
6          1.19630  1.18120  1.19190
36         0.93318  0.92762  0.93166
37         0.48315  0.48453  0.48350
38         1.11240  1.11210  1.11230
39         0.89897  0.89917  0.89903
53         2.22650  2.22700  2.22660
=========  =======  =======  =======

The sample_id values are those given in the file `samples.csv`.

If you don't know which column to use for normalization, we recommend using wpunr.

Here are the details of the factors produced:

- unr: factor computed from the data of UNR regions. These regions are defined for every pair of aligned genomes (e.g. BY_RM).
- wp: same, but for well-positioned nucleosomes.
- wpunr: factor computed from both types of regions.
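
To make the normalisation concrete, the Python sketch below illustrates the median-of-ratios idea behind DESeq size factors and the n/f division described above. It is an illustration under assumptions, not the code run by `get_size_factors.R`: the helper names `size_factors` and `normalise` and the toy counts are hypothetical, and regions with a zero count in any sample are simply skipped.

.. code:: python

    import math
    from statistics import median


    def size_factors(counts):
        """Median-of-ratios size factors.

        counts maps a sample id to its list of read counts, one per region,
        with regions in the same order for every sample.
        """
        samples = list(counts)
        n_regions = len(counts[samples[0]])
        log_geo_means = []
        for i in range(n_regions):
            values = [counts[s][i] for s in samples]
            # A zero count makes the geometric mean zero, so the region is skipped.
            if all(v > 0 for v in values):
                mean_log = sum(math.log(v) for v in values) / len(values)
                log_geo_means.append((i, mean_log))
        factors = {}
        for s in samples:
            log_ratios = [math.log(counts[s][i]) - g for i, g in log_geo_means]
            factors[s] = math.exp(median(log_ratios))
        return factors


    def normalise(n, f):
        """Normalised count n/f for a raw count n and a size factor f."""
        return n / f


    # Toy data: sample B is sequenced twice as deeply as sample A.
    toy_counts = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
    toy_factors = size_factors(toy_counts)

With the table above, a raw count n for sample 36 would be normalised as `normalise(n, 0.93166)` when the wpunr column is used.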

To execute this script, run the following command in your R console:

.. code:: r

    source("src/current/get_size_factors.R")


The Script launch_deseq.R
^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, the script `launch_deseq.R` performs the statistical analysis of each nucleosome using `DESeq`. It produces the following files:

- results/current/BY_RM_H3K14ac_wp_snep.tab
- results/current/BY_RM_H3K14ac_unr_snep.tab
- results/current/BY_RM_H3K14ac_wpunr_snep.tab
- results/current/BY_RM_H3K14ac_wp_mnase.tab
- results/current/BY_RM_H3K14ac_unr_mnase.tab
- results/current/BY_RM_H3K14ac_wpunr_mnase.tab

These files are organised with the following columns (see the file `BY_RM_H3K14ac_wp_snep.tab` for an example):

- chr_BY, the number of the chromosome for BY
- lower_bound_BY, the lower bound of the nucleosome for BY
- upper_bound_BY, the upper bound of the nucleosome for BY
- index_nuc_BY, the index of the BY nucleosome in the CUR for BY
- chr_RM, the number of the chromosome for RM
- lower_bound_RM, the lower bound of the nucleosome for RM
- upper_bound_RM, the upper bound of the nucleosome for RM
- index_nuc_RM, the index of the RM nucleosome in the CUR for RM
- cur_index, the index of the CUR
- form, the form of the region (e.g. wp or unr)

The next columns contain the read counts for each sample:

- BY_Mnase_Seq_1, the number of reads for the current nucleosome in sample 1
- BY_Mnase_Seq_2, the number of reads in sample 2
- BY_Mnase_Seq_3, the number of reads in sample 3
- RM_Mnase_Seq_4, the number of reads in sample 4
- RM_Mnase_Seq_5, the number of reads in sample 5
- RM_Mnase_Seq_6, the number of reads in sample 6
- BY_H3K14ac_36, the number of reads in sample 36
- BY_H3K14ac_37, the number of reads in sample 37
- BY_H3K14ac_53, the number of reads in sample 53
- RM_H3K14ac_38, the number of reads in sample 38
- RM_H3K14ac_39, the number of reads in sample 39

The last 5 columns concern the DESeq analysis:

- manip[a_manip], strain[a_strain] and manip[a_manip]:strain[a_strain], the manip (marker) effect, the strain effect and the SNEP effect. These are the coefficients of the fitted generalized linear model.
- pvalsGLM, the p-value resulting from the comparison of the GLM that includes the interaction term *marker:strain* with the GLM that does not. It is the statistical significance of the interaction term and therefore the statistical significance of the SNEP.
- snep_index, a boolean set to TRUE if the pvalsGLM value is under the threshold computed with the FDR function with a rate set to 0.0001.
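
The tutorial does not say which FDR procedure computes the threshold used for `snep_index`. As a hedged illustration only, the Python sketch below assumes the Benjamini-Hochberg procedure; `bh_threshold` and `snep_flags` are hypothetical names, and the p-values in the example are made up.

.. code:: python

    def bh_threshold(pvalues, rate):
        """Largest sorted p-value p_(k) with p_(k) <= k / m * rate, or None."""
        m = len(pvalues)
        threshold = None
        for k, p in enumerate(sorted(pvalues), start=1):
            if p <= k / m * rate:
                threshold = p
        return threshold


    def snep_flags(pvalues, rate=0.0001):
        """Flag each p-value that falls at or under the Benjamini-Hochberg threshold."""
        threshold = bh_threshold(pvalues, rate)
        if threshold is None:
            return [False] * len(pvalues)
        return [p <= threshold for p in pvalues]


    # Made-up p-values, one per nucleosome, in file order.
    example_pvals = [0.5, 1e-9, 0.9, 2e-5]
    example_flags = snep_flags(example_pvals)

The threshold depends on the number of tested regions, so the same p-value can be flagged in one table and not in another.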

To execute this script, run the following command in your R console:

.. code:: r

    source("src/current/launch_deseq.R")


Results: Number of SNEPs
------------------------

Here is the number of SNEPs computed for each form.

=====  =======  =====  =======
form   strains  #nucs  H3K14ac
=====  =======  =====  =======
wp     BY-RM    30464  3549
unr    BY-RM    9497   1559
wpunr  BY-RM    39961  5240
=====  =======  =====  =======


Appendix: Generating .c2c Files
-------------------------------

A `.c2c` file is a simple table that describes how two genome sequences are aligned. It can be generated with scripts that were developed for NucleoMiner 1.0 (Nagarajan et al. PLoS Genetics 2010) and that we provide in this release of NucleoMiner2.

To use these scripts on your UNIX/Linux computer, you first need to install MUMmer, which is designed to rapidly align entire genomes, whether in complete or draft form.

Installing MUMmer
^^^^^^^^^^^^^^^^^

Get the latest version of the MUMmer archive on your computer (MUMmer3.23.tar.gz is provided in the deps directory of your working directory). The commands below extract it into the src folder of your working directory and build it there. Run them from the working directory:

.. code:: bash

    cd src
    tar xfvz ../deps/MUMmer3.23.tar.gz
    cd MUMmer3.23
    make check
    make install
    # Return to the working directory for the next steps.
    cd ../..

Installing NucleoMiner 1.0 scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Get the nucleominer-1.0.tar.gz archive on your computer (this archive is provided in the deps directory of your working directory). Install it locally into the src folder of your working directory by typing the following commands from the working directory:


.. code:: bash

    cd src
    tar xfvz ../deps/nucleominer-1.0.tar.gz
    cd ..

This creates a directory that contains the NucleoMiner 1.0 scripts (src/nucleominer-1.0/scripts).


Generate .c2c Files
^^^^^^^^^^^^^^^^^^^

To generate the .c2c files, type the following commands in a terminal:

.. code:: bash

    export PATH=$PATH:src/MUMmer3.23:src/nucleominer-1.0/scripts
    export PERL5LIB=$PERL5LIB:src/nucleominer-1.0/scripts/
    NMgxcomp data/saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta \
             data/saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta \
             data/byxrm 2>NMgxcomp.log

After execution, the directory `data` will hold the .c2c files.