Revision dadb6a4d doc/sphinx_doc/tuto.rst
b/doc/sphinx_doc/tuto.rst | ||
---|---|---|
11 | 11 |
from the dataset. |
12 | 12 |
|
13 | 13 |
|
14 |
Dataset and Configuration File |
|
15 |
------------------------------ |
|
14 |
Python and R Common Configuration File |
|
15 |
-------------------------------------- |
|
16 |
|
|
17 |
First of all, we define in one place some configuration variables that will be loaded by the Python and R scripts. This file is **configurator.py**. Executing this Python script dumps the variables into **nucleo_miner_config.json**, which is then loaded by both kinds of scripts (R and Python). |
|
18 |
|
|
19 |
To do this, run the following command at the root of your project: |
|
20 |
|
|
21 |
.. code:: bash |
|
22 |
|
|
23 |
python src/current/configurator.py |
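
As a rough illustration of this pattern, the sketch below dumps a dictionary of variables to a JSON file and reads it back. The contents of **configurator.py** are not shown in this tutorial, so the keys `strains` and `markers` and the helper names `dump_config` and `load_config` are hypothetical; only the file name **nucleo_miner_config.json** and the strain and marker names come from the text.

```python
import json
import os
import tempfile

# Hypothetical configuration variables (the real ones live in configurator.py).
CONFIG = {
    "strains": ["BY", "RM", "YJM"],
    "markers": ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"],
}


def dump_config(config, path):
    """Write the configuration dictionary to a JSON file."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)


def load_config(path):
    """Read the configuration back, as a Python-side script might do."""
    with open(path) as handle:
        return json.load(handle)


# Use a temporary directory so the sketch does not touch a real project tree.
with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "nucleo_miner_config.json")
    dump_config(CONFIG, config_path)
    loaded = load_config(config_path)

print(loaded["strains"])
```

Keeping the shared variables in a language-neutral JSON file means the R scripts can presumably read the same file with any JSON reader, without duplicating the values.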
|
24 |
|
|
25 |
|
|
26 |
$$$ other python script to describe: |
|
27 |
- libcoverage.py |
|
28 |
- wf.py |
|
29 |
|
|
30 |
|
|
31 |
|
|
32 |
|
|
33 |
|
|
34 |
Dataset and Configuration Variables |
|
35 |
----------------------------------- |
|
16 | 36 |
|
17 | 37 |
We want to compare nucleosomes of 3 yeast strains: |
18 | 38 |
|
... | ... | |
29 | 49 |
- H3K14ac |
30 | 50 |
- H4K12ac |
31 | 51 |
|
32 |
In order to simplify the design of exeriment, we considere Mnase as a marker. |
|
52 |
To simplify the experimental design, we consider Mnase as a marker.
|
|
33 | 53 |
For each couple `(strain, marker)` we perform 3 replicates. So, theoretically |
34 | 54 |
we should have `3 * (1 + 5) * 3 = 54` samples. In practice we only obtain 2 |
35 | 55 |
replicates for `(YJM, H3K4me1)`. Each one of the 53 samples is identified by a |
... | ... | |
236 | 256 |
55 52424414 47117107 89,88% 3811 Mo 119 Mo 1477 s. |
237 | 257 |
== ============== ========================= ====== ================ ================== ================ |
238 | 258 |
|
239 |
For some reasons (manipulation efficency, e.g. PCR...), we remove samples 33, 45, 48 and 55. |
|
259 |
For some reasons (manipulation efficiency, e.g. PCR...), we remove samples 33, 45, 48 and 55.
|
|
240 | 260 |
|
241 | 261 |
|
242 | 262 |
Run TemplateFilter on Mnase Samples |
243 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
263 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
244 | 264 |
|
245 | 265 |
Finally, for each sample we perform TemplateFilter analysis. |
246 | 266 |
|
... | ... | |
291 | 311 |
|
292 | 312 |
|
293 | 313 |
|
294 |
This preprocessing step consists in the 4 main steps embed in the `wf.py` and |
|
295 |
described bellow. As a preamble, this script computes `samples` `samples_mnase` |
|
296 |
and `strains` that will be used along the 4 steps. |
|
314 |
The second part of the tutorial uses `R` (http://www.r-project.org). It consists of a set of R scripts that will be sourced in an R console launched at the root of your project. The R scripts are: |
|
297 | 315 |
|
298 |
|
|
299 |
The second part of the tutoriel use `R` (http://http://www.r-project.org). It |
|
300 |
consists in the following main steps: |
|
301 |
|
|
302 |
- compute_rois.R |
|
316 |
- headers.R |
|
303 | 317 |
- extract_maps.R |
304 | 318 |
- compare_common_wp.R |
305 | 319 |
- split_samples.R |
... | ... | |
307 | 321 |
- get_size_factors |
308 | 322 |
- launch_deseq.R |
309 | 323 |
|
310 |
Computing Common Genome Region Between Strains |
|
311 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
324 |
The Script headers.R |
|
325 |
^^^^^^^^^^^^^^^^^^^^ |
|
326 |
|
|
327 |
The script header.R is included in each of the other scripts. It is in charge of: |
|
328 |
|
|
329 |
- loading the libraries used in these scripts |
|
330 |
- loading the configuration (design, strain, marker...) |
|
331 |
- computing and caching CURs |
|
332 |
|
|
333 |
In your R console, run the following command line: |
|
312 | 334 |
|
313 | 335 |
.. code:: bash |
314 | 336 |
|
315 |
R CMD BATCH src/current/compute_rois.R |
|
337 |
R CMD BATCH src/current/header.R |
|
338 |
|
|
339 |
|
|
340 |
The Script extract_maps.R |
|
341 |
^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
342 |
This script is in charge of extracting maps for well-positioned and fuzzy nucleosomes. First of all, it computes intra- and inter-strain nucleosome maps for each CUR. This step is executed in parallel on many cores using the BoT library. Next, it collects the results and produces the well-positioned, fuzzy and UNR maps. |
|
343 |
|
|
344 |
The well-positioned map for BY is collected in the result directory and is called **BY_wp.tab**. It is composed of the following columns: |
|
345 |
|
|
346 |
- chr, the number of the chromosome |
|
347 |
- lower_bound, the lower bound of the nucleosome |
|
348 |
- upper_bound, the upper bound of the nucleosome |
|
349 |
- cur_index, index of the CUR |
|
350 |
- index_nuc, the index of the nucleosome in the CUR |
|
351 |
- wp, 1 if it is a well-positioned nucleosome, 0 otherwise |
|
352 |
- nb_reads, the number of reads that support this nucleosome |
|
353 |
- nb_nucs, the number of TemplateFilter nucleosomes across replicates (= the number of replicates if it is a well-positioned nucleosome) |
|
354 |
- llr_1, for a well-positioned nucleosome, it is the LLR1 between the first and the second TemplateFilter nucleosome. |
|
355 |
- llr_2, for a well-positioned nucleosome, it is the LLR1 between the second and the first TemplateFilter nucleosome. |
|
356 |
- wp_llr, for a well-positioned nucleosome, it is the LLR2 over all TemplateFilter nucleosomes. |
|
357 |
- wp_pval, for a well-positioned nucleosome, it is the p-value of the chi-square test obtained with the LLR2 (**1-pchisq(2*LLR2, df=4)**) |
|
358 |
- dyad_shift, for a well-positioned nucleosome, it is the shift between the two extreme TemplateFilter nucleosome dyad positions. |
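
The `wp_pval` column is defined above through R's `pchisq`, the chi-square cumulative distribution function. The sketch below reproduces the same quantity in Python for readers who want to check a value by hand. It relies on the closed form of the chi-square survival function for an even number of degrees of freedom, so it needs no external library. The function names and the example LLR2 value are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Return 1 - CDF(x) for a chi-square law with an even number of df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def wp_pval(llr2):
    """Equivalent of 1 - pchisq(2 * LLR2, df=4)."""
    return chi2_sf_even_df(2.0 * llr2, df=4)


# Illustrative value only, not taken from the dataset.
print(wp_pval(1.0))
```

The same helper called with `df=2` covers the `common_wp_pval` column, which uses the LLR3 with 2 degrees of freedom.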
|
359 |
|
|
360 |
The fuzzy map for BY is collected in the result directory and is called **BY_fuzzy.tab**. It is composed of the following columns: |
|
361 |
|
|
362 |
- chr, the number of the chromosome |
|
363 |
- lower_bound, the lower bound of the nucleosome |
|
364 |
- upper_bound, the upper bound of the nucleosome |
|
365 |
- cur_index, index of the CUR |
|
366 |
|
|
367 |
The common well-positioned map for the BY and RM strains is collected in the result directory and is called **BY_RM_common_wp.tab**. It is composed of the following columns: |
|
368 |
|
|
369 |
- cur_index, the index of the CUR |
|
370 |
- index_nuc_BY, the index of the BY nucleosome in the CUR |
|
371 |
- index_nuc_RM, the index of the RM nucleosome in the CUR |
|
372 |
- llr_score, the LLR3 score between the BY and RM nucleosomes |
|
373 |
- common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (**1-pchisq(2*LLR3, df=2)**) |
|
316 | 374 |
|
375 |
The common UNR map for the BY and RM strains is collected in the result directory and is called **BY_RM_common_unr.tab**. It is composed of the following columns: |
|
317 | 376 |
|
318 |
Extracting Maps for Well Positionned and Fuzzy Nucleosomes |
|
319 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
377 |
- cur_index, the index of the CUR |
|
378 |
- index_nuc_BY, the index of the BY nucleosome in the CUR |
|
379 |
- index_nuc_RM, the index of the RM nucleosome in the CUR |
|
380 |
|
|
381 |
To execute this script, run the following command line in your R console: |
|
320 | 382 |
|
321 | 383 |
.. code:: bash |
322 | 384 |
|
323 |
R CMD BATCH src/current/extract_maps.R |
|
385 |
source("src/current/extract_maps.R") |
|
386 |
|
|
387 |
|
|
388 |
The Script compare_common_wp.R |
|
389 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
390 |
|
|
391 |
This script compares inter-strain distances between common well-positioned nucleosomes. |
|
392 |
|
|
393 |
For example, it computes the file **BY_RM_common_wp_diff.tab**, which contains the dyad shifts between two well-positioned nucleosomes. It is composed of the following columns: |
|
394 |
- cur_index, the index of the CUR |
|
395 |
- index_nuc_BY, the index of the BY nucleosome in the CUR |
|
396 |
- index_nuc_RM, the index of the RM nucleosome in the CUR |
|
397 |
- llr_score, the LLR3 score between the BY and RM nucleosomes |
|
398 |
- common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (**1-pchisq(2*LLR3, df=2)**) |
|
399 |
- diff, the dyad shifts between two well-positioned nucleosomes |
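
To make the `diff` column more concrete, here is a minimal sketch of a dyad shift computation. It assumes that the dyad is the midpoint between the lower and upper bounds of a nucleosome, that both nucleosomes are expressed in the same genome coordinates, and that the shift is reported as an absolute value. None of these conventions is stated in this tutorial, and the function names and coordinates are hypothetical.

```python
def dyad(lower_bound, upper_bound):
    """Assumed definition: the dyad is the centre of the nucleosome."""
    return (lower_bound + upper_bound) / 2.0


def dyad_shift(nuc_a, nuc_b):
    """Absolute distance between the dyads of two (lower, upper) pairs.

    Whether the real `diff` column is signed or absolute is an assumption.
    """
    return abs(dyad(*nuc_a) - dyad(*nuc_b))


# Illustrative coordinates, not taken from the dataset.
by_nuc = (1000, 1146)
rm_nuc = (1010, 1156)
print(dyad_shift(by_nuc, rm_nuc))
```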
|
400 |
|
|
401 |
It also translates well-positioned nucleosome maps from one strain to another and stores them in a table. |
|
402 |
|
|
403 |
For example, the file **results/2014-04/RM_wp_tr_2_BY.tab** contains the RM well-positioned nucleosomes translated into the BY genome coordinate system. It is composed of the following columns: |
|
404 |
|
|
405 |
- strain_ref, the reference genome (in which positions are defined) |
|
406 |
- begin, the translated lower bound of the nucleosome |
|
407 |
- end, the translated upper bound of the nucleosome |
|
408 |
- chr, the chromosome number in the reference genome (in which positions are defined) |
|
409 |
- length, the length of the nucleosome (could be negative) |
|
410 |
- cur_index, the index of the CUR |
|
411 |
- index_nuc, the index of the nucleosome in the CUR |
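
Since `length` can be negative after translation, it may be useful to separate such rows before further analysis. The sketch below assumes that the table is tab-separated with a header row carrying the column names listed above, which this tutorial does not state explicitly. The sample rows and the helper name `split_by_length_sign` are hypothetical.

```python
import csv
import io

# Made-up rows in the assumed layout (tab-separated, with a header line).
SAMPLE = (
    "strain_ref\tbegin\tend\tchr\tlength\tcur_index\tindex_nuc\n"
    "BY\t1000\t1146\t1\t146\t1\t1\n"
    "BY\t5200\t5190\t1\t-10\t2\t3\n"
)


def split_by_length_sign(handle):
    """Split translated nucleosomes into positive-length rows and the rest."""
    reader = csv.DictReader(handle, delimiter="\t")
    valid, suspicious = [], []
    for row in reader:
        if int(row["length"]) > 0:
            valid.append(row)
        else:
            suspicious.append(row)
    return valid, suspicious


valid, suspicious = split_by_length_sign(io.StringIO(SAMPLE))
print(len(valid), len(suspicious))
```

To run it on a real file, replace the `io.StringIO` object with an open file handle.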
|
412 |
|
|
324 | 413 |
|
325 | 414 |
|
326 |
Compute Distance Between Well Positionned Nucleosomes |
|
327 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
415 |
To execute this script, run the following command line in your R console: |
|
328 | 416 |
|
329 | 417 |
.. code:: bash |
330 | 418 |
|
... | ... | |
378 | 466 |
Where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination, marker is |
379 | 467 |
in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each post translational |
380 | 468 |
histone modification and form is in {wp, fuzzy, wpfuzzy} considering well |
381 |
positionned nucleosomes, fuzzy nucleosomes or both for SNEP computation.
|
|
469 |
positioned nucleosomes, fuzzy nucleosomes or both for SNEP computation. |
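
The three sets above can be enumerated programmatically, for instance to loop over every result. The sketch below only builds the `(combi, marker, form)` triples. It does not assume any file naming scheme, and the variable names are hypothetical.

```python
from itertools import product

COMBIS = ["BY_RM", "BY_YJM", "RM_YJM"]
MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
FORMS = ["wp", "fuzzy", "wpfuzzy"]

# Cartesian product: 3 combinations x 5 markers x 3 forms.
triples = list(product(COMBIS, MARKERS, FORMS))

for combi, marker, form in triples[:3]:
    print(combi, marker, form)
print(len(triples))
```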
|
382 | 470 |
|
383 | 471 |
|
384 | 472 |
|