Revision dadb6a4d doc/sphinx_doc/tuto.rst

b/doc/sphinx_doc/tuto.rst
11 11
from the dataset.
12 12

  
13 13

  
14
Dataset and Configuration File
15
------------------------------
14
Python and R Common Configuration File
15
--------------------------------------
16

  
17
First of all, we define in one place the configuration variables that will be loaded by the Python and R scripts. This file is **configurator.py**. Executing this Python script dumps the variables into **nucleo_miner_config.json**, which is then loaded by both kinds of scripts (R and Python).
18

  
19
To do this, run the following command line at the root of your project:
20

  
21
.. code:: bash
22

  
23
  python src/current/configurator.py
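
The dump and reload round trip described above can be sketched as follows. This is only a minimal illustration, not the content of **configurator.py**: the keys `strains`, `markers` and `nb_replicates` and the helpers `dump_config` and `load_config` are hypothetical names chosen for the example, and the file is written to a temporary directory rather than to the project root.

```python
import json
import os
import tempfile

# Hypothetical configuration values, for illustration only; the real
# variables are the ones defined in configurator.py.
config = {
    "strains": ["BY", "RM", "YJM"],
    "markers": ["Mnase", "H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"],
    "nb_replicates": 3,
}


def dump_config(config, path):
    """Write the configuration dictionary to a JSON file."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)


def load_config(path):
    """Read the configuration dictionary back from a JSON file."""
    with open(path) as handle:
        return json.load(handle)


# Round trip in a temporary directory so that nothing is overwritten.
example_path = os.path.join(tempfile.mkdtemp(), "nucleo_miner_config.json")
dump_config(config, example_path)
reloaded = load_config(example_path)
```

Because JSON is a plain-text format, the same file can be read from R with any JSON parser, which is presumably why a single dump can serve both kinds of scripts.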
24
  
25

  
26
$$$ other python script to describe:
27
- libcoverage.py
28
- wf.py
29

  
30

  
31

  
32

  
33

  
34
Dataset and Configuration Variables
35
-----------------------------------
16 36

  
17 37
We want to compare nucleosomes of 3 yeast strains: 
18 38

  
......
29 49
- H3K14ac
30 50
- H4K12ac
31 51

  
32
In order to simplify the design of exeriment, we considere Mnase as a marker. 
52
In order to simplify the design of the experiment, we consider Mnase as a marker. 
33 53
For each couple `(strain, marker)` we perform 3 replicates. So, theoretically 
34 54
we should have `3 * (1 + 5) * 3 = 54` samples. In practice we only obtain 2 
35 55
replicates for `(YJM, H3K4me1)`. Each one of the 53 samples is identified by a 
......
236 256
55  52424414        47117107                   89,88%  3811 Mo           119 Mo              1477  s.
237 257
==  ==============  =========================  ======  ================  ==================  ================  
238 258

  
239
For some reasons (manipulation efficency, e.g. PCR...), we remove samples 33, 45, 48 and 55.
259
For various reasons (manipulation efficiency, e.g. PCR...), we remove samples 33, 45, 48 and 55.
240 260

  
241 261

  
242 262
Run TemplateFilter on Mnase Samples
243
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
263
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
244 264

  
245 265
Finally, for each sample we perform a TemplateFilter analysis. 
246 266

  
......
291 311

  
292 312

  
293 313

  
294
This preprocessing step consists in the 4 main steps embed in the `wf.py` and 
295
described bellow. As a preamble, this script computes `samples` `samples_mnase` 
296
and `strains` that will be used along the 4 steps.
314
The second part of the tutorial uses `R` (http://www.r-project.org). It consists of a set of R scripts that will be sourced in an R console launched at the root of your project. The R scripts are:
297 315

  
298

  
299
The second part of the tutoriel use `R` (http://http://www.r-project.org). It 
300
consists in the following main steps:
301

  
302
  - compute_rois.R
316
  - headers.R
303 317
  - extract_maps.R
304 318
  - compare_common_wp.R
305 319
  - split_samples.R
......
307 321
  - get_size_factors  
308 322
  - launch_deseq.R
309 323

  
310
Computing Common Genome Region Between Strains
311
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
324
The Script headers.R
325
^^^^^^^^^^^^^^^^^^^^
326

  
327
The script header.R is included in every other script. It is in charge of: 
328

  
329
  - loading the libraries used in these scripts
330
  - loading the configuration (design, strain, marker...)
331
  - computing and caching CURs
332

  
333
In your R console, run the following command line:
312 334

  
313 335
.. code:: bash
314 336

  
315
  R CMD BATCH src/current/compute_rois.R
337
  R CMD BATCH src/current/header.R
338

  
339

  
340
The Script extract_maps.R
341
^^^^^^^^^^^^^^^^^^^^^^^^^
342
This script is in charge of extracting maps for well-positioned and fuzzy nucleosomes. First of all, this script computes intra- and inter-strain nucleosome maps for each CUR. This step is executed in parallel on many cores using the BoT library. Next, it collects the results and produces well-positioned, fuzzy and UNR maps.
343

  
344
The well-positioned map for BY is collected in the result directory and is called **BY_wp.tab**. It is composed of the following columns:
345

  
346
 - chr, the number of the chromosome 
347
 - lower_bound, the lower bound of the nucleosome
348
 - upper_bound, the upper bound of the nucleosome 
349
 - cur_index, index of the CUR
350
 - index_nuc, the index of the nucleosome in the CUR
351
 - wp, 1 if it is a well-positioned nucleosome, 0 otherwise
352
 - nb_reads, the number of reads that supports this nucleosome
353
 - nb_nucs, the number of TemplateFilter nucleosomes across replicates (= the number of replicates if it is a well-positioned nucleosome)
354
 - llr_1, for a well-positioned nucleosome, it is the LLR1 between the first and the second TemplateFilter nucleosome.
355
 - llr_2, for a well-positioned nucleosome, it is the LLR1 between the second and the first TemplateFilter nucleosome.
356
 - wp_llr, for a well-positioned nucleosome, it is the LLR2 over all TemplateFilter nucleosomes.
357
 - wp_pval, for a well-positioned nucleosome, it is the p-value of the chi-square test obtained with the LLR2 (**1-pchisq(2.LLR2, df=4)**)
358
 - dyad_shift, for a well-positioned nucleosome, it is the shift between the two extreme TemplateFilter nucleosome dyad positions. 
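
The `wp_pval` column can be reproduced outside of R with a short sketch. It assumes that `2.LLR2` in the formula above means 2 × LLR2, and it relies on the closed form of the chi-square survival function for an even number of degrees of freedom, so that no external library is needed. The function names `chi2_sf_even_df` and `wp_pval` are hypothetical and only serve the illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Return 1 - F(x) for a chi-square law with an even number of df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0  # running sum of the terms
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def wp_pval(llr2):
    """Counterpart of 1-pchisq(2.LLR2, df=4), reading 2.LLR2 as 2 * LLR2."""
    return chi2_sf_even_df(2.0 * llr2, df=4)
```

With `df=4` the expression reduces to `exp(-LLR2) * (1 + LLR2)`, and with `df=2` it reduces to `exp(-LLR)`, so the same helper also covers formulas that use two degrees of freedom.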
359

  
360
The fuzzy map for BY is collected in the result directory and is called **BY_fuzzy.tab**. It is composed of the following columns:
361

  
362
 - chr, the number of the chromosome 
363
 - lower_bound, the lower bound of the nucleosome
364
 - upper_bound, the upper bound of the nucleosome 
365
 - cur_index, index of the CUR
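
As an illustration of how such a map could be consumed, the sketch below reads a table with the four columns listed above and groups the nucleosomes by CUR. It assumes that the `.tab` files are tab-separated with a header row, which is not stated in this tutorial, and the rows in `SAMPLE` are made-up values used only to exercise the code. The helpers `read_map` and `group_by_cur` are hypothetical names.

```python
import csv
import io

# Made-up rows, for illustration only (not real results).
SAMPLE = (
    "chr\tlower_bound\tupper_bound\tcur_index\n"
    "1\t100\t250\t1\n"
    "1\t300\t450\t1\n"
    "2\t50\t200\t2\n"
)


def read_map(handle):
    """Read a tab-separated map with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def group_by_cur(rows):
    """Group (lower_bound, upper_bound) pairs by cur_index."""
    groups = {}
    for row in rows:
        bounds = (int(row["lower_bound"]), int(row["upper_bound"]))
        groups.setdefault(int(row["cur_index"]), []).append(bounds)
    return groups


rows = read_map(io.StringIO(SAMPLE))
groups = group_by_cur(rows)
```

To read a real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle on the map of interest.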
366

  
367
The common well-positioned map for BY and RM strains is collected in the result directory and is called **BY_RM_common_wp.tab**. It is composed of the following columns:
368

  
369
 - cur_index, the index of the CUR
370
 - index_nuc_BY, the index of the BY nucleosome in the CUR
371
 - index_nuc_RM, the index of the RM nucleosome in the CUR
372
 - llr_score, the LLR3 score between the BY and RM nucleosomes
373
 - common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (**1-pchisq(2.LLR3, df=2)**)
316 374

  
375
The common UNR map for BY and RM strains is collected in the result directory and is called **BY_RM_common_unr.tab**. It is composed of the following columns:
317 376

  
318
Extracting Maps for Well Positionned and Fuzzy Nucleosomes
319
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
377
 - cur_index, the index of the CUR
378
 - index_nuc_BY, the index of the BY nucleosome in the CUR
379
 - index_nuc_RM, the index of the RM nucleosome in the CUR
380

  
381
To execute this script, run the following command line in your R console:
320 382

  
321 383
.. code:: bash
322 384

  
323
  R CMD BATCH src/current/extract_maps.R
385
  source("src/current/extract_maps.R")
386

  
387

  
388
The Script compare_common_wp.R
389
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
390

  
391
This script is used to compare inter-strain distances between common well-positioned nucleosomes. 
392

  
393
For example, it computes the file **BY_RM_common_wp_diff.tab**, which contains the dyad shifts between two well-positioned nucleosomes. It is composed of the following columns:
394
 - cur_index, the index of the CUR
395
 - index_nuc_BY, the index of the BY nucleosome in the CUR
396
 - index_nuc_RM, the index of the RM nucleosome in the CUR
397
 - llr_score, the LLR3 score between the BY and RM nucleosomes
398
 - common_wp_pval, the p-value of the chi-square test obtained with the LLR3 (**1-pchisq(2.LLR3, df=2)**)
399
 - diff, the dyad shifts between two well-positioned nucleosomes
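
The `diff` column can be illustrated with a small sketch. It assumes that the dyad of a nucleosome is the midpoint of its lower and upper bounds, that both nucleosomes are already expressed in the same genome coordinates, and that the shift is reported as an absolute distance. None of these points is stated in this tutorial, so the code only illustrates the idea and is not the computation done by `compare_common_wp.R`. The function names are hypothetical.

```python
def dyad(lower_bound, upper_bound):
    """Dyad position, assumed here to be the midpoint of the nucleosome."""
    return (lower_bound + upper_bound) / 2.0


def dyad_shift(nuc_a, nuc_b):
    """Absolute distance between the dyads of two nucleosomes.

    Each nucleosome is a (lower_bound, upper_bound) pair expressed in the
    same coordinate system.
    """
    return abs(dyad(*nuc_b) - dyad(*nuc_a))


# Made-up coordinates, for illustration only.
example_shift = dyad_shift((100, 246), (110, 256))
```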
400

  
401
It also translates well-positioned nucleosome maps from one strain to another and stores them in a table. 
402

  
403
For example, the file **results/2014-04/RM_wp_tr_2_BY.tab** contains the RM well-positioned nucleosomes translated into the BY genome reference frame. It is composed of the following columns:
404

  
405
 - strain_ref, the reference genome (in which positions are defined)
406
 - begin, the translated lower bound of the nucleosome
407
 - end, the translated upper bound of the nucleosome
408
 - chr, the number of the chromosome for the reference genome (in which positions are defined)
409
 - length, the length of the nucleosome (could be negative)
410
 - cur_index, the index of the CUR
411
 - index_nuc, the index of the nucleosome in the CUR
412

  
324 413

  
325 414

  
326
Compute Distance Between Well Positionned Nucleosomes 
327
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
415
To execute this script, run the following command line in your R console:
328 416

  
329 417
.. code:: bash
330 418

  
......
378 466
Where combi is in {BY_RM, BY_YJM, RM_YJM} for each strain combination, marker is 
379 467
in {H3K4me1, H3K4me3, H3K9ac, H3K14ac, H4K12ac} for each post-translational 
380 468
histone modification and form is in {wp, fuzzy, wpfuzzy} considering well 
381
positionned nucleosomes, fuzzy nucleosomes or both for SNEP computation.
469
positioned nucleosomes, fuzzy nucleosomes or both for SNEP computation.
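
The set of `(combi, marker, form)` triples described above can be enumerated with a short sketch. The strain, marker and form names come from the text. The variable names and the helper `all_keys` are hypothetical, and no file name pattern is assumed.

```python
import itertools

STRAINS = ["BY", "RM", "YJM"]
MARKERS = ["H3K4me1", "H3K4me3", "H3K9ac", "H3K14ac", "H4K12ac"]
FORMS = ["wp", "fuzzy", "wpfuzzy"]

# Every unordered pair of strains, joined as in BY_RM, BY_YJM, RM_YJM.
COMBIS = ["_".join(pair) for pair in itertools.combinations(STRAINS, 2)]


def all_keys():
    """Return every (combi, marker, form) triple, combi varying slowest."""
    return list(itertools.product(COMBIS, MARKERS, FORMS))


keys = all_keys()
```

With 3 strain combinations, 5 markers and 3 forms, the enumeration yields 3 × 5 × 3 = 45 triples.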
382 470

  
383 471

  
384 472

  
