Révision 3961deb6
b/README | ||
---|---|---|
101 | 101 |
------------ |
102 | 102 |
|
103 | 103 |
The first installation step is to retrieve the source code of |
104 |
MyLabStocks. You can do this by typing the following command in a
|
|
104 |
NucleoMiner2. You can do this by typing the following command in a
|
|
105 | 105 |
terminal. |
106 | 106 |
|
107 | 107 |
git clone http://forge.cbp.ens-lyon.fr/git/nucleominer |
... | ... | |
135 | 135 |
|
136 | 136 |
* DESeq |
137 | 137 |
|
138 |
These packages can be installed by typing the following command in an |
|
139 |
R console: |
|
140 |
|
|
141 |
install.packages(c("fork", "rjson", "seqinr", "plotrix")) |
|
142 |
source("http://bioconductor.org/biocLite.R") |
|
143 |
biocLite("DESeq") |
|
144 |
|
|
145 |
Finally,by typing the git command above, you downloaded specific R |
|
146 |
packages provided with NucleoMiner2 that you now need to install: |
|
147 |
|
|
138 | 148 |
* cachecache https://forge.cbp.ens- |
139 | 149 |
lyon.fr/redmine/projects/cachecache |
140 | 150 |
|
... | ... | |
143 | 153 |
* nucleominer https://forge.cbp.ens- |
144 | 154 |
lyon.fr/redmine/projects/nucleominer |
145 | 155 |
|
146 |
The first packages could be installed by typing the following command |
|
147 |
in an R console: |
|
148 |
|
|
149 |
install.packages(c("fork", "rjson", "seqinr", "plotrix")) |
|
150 |
source("http://bioconductor.org/biocLite.R") |
|
151 |
biocLite("DESeq") |
|
152 |
|
|
153 |
The last packages are available in the git repository they could be |
|
154 |
install by typing the following command in your terminal: |
|
156 |
To do so, type the following command in your terminal: |
|
155 | 157 |
|
156 | 158 |
cd nucleominer |
157 | 159 |
R CMD INSTALL doc/Chuffart_NM2_workdir/deps/bot_0.14.tar.gz\ |
... | ... | |
161 | 163 |
Tutorial |
162 | 164 |
******** |
163 | 165 |
|
164 |
This tutorial describes steps allowing performing quantitative
|
|
166 |
This tutorial describes steps allowing to perform quantitative
|
|
165 | 167 |
analysis of epigenetic marks on individual nucleosomes. We assume that |
166 | 168 |
files are organised according to a given hierarchy and that all |
167 | 169 |
command lines are launched from the project’s root directory. |
168 | 170 |
|
169 | 171 |
This tutorial is divided into two main parts. The first part covers |
170 | 172 |
the python script *wf.py* that aligns and converts short sequence |
171 |
reads. The second part covers the R scripts that extracts information |
|
172 |
(nucleosome position and indicators) from the dataset. |
|
173 |
reads. The second part covers the R scripts that extracts nucleosome- |
|
174 |
level information (nucleosome position and indicators) from the |
|
175 |
dataset. |
|
173 | 176 |
|
174 | 177 |
|
175 | 178 |
Experimental Dataset, Working Directory and Configuration File |
... | ... | |
179 | 182 |
Working Directory Organisation |
180 | 183 |
------------------------------ |
181 | 184 |
|
182 |
After having install NucleoMiner2 environment (Previous section), go |
|
185 |
After having installed NucleoMiner2 environment (Previous section), go
|
|
183 | 186 |
to the root working directory of the tutorial by typing the following |
184 | 187 |
command in a terminal: |
185 | 188 |
|
... | ... | |
196 | 199 |
$$$ TODO explain how organise Experimental Dataset into the *data* |
197 | 200 |
directory of the working directory. |
198 | 201 |
|
199 |
We want to compare nucleosomes of 2 yeast strains: BY and RM. For each |
|
200 |
strain we performed Mnase-Seq and ChIP-Seq using an antibody |
|
201 |
recognizing the H3K14ac epigenetic mark. |
|
202 |
In this tutorial, we want to compare nucleosomes of 2 yeast strains: |
|
203 |
BY and RM. For each strain Mnase-Seq was performed as well as ChIP-Seq |
|
204 |
using an antibody recognizing the H3K14ac epigenetic mark. Illumina |
|
205 |
sequencing was done in single-read of 50 bp long. |
|
202 | 206 |
|
203 | 207 |
The dataset is composed of 55 files organised as follows: |
204 | 208 |
|
... | ... | |
236 | 240 |
Python and R Common Configuration File |
237 | 241 |
-------------------------------------- |
238 | 242 |
|
239 |
First of all we define in one place some configuration variables that
|
|
240 |
will be launched by python and R scripts. These variables are
|
|
241 |
contained in file *configurator.py*. The execution of this python
|
|
242 |
script dumps variables into the *nucleominer_config.json* file that
|
|
243 |
will then be used by both R and python scripts.
|
|
243 |
First, we need to define useful configuration variables that will be
|
|
244 |
passed to python and R scripts. These variables are contained in file
|
|
245 |
*configurator.py*. The execution of this python script dumps variables
|
|
246 |
into the *nucleominer_config.json* file that will then be used by both
|
|
247 |
R and python scripts. |
|
244 | 248 |
|
245 |
To do this, go to the root directory of your project and run the |
|
246 |
following command: |
|
249 |
The initialization of this variables is done in the configurator.py |
|
250 |
file. If you need to adapt variable values (path, default |
|
251 |
parameters...) you need to edit this file. Then, go to the root |
|
252 |
directory of your project and run the following command to dump the |
|
253 |
configuration file: |
|
247 | 254 |
|
248 | 255 |
python src/current/configurator.py |
249 | 256 |
|
... | ... | |
251 | 258 |
Preprocessing Illumina Fastq Reads for Each Sample |
252 | 259 |
================================================== |
253 | 260 |
|
254 |
This preprocessing step consists of 4 main steps embedded in the |
|
255 |
*wf.py* script. They are described bellow. As a preamble, this script |
|
256 |
computes *samples*, *samples_mnase* and *strains* that will be used |
|
257 |
along the 4 steps. |
|
261 |
Once variables and design have been specified, the script wf.py will |
|
262 |
automatically run all the analysis. You don't need to do anything. To |
|
263 |
run the full analysis, run the following command: |
|
264 |
|
|
265 |
python src/current/wf.py |
|
266 |
|
|
267 |
The details of the steps performed by this script are explained below. |
|
268 |
This preprocessing consists of 4 steps embedded in the *wf.py* script. |
|
269 |
They are described bellow. As a preamble, this script computes |
|
270 |
*samples*, *samples_mnase* and *strains* that will be used along the 4 |
|
271 |
steps. |
|
258 | 272 |
|
259 | 273 |
wf.samples = [] |
260 | 274 |
|
... | ... | |
273 | 287 |
Creating Bowtie Index from each Reference Genome |
274 | 288 |
------------------------------------------------ |
275 | 289 |
|
276 |
For each strain, we need to create bowtie index. Bowtie index of a
|
|
277 |
strain is a tree view of the genome of this strain. It will be used by
|
|
278 |
bowtie to align reads. This step is performed by the following part of
|
|
279 |
the *wf.py* script:
|
|
290 |
For each strain, the script *wf.py* then creates bowtie index. Bowtie
|
|
291 |
index of a strain is a tree view of the genome of this strain. It will
|
|
292 |
be used by bowtie to align reads. The part of the script performing
|
|
293 |
this is the following:
|
|
280 | 294 |
|
281 | 295 |
for strain in strains: |
282 | 296 |
per_strain_stats[strain] = create_bowtie_index(strain, |
283 | 297 |
config["FASTA_REFERENCE_GENOME_FILES"][strain], config["INDEX_DIR"], |
284 | 298 |
config["BOWTIE_BUILD_BIN"]) |
285 | 299 |
|
286 |
The following table summarizes the file sizes and process durations |
|
287 |
concerning this step. |
|
300 |
As an indication, the following table summarizes the file sizes and |
|
301 |
process durations that we experienced when running this step on a |
|
302 |
Linux server***. |
|
288 | 303 |
|
289 | 304 |
+--------+------------------------+------------------------+------------------+ |
290 | 305 |
| strain | fasta genome file size | bowtie index file size | process duration | |
... | ... | |
298 | 313 |
Aligning Reads to Reference Genome |
299 | 314 |
---------------------------------- |
300 | 315 |
|
301 |
Next, we launch bowtie to align reads to the reference genome. It
|
|
302 |
produces a *.sam* file that we convert into a *.bed* file. Binaries
|
|
303 |
for *bowtie*, *samtools* and *bedtools* are wrapped using python
|
|
304 |
*subprocess* class. This step is performed by the following part of
|
|
305 |
the *wf.py* script:
|
|
316 |
Next, the *wf.py* script launches bowtie to align reads to the
|
|
317 |
reference genome. It produces a *.sam* file that is converted into a
|
|
318 |
*.bed* file. Binaries for *bowtie*, *samtools* and *bedtools* are
|
|
319 |
wrapped using python *subprocess* class. This step is performed by the
|
|
320 |
following part of the script:
|
|
306 | 321 |
|
307 | 322 |
for sample in samples: |
308 | 323 |
per_sample_align_stats["sample_%s" % sample["id"]] = align_reads(sample, |
... | ... | |
315 | 330 |
------------------------------------------------ |
316 | 331 |
|
317 | 332 |
TemplateFilter uses particular input formats for reads, so it is |
318 |
necessary to convert the *.bed* files. TemplateFilter expect reads as
|
|
319 |
follows: *chr*, *coord*, *strand* and *#read* where:
|
|
333 |
necessary to convert the *.bed* files. TemplateFilter expect reads in
|
|
334 |
the following format: *chr*, *coord*, *strand* and *#read* where:
|
|
320 | 335 |
|
321 | 336 |
* *chr* is the number of the chromosome; |
322 | 337 |
|
... | ... | |
330 | 345 |
|
331 | 346 |
**WARNING** for reverse strands, bowtie returns the position of the |
332 | 347 |
first nucleotide on the left hand side, whereas TemplateFilter expects |
333 |
the first one on the right hand side. This step takes this into
|
|
334 |
account by adding the read length (in our case 50) to the reverse
|
|
348 |
the first one on the right hand side. This is taken into account in
|
|
349 |
NucleoMiner2 by adding the read length (in our case 50) to the reverse
|
|
335 | 350 |
reads coordinates. |
336 | 351 |
|
337 | 352 |
This step is performed by the following part of the *wf.py* script: |
... | ... | |
341 | 356 |
config["ALIGN_DIR"], config["FASTA_INDEXES"], config["AREA_BLACK_LIST"], |
342 | 357 |
config["READ_LENGTH"],config["MAPQ_THRES"]) |
343 | 358 |
|
344 |
The following table summarises the number of reads, the involved file |
|
345 |
sizes and process durations concerning the two last steps. In our |
|
346 |
case, alignment process have been multithreaded over 3 cores. |
|
359 |
The following table summarizes the number of reads, the involved file |
|
360 |
sizes and process durations that we experienced when running the two |
|
361 |
last steps. In our case, alignment process were multithreaded over 3 |
|
362 |
cores. |
|
347 | 363 |
|
348 | 364 |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
349 | 365 |
| id | Illumina reads | aligned and filtred reads | ratio | *.bed* file size | TF input file size | process duration | |
... | ... | |
378 | 394 |
Finally, for each sample we perform TemplateFilter analysis. |
379 | 395 |
|
380 | 396 |
**WARNING** TemplateFilter returns a list of nucleosomes. Each |
381 |
nucleosome is define by its center and its width. An odd width leads |
|
397 |
nucleosome is defined by its center and its width. An odd width leads
|
|
382 | 398 |
us to consider non- integer lower and upper bound. |
383 | 399 |
|
384 |
**WARNING** TemplateFilter is not designed to deal with replicates. So
|
|
400 |
**WARNING** TemplateFilter was not designed to handle replicates. So
|
|
385 | 401 |
we recommend to keep a maximum of nucleosomes and filter the aberrant |
386 | 402 |
ones afterwards using the benefits of having replicates. To do this, |
387 | 403 |
we set a low correlation threshold parameter (0.5) and a particularly |
... | ... | |
416 | 432 |
======================================================== |
417 | 433 |
|
418 | 434 |
The second part of the tutorial uses R |
419 |
(http://http://www.r-project.org). It consists of a set of R scripts
|
|
420 |
that will be sourced in an R from a console launched at the root of
|
|
421 |
your project. These scripts are: |
|
435 |
(http://http://www.r-project.org). NucleoMiner2 contains a set of R
|
|
436 |
scripts that will be sourced in R from a console launched at the root
|
|
437 |
of your project. These scripts are:
|
|
422 | 438 |
|
423 | 439 |
* headers.R |
424 | 440 |
|
... | ... | |
438 | 454 |
The Script headers.R |
439 | 455 |
-------------------- |
440 | 456 |
|
441 |
The script headers.R is included in each other scripts. It is in
|
|
457 |
The script headers.R is included in all other R scripts. It is in
|
|
442 | 458 |
charge of: |
443 | 459 |
|
444 | 460 |
* launching libraries used in the scripts |
445 | 461 |
|
446 | 462 |
* launching configuration (design, strain, marker...) |
447 | 463 |
|
448 |
* computing and caching CURs (caching means storing the
|
|
449 |
information in the computer's memory)
|
|
464 |
* computing and caching Common Uinterrupted Regions (CURs).
|
|
465 |
Caching means storing the information in the computer's memory.
|
|
450 | 466 |
|
451 | 467 |
Note that you can customize the function “translate”. This function |
452 | 468 |
allows you to use the alignments between genomes when performing |
453 |
various tasks. You may be using NucleoMiner2 to analyse data of a |
|
454 |
single strain, or of several strains. |
|
469 |
various tasks. |
|
455 | 470 |
|
456 |
* All the data corresponds to the same strain (e.g. |
|
457 |
treatment/control, or only few mutations): Then in step 1), the |
|
458 |
regions to use are entire chromosomes. Instep 2) simply use the |
|
471 |
* You may want to analyze data of a single strain (e.g. |
|
472 |
treatment/control, or only few mutations). In this case, the |
|
473 |
genome is identical across all samples and you do not need to |
|
474 |
define particular CURs (CURs are chromosomes). Simply use the |
|
459 | 475 |
default translate function which is neutral. |
460 | 476 |
|
461 |
* The data come from two or more strains: In this case, edit a
|
|
462 |
list of regions and customize the translate function which
|
|
463 |
performs the correspondence between the different genomes. How we
|
|
464 |
did it: a .c2c file is obtained with NucleoMiner 1.0 (refer to
|
|
465 |
the Appendice "Generate .c2c Files"), then use it to produce the
|
|
466 |
list of regions and customise “translate”. |
|
477 |
* If you are analyzing data from two or more strains (as
|
|
478 |
NucleoMiner2 was designed for), then you need to translate
|
|
479 |
coordinates of one genome into the coordinates of another one.
|
|
480 |
You must do this by aligning the two genomes, which will produce
|
|
481 |
a .c2c file (see Appendice "Generate .c2c Files"). thenuse it to
|
|
482 |
produce the list of regions and customise “translate”.
|
|
467 | 483 |
|
468 |
In your R console, run the following command line: |
|
484 |
In our tutorial, we are in the second case and to perform all these |
|
485 |
steps run the following command line in your R console: |
|
469 | 486 |
|
470 | 487 |
source("src/current/headers.R") |
471 | 488 |
|
... | ... | |
474 | 491 |
------------------------- |
475 | 492 |
|
476 | 493 |
This script is in charge of extracting Maps for well-positioned and |
477 |
fuzzy nucleosomes. First of all, this script computes intra and inter- |
|
478 |
strain nucleosome maps for each CUR. This step is executed in parallel |
|
479 |
on many cores using the BoT library. Next, it collects results and |
|
480 |
produces well-positioned, fuzzy and UNR maps. |
|
481 |
|
|
482 |
The well-positioned map for BY is collected in the result directory |
|
483 |
and is called *BY_wp.tab*. It is composed of following columns: |
|
494 |
sensitive nucleosomes. First of all, this script computes intra and |
|
495 |
inter-strain matches of nucleosome maps for each CUR. This step can be |
|
496 |
executed in parallel on many cores using the BoT library. Next, it |
|
497 |
collects results and produces maps of well-positioned nucleosomes, |
|
498 |
sensitive nucleosomes and Unaligned Nucleosomal Regions . |
|
499 |
|
|
500 |
The map of well-positioned nucleosomes for BY is collected in the |
|
501 |
result directory and is called *BY_wp.tab*. It is composed of |
|
502 |
following columns: |
|
484 | 503 |
|
485 | 504 |
* chr, the number of the chromosome |
486 | 505 |
|
... | ... | |
512 | 531 |
nucleosomes. |
513 | 532 |
|
514 | 533 |
* wp_pval, for a well-positioned nucleosome, it is the p-value |
515 |
chi square test obtained with the LLR2 (*1-pchisq(2.LLR2, df=4)*)
|
|
534 |
chi square test obtained from LLR2 (*1-pchisq(2.LLR2, df=4)*)
|
|
516 | 535 |
|
517 | 536 |
* dyad_shift, for a well-positioned nucleosome, it is the shift |
518 | 537 |
between the two extreme TemplateFilter nucleosome dyad positions. |
519 | 538 |
|
520 |
The fuzzy map for BY is collected in the result directory and is
|
|
539 |
The sensitive map for BY is collected in the result directory and is
|
|
521 | 540 |
called *BY_fuzzy.tab*. It is composed of following columns: |
522 | 541 |
|
523 | 542 |
* chr, the number of the chromosome |
... | ... | |
545 | 564 |
(*1-pchisq(2.LLR3, df=2)*) |
546 | 565 |
|
547 | 566 |
* diff, the dyads shift between the positions in the two strains |
567 |
(in bp) |
|
548 | 568 |
|
549 | 569 |
The common UNR map for BY and RM strains is collected in the result |
550 | 570 |
directory and is called *BY_RM_common_unr.tab*. It is composed of the |
... | ... | |
565 | 585 |
-------------------------------- |
566 | 586 |
|
567 | 587 |
This script is used to translate common well-positioned nucleosome |
568 |
maps from a strain to another strain and stores it into a table.
|
|
588 |
positions from a strain to another strain and stores it into a table.
|
|
569 | 589 |
|
570 | 590 |
For example, the file *results/2014-04/RM_wp_tr_2_BY.tab* contains RM |
571 |
well-positioned nucleosome translated into the BY genome coordinates. |
|
591 |
well-positioned nucleosomes translated into the BY genome coordinates.
|
|
572 | 592 |
It is composed of following columns: |
573 | 593 |
|
574 | 594 |
* strain_ref, the reference genome (in which positioned are |
... | ... | |
595 | 615 |
The Script split_samples.R |
596 | 616 |
-------------------------- |
597 | 617 |
|
598 |
For memory space usage reasons, we split and compress TemplateFilter
|
|
618 |
To optimize memory space usage, we split and compress TemplateFilter
|
|
599 | 619 |
input files according to their corresponding chromosome. for example, |
600 | 620 |
*sample_1_TF.tab* will be split into : |
601 | 621 |
|
... | ... | |
701 | 721 |
|
702 | 722 |
sample_id are given in file samples.csv |
703 | 723 |
|
704 |
If you don't know which column to use, we recommend using wpunr. |
|
724 |
If you don't know which column to use for normalization, we recommend |
|
725 |
using wpunr. |
|
705 | 726 |
|
706 |
If you want the very detailed factors produced by DESeq, here are the |
|
707 |
information: |
|
727 |
Here are the details of the factors produced: |
|
708 | 728 |
|
709 | 729 |
* unr: factor computed from data of UNR regions. These regions |
710 | 730 |
are defined for every pairs of aligned genomes (e.g. BY_RM) |
... | ... | |
791 | 811 |
strain effect and the snep effect. These are the coefficients of |
792 | 812 |
the fitted generalized linear model. |
793 | 813 |
|
794 |
* pvalsGLM, the pvalue resulting of the comparison of the GLM |
|
795 |
model considering or not the interaction term marker:strain. This |
|
796 |
is the statsitcial significance of the interaction term and |
|
797 |
therefore the statistical significance of the SNEP. |
|
814 |
* pvalsGLM, the pvalue resulting from the comparison of the GLM |
|
815 |
model considering the interaction term *marker:strain* to the GLM |
|
816 |
model that does not consider it. This is the statsitcial |
|
817 |
significance of the interaction term and therefore the |
|
818 |
statistical significance of the SNEP. |
|
798 | 819 |
|
799 | 820 |
* snep_index, a boolean set to TRUE if the pvalueGLM value is |
800 | 821 |
under the threshold computed with FDR function with a rate set to |
801 | 822 |
0.0001. |
802 | 823 |
|
803 |
To execute this script, run the following command |
|
804 |
|
|
805 | 824 |
To execute this script, run the following command in your R console: |
806 | 825 |
|
807 | 826 |
source("src/current/launch_deseq.R") |
... | ... | |
826 | 845 |
APPENDICE: Generate .c2c Files |
827 | 846 |
============================== |
828 | 847 |
|
829 |
The *.c2c* files is a simple table that describes how the genome |
|
830 |
sequence can be aligned. We generate it using some NucleoMiner 1.0 |
|
831 |
scripts. |
|
848 |
The *.c2c* files is a simple table that describes how two genome |
|
849 |
sequences are aligned. This file can be generated by using scripts |
|
850 |
that were developed in NucleoMiner 1.0 (Nagarajan et al. PLoS Genetics |
|
851 |
2010) and which we provide in this release of NucleoMiner2. |
|
832 | 852 |
|
833 |
To use NucleoMiner 1.0 scripts on your UNIX/LINUX computer you need
|
|
834 |
first to install MUMmer which is a system for rapidly aligning entire
|
|
835 |
genomes, whether in complete or draft form.
|
|
853 |
To use these scripts on your UNIX/LINUX computer you need first to
|
|
854 |
install MUMmer which is designed to rapidly align entire genomes,
|
|
855 |
whether in complete or draft form. |
|
836 | 856 |
|
837 | 857 |
|
838 |
Installing the MUMmer library
|
|
839 |
-----------------------------
|
|
858 |
Installing MUMmer
|
|
859 |
----------------- |
|
840 | 860 |
|
841 | 861 |
Get the last version of MUMmer archive on your computer |
842 |
(MUMmer3.23.tar.gz distributed in the directory deps of your working
|
|
862 |
(MUMmer3.23.tar.gz is provided in the directory deps of your working
|
|
843 | 863 |
directory). Copy it in a dedicated directory. Install it locally into |
844 | 864 |
the src folder of you working directory by typing (working directory): |
845 | 865 |
|
846 |
tar -xvzf gdl-1.0.tar.gz |
|
847 |
|
|
848 |
This creates a directory called gdl-1.0. You now need to go into this |
|
849 |
directory and compile the library, by typing: |
|
866 |
tar -xvzf MUMmer3.23.tar.gz |
|
850 | 867 |
|
851 | 868 |
cd src |
852 | 869 |
tar xfvz ../deps/MUMmer3.23.tar.gz |
... | ... | |
858 | 875 |
Installing NucleoMiner 1.0 scripts |
859 | 876 |
---------------------------------- |
860 | 877 |
|
861 |
Get the nucleominer-1.0.tar.gz archive on your computer (distributed
|
|
862 |
in the directory deps of your working directory). Install it locally
|
|
863 |
into the src folder of you working directory by typing (working
|
|
864 |
directory): |
|
878 |
Get the nucleominer-1.0.tar.gz archive on your computer (this archive
|
|
879 |
is provided in the directory deps of your working directory). Install
|
|
880 |
it locally into the src folder of you working directory by typing
|
|
881 |
(working directory):
|
|
865 | 882 |
|
866 | 883 |
cd src |
867 | 884 |
tar xfvz ../deps/nucleominer-1.0.tar.gz |
868 | 885 |
cd .. |
869 | 886 |
|
870 |
This creates a directory called that contains NucleoMiner 1.0 scripts
|
|
887 |
This creates a directory that contains NucleoMiner 1.0 scripts |
|
871 | 888 |
(src/nucleominer-1.0/scripts). |
872 | 889 |
|
873 | 890 |
|
b/doc/Chuffart_NM2_sweave/Chuffart_NM2_sweave.Rnw | ||
---|---|---|
1101 | 1101 |
print(nrow(by_not_1n[by_not_1n$ratio>0.6,])) |
1102 | 1102 |
print("and overlaping 100% a nm fuzzy?") |
1103 | 1103 |
print(nrow(by_not_1n[by_not_1n$ratio==1,])) |
1104 |
@ |
|
1105 | 1104 |
|
1105 |
print("nb wp nm2 nuc aligned with dp nuc?") |
|
1106 |
tmp_nuc_by = by_only_wp[!is.na(by_only_wp$start) ,] |
|
1107 |
print(length(tmp_nuc_by$diff)) |
|
1108 |
print("...with diff = 0?") |
|
1109 |
print(sum(tmp_nuc_by$diff==0)) |
|
1110 |
print("...with diff < 1?") |
|
1111 |
print(sum(tmp_nuc_by$diff<1)) |
|
1112 |
print("...with diff <= 1bp?") |
|
1113 |
print(sum(tmp_nuc_by$diff<=1)) |
|
1114 |
print("...with diff <= 5bp?") |
|
1115 |
print(sum(tmp_nuc_by$diff<=5)) |
|
1116 |
print("...with diff <= 30?") |
|
1117 |
print(sum(tmp_nuc_by$diff<=30)) |
|
1118 |
|
|
1119 |
@ |
|
1106 | 1120 |
|
1107 | 1121 |
\subsection{RM map} |
1108 | 1122 |
|
... | ... | |
1131 | 1145 |
|
1132 | 1146 |
\subsection{NM2.0 vs Danpos2, fuzziness} |
1133 | 1147 |
<<echo=FALSE, results=hide, fig=TRUE, height=6, width=10, eval=TRUE >>= |
1134 |
|
|
1135 | 1148 |
# BY |
1136 | 1149 |
nuc_by = by_1n[by_1n$wp==1 & !is.na(by_1n$start) ,] |
1137 | 1150 |
nuc_by$key = paste(nuc_by$chr, nuc_by$lower_bound, nuc_by$upper_bound, nuc_by$cur_index, sep="_") |
... | ... | |
1141 | 1154 |
idx = match(nuc_by$key,by_wp_nm$key) |
1142 | 1155 |
|
1143 | 1156 |
nuc_by$wp_pval = by_wp_nm$wp_pval[idx] |
1157 |
nuc_by = nuc_by[nuc_by$wp_pval !=0,] |
|
1144 | 1158 |
dx_by <- density(nuc_by$diff) |
1145 | 1159 |
|
1146 | 1160 |
# RM |
1147 |
|
|
1148 | 1161 |
nuc_rm = rm_1n[rm_1n$wp==1 & !is.na(rm_1n$start) ,] |
1149 | 1162 |
nuc_rm$key = paste(nuc_rm$chr, nuc_rm$lower_bound, nuc_rm$upper_bound, nuc_rm$cur_index, sep="_") |
1150 | 1163 |
|
... | ... | |
1153 | 1166 |
idx = match(nuc_rm$key,rm_wp_nm$key) |
1154 | 1167 |
|
1155 | 1168 |
nuc_rm$wp_pval = rm_wp_nm$wp_pval[idx] |
1169 |
nuc_rm = nuc_rm[nuc_rm$wp_pval !=0,] |
|
1156 | 1170 |
dx_rm <- density(nuc_rm$diff) |
1157 | 1171 |
|
1172 |
|
|
1173 |
library(MASS) |
|
1174 |
Lab.palette <- colorRampPalette(c("white", "orange", "red"), space = "Lab") |
|
1175 |
|
|
1158 | 1176 |
m = cbind( |
1159 | 1177 |
rbind(matrix(rep(1,4),2),matrix(rep(4,4),2)), |
1160 | 1178 |
rbind(matrix(rep(2,2),2),matrix(rep(5,2),2)), |
... | ... | |
1162 | 1180 |
) |
1163 | 1181 |
layout(m, respect=TRUE) |
1164 | 1182 |
|
1165 |
smoothScatter(nuc_by$fuzziness_score, -log10(nuc_by$wp_pval), col=adjustcolor(1, alpha.f=1), pch=".", ylab="BY NM2.0 -log10(wp_pval)", xlab="Danpos2 fuzziness score") |
|
1183 |
|
|
1184 |
x = nuc_by$fuzziness_score |
|
1185 |
y = -log10(nuc_by$wp_pval) |
|
1186 |
smoothScatter(x, y, col=adjustcolor(1, alpha.f=1), pch=".", ylab="BY NM2.0 -log10(wp_pval)", xlab="Danpos2 fuzziness score", nrpoints=0, colramp = Lab.palette) |
|
1187 |
k = kde2d(x,y, n=50) |
|
1188 |
contour(k, drawlabels=FALSE, nlevels=10, add=TRUE, zlim=c(0,max(k$z)/4), col=adjustcolor(1, alpha.f=0.3)) |
|
1189 |
|
|
1166 | 1190 |
plot(dx_by$y, dx_by$x, ylim=range(nuc_by$diff), xlim=rev(range(dx_by$y)), type='l', ylab="diff NM2.0 Danpos2", xlab="density", xaxt="n") |
1167 |
smoothScatter(nuc_by$fuzziness_score, nuc_by$diff, col=adjustcolor(1, alpha.f=1), pch=".", ylim=range(nuc_by$diff), ylab="diff NM2.0 Danpos2", xlab="Danpos2 fuzziness score") |
|
1168 | 1191 |
|
1169 |
smoothScatter(nuc_rm$fuzziness_score, -log10(nuc_rm$wp_pval), col=adjustcolor(1, alpha.f=1), pch=".", ylab="RM NM2.0 -log10(wp_pval)", xlab="Danpos2 fuzziness score") |
|
1192 |
x = nuc_by$fuzziness_score |
|
1193 |
y = nuc_by$diff |
|
1194 |
smoothScatter(x, y, col=adjustcolor(1, alpha.f=1), pch=".", ylim=range(nuc_by$diff), ylab="diff NM2.0 Danpos2", xlab="Danpos2 fuzziness score", nrpoints=0, colramp = Lab.palette) |
|
1195 |
k = kde2d(x,y, n=50) |
|
1196 |
contour(k, drawlabels=FALSE, nlevels=10, add=TRUE, zlim=c(0,max(k$z)/4), col=adjustcolor(1, alpha.f=0.3)) |
|
1197 |
|
|
1198 |
x = nuc_rm$fuzziness_score |
|
1199 |
y = -log10(nuc_rm$wp_pval) |
|
1200 |
smoothScatter(x, y, col=adjustcolor(1, alpha.f=1), pch=".", ylab="RM NM2.0 -log10(wp_pval)", xlab="Danpos2 fuzziness score", nrpoints=0, colramp = Lab.palette) |
|
1201 |
k = kde2d(x,y, n=50) |
|
1202 |
contour(k, drawlabels=FALSE, nlevels=10, add=TRUE, zlim=c(0,max(k$z)/4), col=adjustcolor(1, alpha.f=0.3)) |
|
1203 |
|
|
1170 | 1204 |
plot(dx_rm$y, dx_rm$x, ylim=range(nuc_rm$diff), xlim=rev(range(dx_rm$y)), type='l', ylab="diff NM2.0 Danpos2", xlab="density", xaxt="n") |
1171 |
smoothScatter(nuc_rm$fuzziness_score, nuc_rm$diff, col=adjustcolor(1, alpha.f=1), pch=".", ylim=range(nuc_rm$diff), ylab="diff NM2.0 Danpos2", xlab="Danpos2 fuzziness score") |
|
1205 |
|
|
1206 |
x = nuc_rm$fuzziness_score |
|
1207 |
y = nuc_rm$diff |
|
1208 |
smoothScatter(x, y, col=adjustcolor(1, alpha.f=1), pch=".", ylim=range(nuc_rm$diff), ylab="diff NM2.0 Danpos2", xlab="Danpos2 fuzziness score", nrpoints=0, colramp = Lab.palette) |
|
1209 |
k = kde2d(x,y, n=50) |
|
1210 |
contour(k, drawlabels=FALSE, nlevels=10, add=TRUE, zlim=c(0,max(k$z)/4), col=adjustcolor(1, alpha.f=0.3)) |
|
1172 | 1211 |
@ |
1173 | 1212 |
|
1174 | 1213 |
\newpage |
1175 | 1214 |
|
1176 | 1215 |
|
1177 | 1216 |
|
1217 |
<<echo=FALSE, results=hide, fig=TRUE, height=3, width=5, eval=TRUE >>= |
|
1218 |
hist(nuc_by$diff, breaks=-1:50) |
|
1219 |
@ |
|
1178 | 1220 |
|
1179 | 1221 |
|
1180 | 1222 |
|
... | ... | |
1427 | 1469 |
\begin{figure} |
1428 | 1470 |
\begin{center} |
1429 | 1471 |
\subfigure{ |
1430 |
<< nm_vs_dp_nucs, results=hide, fig=TRUE, height=5, width=16, echo=FALSE, eval=TRUE >>=
|
|
1472 |
<< nm_vs_dp_nucs, results=hide, fig=TRUE, height=3, width=16, echo=FALSE, eval=TRUE >>=
|
|
1431 | 1473 |
noi = by_1n[by_1n$chr == 1 & by_1n$lower_bound>168500 & by_1n$upper_bound < 170500,] |
1432 | 1474 |
plot(0,0, col=0, xlim=c(168850, 170200), ylim=c(0,2), xlab="chr I", yaxt='n', ylab="") |
1433 | 1475 |
axis(side = 2, at = c(0.5, 1.5), labels = c("NM2.0", "Dampos2")) |
1434 | 1476 |
for (i in 1:nrow(noi)) { |
1435 | 1477 |
l = noi[i,] |
1436 |
draw.ellipse(0.5*(l$upper_bound + l$lower_bound), 0.5, 0.5*(l$upper_bound - l$lower_bound), 0.4, lwd=3, col=1)
|
|
1437 |
draw.ellipse(0.5*(l$start + l$end), 1.5, 0.5*(l$start - l$end), 0.4, lwd=3, col=1)
|
|
1438 |
abline(h=c(0.5, 1.5)) |
|
1478 |
draw.ellipse(0.5*(l$upper_bound + l$lower_bound), 0.5, 0.5*(l$upper_bound - l$lower_bound), 0.4, lwd=3) |
|
1479 |
draw.ellipse(0.5*(l$start + l$end), 1.5, 0.5*(l$start - l$end), 0.4, lwd=3) |
|
1480 |
# abline(h=c(0.5, 1.5))
|
|
1439 | 1481 |
} |
1440 | 1482 |
@ |
1441 | 1483 |
} |
b/doc/sphinx_doc/conf.py | ||
---|---|---|
50 | 50 |
# built documents. |
51 | 51 |
# |
52 | 52 |
# The short X.Y version. |
53 |
version = '2.3.51'
|
|
53 |
version = '2.3.52'
|
|
54 | 54 |
# The full version, including alpha/beta/rc tags. |
55 |
release = '2.3.51'
|
|
55 |
release = '2.3.52'
|
|
56 | 56 |
|
57 | 57 |
# The language for content autogenerated by Sphinx. Refer to documentation |
58 | 58 |
# for a list of supported languages. |
b/doc/sphinx_doc/readme.rst | ||
---|---|---|
61 | 61 |
Installation |
62 | 62 |
------------ |
63 | 63 |
|
64 |
The first installation step is to retrieve the source code of MyLabStocks. You can do this by typing the following command in a terminal.
|
|
64 |
The first installation step is to retrieve the source code of NucleoMiner2. You can do this by typing the following command in a terminal.
|
|
65 | 65 |
|
66 | 66 |
.. code:: bash |
67 | 67 |
|
... | ... | |
86 | 86 |
- seqinr |
87 | 87 |
- plotrix |
88 | 88 |
- DESeq |
89 |
- cachecache https://forge.cbp.ens-lyon.fr/redmine/projects/cachecache |
|
90 |
- bot https://forge.cbp.ens-lyon.fr/redmine/projects/bot |
|
91 |
- nucleominer https://forge.cbp.ens-lyon.fr/redmine/projects/nucleominer |
|
92 | 89 |
|
93 |
The first packages could be installed by typing the following command in an R console:
|
|
90 |
These packages can be installed by typing the following command in an R console:
|
|
94 | 91 |
|
95 | 92 |
.. code:: bash |
96 | 93 |
|
... | ... | |
98 | 95 |
source("http://bioconductor.org/biocLite.R") |
99 | 96 |
biocLite("DESeq") |
100 | 97 |
|
101 |
The last packages are available in the git repository they could be install by typing the following command in your terminal: |
|
98 |
Finally,by typing the git command above, you downloaded specific R packages provided with NucleoMiner2 that you now need to install: |
|
99 |
|
|
100 |
- cachecache https://forge.cbp.ens-lyon.fr/redmine/projects/cachecache |
|
101 |
- bot https://forge.cbp.ens-lyon.fr/redmine/projects/bot |
|
102 |
- nucleominer https://forge.cbp.ens-lyon.fr/redmine/projects/nucleominer |
|
103 |
|
|
104 |
To do so, type the following command in your terminal: |
|
102 | 105 |
|
103 | 106 |
.. code:: bash |
104 | 107 |
|
b/doc/sphinx_doc/rref.rst | ||
---|---|---|
1256 | 1256 |
+---------------+---------------------------------------------------+ |
1257 | 1257 |
| Author: | Florent Chuffart | |
1258 | 1258 |
+---------------+---------------------------------------------------+ |
1259 |
| Version: | 2.3.51 |
|
|
1259 |
| Version: | 2.3.52 |
|
|
1260 | 1260 |
+---------------+---------------------------------------------------+ |
1261 | 1261 |
| License: | CeCILL | |
1262 | 1262 |
+---------------+---------------------------------------------------+ |
b/doc/sphinx_doc/tuto.rst | ||
---|---|---|
1 | 1 |
Tutorial |
2 | 2 |
======== |
3 | 3 |
|
4 |
This tutorial describes steps allowing performing quantitative analysis of epigenetic marks on individual nucleosomes. We assume that files are organised according to a given hierarchy and that all command lines are launched from the project’s root directory.
|
|
4 |
This tutorial describes steps allowing to perform quantitative analysis of epigenetic marks on individual nucleosomes. We assume that files are organised according to a given hierarchy and that all command lines are launched from the project’s root directory.
|
|
5 | 5 |
|
6 |
This tutorial is divided into two main parts. The first part covers the python script `wf.py` that aligns and converts short sequence reads. The second part covers the R scripts that extracts information (nucleosome position and indicators) from the dataset. |
|
6 |
This tutorial is divided into two main parts. The first part covers the python script `wf.py` that aligns and converts short sequence reads. The second part covers the R scripts that extracts nucleosome-level information (nucleosome position and indicators) from the dataset.
|
|
7 | 7 |
|
8 | 8 |
|
9 | 9 |
|
... | ... | |
14 | 14 |
Working Directory Organisation |
15 | 15 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
16 | 16 |
|
17 |
After having install NucleoMiner2 environment (Previous section), go to the root working directory of the tutorial by typing the following command in a terminal: |
|
17 |
After having installed NucleoMiner2 environment (Previous section), go to the root working directory of the tutorial by typing the following command in a terminal:
|
|
18 | 18 |
|
19 | 19 |
.. code:: bash |
20 | 20 |
|
... | ... | |
29 | 29 |
$$$ TODO explain how organise Experimental Dataset into the `data` directory of the working directory. |
30 | 30 |
|
31 | 31 |
|
32 |
We want to compare nucleosomes of 2 yeast strains: BY and RM. For each strain we performed Mnase-Seq and ChIP-Seq using an antibody recognizing the H3K14ac epigenetic mark.
|
|
32 |
In this tutorial, we want to compare nucleosomes of 2 yeast strains: BY and RM. For each strain Mnase-Seq was performed as well as ChIP-Seq using an antibody recognizing the H3K14ac epigenetic mark. Illumina sequencing was done in single-read of 50 bp long.
|
|
33 | 33 |
|
34 | 34 |
The dataset is composed of 55 files organised as follows: |
35 | 35 |
|
... | ... | |
62 | 62 |
Python and R Common Configuration File |
63 | 63 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
64 | 64 |
|
65 |
First of all we define in one place some configuration variables that will be launched by python and R scripts. These variables are contained in file `configurator.py`. The execution of this python script dumps variables into the `nucleominer_config.json` file that will then be used by both R and python scripts.
|
|
65 |
First, we need to define useful configuration variables that will be passed to python and R scripts. These variables are contained in file `configurator.py`. The execution of this python script dumps variables into the `nucleominer_config.json` file that will then be used by both R and python scripts.
|
|
66 | 66 |
|
67 |
To do this, go to the root directory of your project and run the following command:
|
|
67 |
The initialization of this variables is done in the configurator.py file. If you need to adapt variable values (path, default parameters...) you need to edit this file. Then, go to the root directory of your project and run the following command to dump the configuration file:
|
|
68 | 68 |
|
69 | 69 |
.. code:: bash |
70 | 70 |
|
... | ... | |
74 | 74 |
|
75 | 75 |
|
76 | 76 |
|
77 |
|
|
78 |
|
|
79 | 77 |
Preprocessing Illumina Fastq Reads for Each Sample |
80 | 78 |
-------------------------------------------------- |
81 | 79 |
|
82 |
This preprocessing step consists of 4 main steps embedded in the `wf.py` script. They are described bellow. As a preamble, this script computes `samples`, `samples_mnase` and `strains` that will be used along the 4 steps. |
|
80 |
Once variables and design have been specified, the script wf.py will automatically run all the analysis. You don't need to do anything. |
|
81 |
To run the full analysis, run the following command: |
|
82 |
|
|
83 |
.. code:: bash |
|
84 |
|
|
85 |
python src/current/wf.py |
|
86 |
|
|
87 |
The details of the steps performed by this script are explained below. |
|
88 |
This preprocessing consists of 4 steps embedded in the `wf.py` script. They are described bellow. As a preamble, this script computes `samples`, `samples_mnase` and `strains` that will be used along the 4 steps. |
|
83 | 89 |
|
84 | 90 |
|
85 | 91 |
.. autodata:: wf.samples |
... | ... | |
95 | 101 |
Creating Bowtie Index from each Reference Genome |
96 | 102 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
97 | 103 |
|
98 |
For each strain, we need to create bowtie index. Bowtie index of a strain is a tree view of the genome of this strain. It will be used by bowtie to align reads. This step is performed by the following part of the `wf.py` script:
|
|
104 |
For each strain, the script *wf.py* then creates bowtie index. Bowtie index of a strain is a tree view of the genome of this strain. It will be used by bowtie to align reads. The part of the script performing this is the following:
|
|
99 | 105 |
|
100 | 106 |
.. literalinclude:: ../../../snep/src/current/wf.py |
101 | 107 |
:start-after: # _STARTOF_ step_1 |
102 | 108 |
:end-before: # _ENDOF_ step_1 |
103 | 109 |
:language: python |
104 | 110 |
|
105 |
The following table summarizes the file sizes and process durations concerning this step.
|
|
111 |
As an indication, the following table summarizes the file sizes and process durations that we experienced when running this step on a Linux server***.
|
|
106 | 112 |
|
107 | 113 |
====== ====================== ====================== ================ |
108 | 114 |
strain fasta genome file size bowtie index file size process duration |
... | ... | |
117 | 123 |
Aligning Reads to Reference Genome |
118 | 124 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
119 | 125 |
|
120 |
Next, we launch bowtie to align reads to the reference genome. It produces a |
|
121 |
`.sam` file that we convert into a `.bed` file. Binaries for `bowtie`, `samtools` and `bedtools` are wrapped using python `subprocess` class. This step is performed by the following part of the `wf.py` script: |
|
126 |
Next, the *wf.py* script launches bowtie to align reads to the reference genome. It produces a `.sam` file that is converted into a `.bed` file. Binaries for `bowtie`, `samtools` and `bedtools` are wrapped using python `subprocess` class. This step is performed by the following part of the script: |
|
122 | 127 |
|
123 | 128 |
.. literalinclude:: ../../../snep/src/current/wf.py |
124 | 129 |
:start-after: # _STARTOF_ step_2 |
... | ... | |
128 | 133 |
Convert Aligned Reads into TemplateFilter Format |
129 | 134 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
130 | 135 |
|
131 |
TemplateFilter uses particular input formats for reads, so it is necessary to convert the `.bed` files. TemplateFilter expect reads as follows: `chr`, `coord`, `strand` and `#read` where:
|
|
136 |
TemplateFilter uses particular input formats for reads, so it is necessary to convert the `.bed` files. TemplateFilter expect reads in the following format: `chr`, `coord`, `strand` and `#read` where:
|
|
132 | 137 |
|
133 | 138 |
- `chr` is the number of the chromosome; |
134 | 139 |
- `coord` is the coordinate of the reads; |
... | ... | |
137 | 142 |
|
138 | 143 |
Each entry is *tab*-separated. |
139 | 144 |
|
140 |
**WARNING** for reverse strands, bowtie returns the position of the first nucleotide on the left hand side, whereas TemplateFilter expects the first one on the right hand side. This step takes this into account by adding the read length (in our case 50) to the reverse reads coordinates.
|
|
145 |
**WARNING** for reverse strands, bowtie returns the position of the first nucleotide on the left hand side, whereas TemplateFilter expects the first one on the right hand side. This is taken into account in NucleoMiner2 by adding the read length (in our case 50) to the reverse reads coordinates.
|
|
141 | 146 |
|
142 |
This step is performed by the following part of the `wf.py` script:
|
|
147 |
This step is performed by the following part of the *wf.py* script:
|
|
143 | 148 |
|
144 | 149 |
.. literalinclude:: ../../../snep/src/current/wf.py |
145 | 150 |
:start-after: # _STARTOF_ step_3 |
146 | 151 |
:end-before: # _ENDOF_ step_3 |
147 | 152 |
:language: python |
148 | 153 |
|
149 |
The following table summarises the number of reads, the involved file sizes and process durations concerning the two last steps. In our case, alignment process have been multithreaded over 3 cores.
|
|
154 |
The following table summarizes the number of reads, the involved file sizes and process durations that we experienced when running the two last steps. In our case, alignment process were multithreaded over 3 cores.
|
|
150 | 155 |
|
151 | 156 |
== ============== ========================= ====== ================ ================== ================ |
152 | 157 |
id Illumina reads aligned and filtred reads ratio `.bed` file size TF input file size process duration |
... | ... | |
170 | 175 |
Finally, for each sample we perform TemplateFilter analysis. |
171 | 176 |
|
172 | 177 |
**WARNING** TemplateFilter returns a list of nucleosomes. Each nucleosome is |
173 |
define by its center and its width. An odd width leads us to consider non- |
|
178 |
defined by its center and its width. An odd width leads us to consider non-
|
|
174 | 179 |
integer lower and upper bound. |
175 | 180 |
|
176 |
**WARNING** TemplateFilter is not designed to deal with replicates. So we recommend to keep a maximum of nucleosomes and filter the aberrant ones afterwards using the benefits of having replicates. To do this, we set a low correlation threshold parameter (0.5) and a particularly high value of overlap (300%).
|
|
181 |
**WARNING** TemplateFilter was not designed to handle replicates. So we recommend to keep a maximum of nucleosomes and filter the aberrant ones afterwards using the benefits of having replicates. To do this, we set a low correlation threshold parameter (0.5) and a particularly high value of overlap (300%).
|
|
177 | 182 |
|
178 | 183 |
This step is performed by the following part of the `wf.py` script: |
179 | 184 |
|
... | ... | |
367 | 372 |
|
368 | 373 |
|
369 | 374 |
|
370 |
The second part of the tutorial uses R (http://http://www.r-project.org). It consists of a set of R scripts that will be sourced in an R from a console launched at the root of your project. These scripts are:
|
|
375 |
The second part of the tutorial uses R (http://http://www.r-project.org). NucleoMiner2 contains a set of R scripts that will be sourced in R from a console launched at the root of your project. These scripts are:
|
|
371 | 376 |
|
372 | 377 |
- headers.R |
373 | 378 |
- extract_maps.R |
... | ... | |
380 | 385 |
The Script headers.R |
381 | 386 |
^^^^^^^^^^^^^^^^^^^^ |
382 | 387 |
|
383 |
The script headers.R is included in each other scripts. It is in charge of:
|
|
388 |
The script headers.R is included in all other R scripts. It is in charge of:
|
|
384 | 389 |
|
385 | 390 |
- launching libraries used in the scripts |
386 | 391 |
- launching configuration (design, strain, marker...) |
387 |
- computing and caching CURs (caching means storing the information in the computer's memory) |
|
388 |
|
|
389 |
Note that you can customize the function “translate”. This function allows you to use the alignments between genomes when performing various tasks. You may be using NucleoMiner2 to analyse data of a single strain, or of several strains. |
|
390 |
|
|
391 |
- All the data corresponds to the same strain (e.g. treatment/control, or only few mutations): Then in step 1), the regions to use are entire chromosomes. Instep 2) simply use the default translate function which is neutral. |
|
392 |
- computing and caching Common Uinterrupted Regions (CURs). Caching means storing the information in the computer's memory. |
|
392 | 393 |
|
393 |
- The data come from two or more strains: In this case, edit a list of regions and customize the translate function which performs the correspondence between the different genomes. How we did it: a .c2c file is obtained with NucleoMiner 1.0 (refer to the Appendice "Generate .c2c Files"), then use it to produce the list of regions and customise “translate”.
|
|
394 |
Note that you can customize the function “translate”. This function allows you to use the alignments between genomes when performing various tasks.
|
|
394 | 395 |
|
396 |
- You may want to analyze data of a single strain (e.g. treatment/control, or only few mutations). In this case, the genome is identical across all samples and you do not need to define particular CURs (CURs are chromosomes). Simply use the default translate function which is neutral. |
|
395 | 397 |
|
398 |
- If you are analyzing data from two or more strains (as NucleoMiner2 was designed for), then you need to translate coordinates of one genome into the coordinates of another one. You must do this by aligning the two genomes, which will produce a .c2c file (see Appendice "Generate .c2c Files"). thenuse it to produce the list of regions and customise “translate”. |
|
396 | 399 |
|
397 |
|
|
398 |
In your R console, run the following command line: |
|
400 |
In our tutorial, we are in the second case and to perform all these steps run the following command line in your R console: |
|
399 | 401 |
|
400 | 402 |
.. code:: bash |
401 | 403 |
|
... | ... | |
404 | 406 |
|
405 | 407 |
The Script extract_maps.R |
406 | 408 |
^^^^^^^^^^^^^^^^^^^^^^^^^ |
407 |
This script is in charge of extracting Maps for well-positioned and fuzzy nucleosomes. First of all, this script computes intra and inter-strain nucleosome maps for each CUR. This step is executed in parallel on many cores using the BoT library. Next, it collects results and produces well-positioned, fuzzy and UNR maps.
|
|
409 |
This script is in charge of extracting Maps for well-positioned and sensitive nucleosomes. First of all, this script computes intra and inter-strain matches of nucleosome maps for each CUR. This step can be executed in parallel on many cores using the BoT library. Next, it collects results and produces maps of well-positioned nucleosomes, sensitive nucleosomes and Unaligned Nucleosomal Regions .
|
|
408 | 410 |
|
409 |
The well-positioned map for BY is collected in the result directory and is called `BY_wp.tab`. It is composed of following columns:
|
|
411 |
The map of well-positioned nucleosomes for BY is collected in the result directory and is called `BY_wp.tab`. It is composed of following columns:
|
|
410 | 412 |
|
411 | 413 |
- chr, the number of the chromosome |
412 | 414 |
- lower_bound, the lower bound of the nucleosome |
... | ... | |
419 | 421 |
- llr_1, for a well-positioned nucleosome, it is the LLR1 (log-likelihood ratio) between the first and the second TemplateFilter nucleosome on the chain. |
420 | 422 |
- llr_2, for a well-positioned nucleosome, it is the LLR1 between the second and the third TemplateFilter nucleosome on the chain. |
421 | 423 |
- wp_llr, for a well-positioned nucleosome, it is the LLR2 that compares consistency of the positioning over all TemplateFilter nucleosomes. |
422 |
- wp_pval, for a well-positioned nucleosome, it is the p-value chi square test obtained with the LLR2 (`1-pchisq(2.LLR2, df=4)`)
|
|
424 |
- wp_pval, for a well-positioned nucleosome, it is the p-value chi square test obtained from LLR2 (`1-pchisq(2.LLR2, df=4)`)
|
|
423 | 425 |
- dyad_shift, for a well-positioned nucleosome, it is the shift between the two extreme TemplateFilter nucleosome dyad positions. |
424 | 426 |
|
425 |
The fuzzy map for BY is collected in the result directory and is called `BY_fuzzy.tab`. It is composed of following columns:
|
|
427 |
The sensitive map for BY is collected in the result directory and is called `BY_fuzzy.tab`. It is composed of following columns:
|
|
426 | 428 |
|
427 | 429 |
- chr, the number of the chromosome |
428 | 430 |
- lower_bound, the lower bound of the nucleosome |
... | ... | |
436 | 438 |
- index_nuc_RM, the index of the RM nucleosome in the CUR |
437 | 439 |
- llr_score, , the LLR3 score that estimates conservation between the positions in BY and RM |
438 | 440 |
- common_wp_pval, the p-value chi square test obtained from LLR3 (`1-pchisq(2.LLR3, df=2)`) |
439 |
- diff, the dyads shift between the positions in the two strains |
|
441 |
- diff, the dyads shift between the positions in the two strains (in bp)
|
|
440 | 442 |
|
441 | 443 |
The common UNR map for BY and RM strains is collected in the result directory and is called `BY_RM_common_unr.tab`. It is composed of the following columns: |
442 | 444 |
|
... | ... | |
454 | 456 |
The Script translate_common_wp.R |
455 | 457 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
456 | 458 |
|
457 |
This script is used to translate common well-positioned nucleosome maps from a strain to another strain and stores it into a table.
|
|
459 |
This script is used to translate common well-positioned nucleosome positions from a strain to another strain and stores it into a table.
|
|
458 | 460 |
|
459 |
For example, the file `results/2014-04/RM_wp_tr_2_BY.tab` contains RM well-positioned nucleosome translated into the BY genome coordinates. It is composed of following columns: |
|
461 |
For example, the file `results/2014-04/RM_wp_tr_2_BY.tab` contains RM well-positioned nucleosomes translated into the BY genome coordinates. It is composed of following columns:
|
|
460 | 462 |
|
461 | 463 |
- strain_ref, the reference genome (in which positioned are defined) |
462 | 464 |
- begin, the translated lower bound of the nucleosome |
... | ... | |
476 | 478 |
The Script split_samples.R |
477 | 479 |
^^^^^^^^^^^^^^^^^^^^^^^^^^ |
478 | 480 |
|
479 |
For memory space usage reasons, we split and compress TemplateFilter input files according to their corresponding chromosome. for example, `sample_1_TF.tab` will be split into :
|
|
481 |
To optimize memory space usage, we split and compress TemplateFilter input files according to their corresponding chromosome. for example, `sample_1_TF.tab` will be split into :
|
|
480 | 482 |
|
481 | 483 |
- sample_1_chr_1_splited_sample.tab.gz |
482 | 484 |
- sample_1_chr_2_splited_sample.tab.gz |
... | ... | |
547 | 549 |
|
548 | 550 |
sample_id are given in file samples.csv |
549 | 551 |
|
550 |
If you don't know which column to use, we recommend using wpunr. |
|
552 |
If you don't know which column to use for normalization, we recommend using wpunr.
|
|
551 | 553 |
|
552 |
If you want the very detailed factors produced by DESeq, here are the information:
|
|
554 |
Here are the details of the factors produced:
|
|
553 | 555 |
|
554 | 556 |
- unr: factor computed from data of UNR regions. These regions are defined for every pairs of aligned genomes (e.g. BY_RM) |
555 | 557 |
- wp: same, but for well-positioned nucleosomes. |
... | ... | |
604 | 606 |
The 5 last columns concern DESeq analysis: |
605 | 607 |
|
606 | 608 |
- manip[a_manip] strain[a_strain] manip[a_strain]:strain[a_strain], the manip (marker) effect, the strain effect and the snep effect. These are the coefficients of the fitted generalized linear model. |
607 |
- pvalsGLM, the pvalue resulting of the comparison of the GLM model considering or not the interaction term marker:strain. This is the statsitcial significance of the interaction term and therefore the statistical significance of the SNEP.
|
|
609 |
- pvalsGLM, the pvalue resulting from the comparison of the GLM model considering the interaction term *marker:strain* to the GLM model that does not consider it. This is the statsitcial significance of the interaction term and therefore the statistical significance of the SNEP.
|
|
608 | 610 |
- snep_index, a boolean set to TRUE if the pvalueGLM value is under the threshold computed with FDR function with a rate set to 0.0001. |
609 |
To execute this script, run the following command |
|
610 | 611 |
|
611 | 612 |
To execute this script, run the following command in your R console: |
612 | 613 |
|
... | ... | |
636 | 637 |
APPENDICE: Generate .c2c Files |
637 | 638 |
------------------------------ |
638 | 639 |
|
640 |
The `.c2c` files is a simple table that describes how two genome |
|
641 |
sequences are aligned. This file can be generated by using scripts that were developed in NucleoMiner 1.0 (Nagarajan et al. PLoS Genetics 2010) and which we provide in this release of NucleoMiner2. |
|
639 | 642 |
|
640 |
The `.c2c` files is a simple table that describes how the genome sequence can be aligned. We generate it using some NucleoMiner 1.0 scripts. |
|
641 | 643 |
|
642 |
To use NucleoMiner 1.0 scripts on your UNIX/LINUX computer you need first to install MUMmer which is a system for rapidly aligning entire genomes, whether in complete or draft form.
|
|
644 |
To use these scripts on your UNIX/LINUX computer you need first to install MUMmer which is designed to rapidly align entire genomes, whether in complete or draft form.
|
|
643 | 645 |
|
644 |
Installing the MUMmer library
|
|
645 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
646 |
Installing MUMmer
|
|
647 |
^^^^^^^^^^^^^^^^^ |
|
646 | 648 |
|
647 |
Get the last version of MUMmer archive on your computer (MUMmer3.23.tar.gz distributed in the directory deps of your working directory). Copy it in a dedicated directory. Install it locally into the src folder of you working directory by typing (working directory):
|
|
649 |
Get the last version of MUMmer archive on your computer (MUMmer3.23.tar.gz is provided in the directory deps of your working directory). Copy it in a dedicated directory. Install it locally into the src folder of you working directory by typing (working directory):
|
|
648 | 650 |
|
649 |
tar -xvzf gdl-1.0.tar.gz
|
|
651 |
tar -xvzf MUMmer3.23.tar.gz
|
|
650 | 652 |
|
651 |
This creates a directory called gdl-1.0. You now need to go into this directory and compile the library, by typing: |
|
652 | 653 |
|
653 | 654 |
.. code:: bash |
654 | 655 |
|
... | ... | |
661 | 662 |
Installing NucleoMiner 1.0 scripts |
662 | 663 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
663 | 664 |
|
664 |
Get the nucleominer-1.0.tar.gz archive on your computer (distributed in the directory deps of your working directory). Install it locally into the src folder of you working directory by typing (working directory):
|
|
665 |
Get the nucleominer-1.0.tar.gz archive on your computer (this archive is provided in the directory deps of your working directory). Install it locally into the src folder of you working directory by typing (working directory):
|
|
665 | 666 |
|
666 | 667 |
|
667 | 668 |
.. code:: bash |
... | ... | |
670 | 671 |
tar xfvz ../deps/nucleominer-1.0.tar.gz |
671 | 672 |
cd .. |
672 | 673 |
|
673 |
This creates a directory called that contains NucleoMiner 1.0 scripts (src/nucleominer-1.0/scripts).
|
|
674 |
This creates a directory that contains NucleoMiner 1.0 scripts (src/nucleominer-1.0/scripts). |
|
674 | 675 |
|
675 | 676 |
|
676 | 677 |
Generate .c2c Files |
b/src/DESCRIPTION | ||
---|---|---|
1 | 1 |
Package: nucleominer |
2 | 2 |
Maintainer: Florent Chuffart <florent.chuffart@ens-lyon.fr> |
3 | 3 |
Author: Florent Chuffart |
4 |
Version: 2.3.51
|
|
4 |
Version: 2.3.52
|
|
5 | 5 |
License: CeCILL |
6 | 6 |
Title: nm |
7 | 7 |
Depends: seqinr, plotrix, DESeq, cachecache, dplyr |
Formats disponibles : Unified diff