Aperçu


Welcome to *NucleoMiner2*
*************************

* Readme / Documentation for *NucleoMiner2*

  * License

  * Installation Instructions

* Tutorial

  * Experimental Dataset, Working Directory and Configuration File

  * Preprocessing Illumina Fastq Reads for Each Sample

  * Inferring Nucleosome Position and Extracting Read Counts

  * Results: Number of SNEPs

  * APPENDICE: Generate .c2c Files

* References

  * Python Reference

  * R Reference

Readme / Documentation for *NucleoMiner2*
*****************************************

*NucleoMiner2* offers Python API and R package allowing to perform
quantitative analysis of epigenetic marks on individual nucleosomes.
It was developed to detect natural Single-Nucleosome Epi-Polymorphisms
(SNEP) from MNase-seq and ChIP-seq data.

License
=======

Copyright CNRS 2012-2013

* Florent CHUFFART

* Jean-Baptiste VEYRIERAS

* Gael YVERT

This software is a computer program which purpose is to perform
quanti- tative analysis of epigenetic marks at single nucleosome
resolution.

This software is governed by the CeCILL license under French law and
abiding by the rules of distribution of free software.  You can  use,
modify and/ or redistribute the software under the terms of the CeCILL
license as circulated by CEA, CNRS and INRIA at the following URL
"http://www.cecill.info".

As a counterpart to the access to the source code and  rights to copy,
modify and redistribute granted by the license, users are provided
only with a limited warranty  and the software's author,  the holder
of the economic rights,  and the successive licensors  have only
limited liability.

This software is provided with absolutely NO WARRANTY. The authors can
not be held responsible, even partially, for any damage, loss,
financial loss or any other undesired facts resulting from the use of
the software. In this respect, the user's attention is drawn to the
risks associated with loading,  using,  modifying and/or developing or
reproducing the software by the user in light of its specific status
of free software, that may mean  that it is complicated to manipulate,
and  that  also therefore means  that it is reserved for developers
and  experienced professionals having in-depth computer knowledge.
Users are therefore encouraged to load and test the software's
suitability as regards their requirements in conditions enabling the
security of their systems and/or data to be ensured and,  more
generally, to use and operate it in the same conditions as regards
security.

The fact that you are presently reading this means that you have had
knowledge of the CeCILL license and that you accept its terms.

Installation Instructions
=========================

Links
-----

*NucleoMiner2* home page and documentation are available here:

   * https://forge.cbp.ens-lyon.fr/redmine/projects/nucleominer

The Yvert lab web page is accessible here:

   * http://www.ens-lyon.fr/LBMC/gisv/

Installation
------------

The first installation step is to retrieve the source code of
NucleoMiner2. You can do this by typing the following command in a
terminal.

   git clone http://forge.cbp.ens-lyon.fr/git/nucleominer

Prerequisites
~~~~~~~~~~~~~

To work properly, NucleoMiner2 needs that the following free software
are installed and made available on your system:

   * Bowtie2 http://bowtie-bio.sourceforge.net/bowtie2

   * SAMtools http://samtools.sourceforge.net

   * bedtools http://code.google.com/p/bedtools/

   * TemplateFilter
     http://compbio.cs.huji.ac.il/NucPosition/TemplateFiltering

It also requires the following R packages to be installed on your
system:

   * fork

   * rjson

   * seqinr

   * plotrix

   * DESeq

These packages can be installed by typing the following command in an
R console:

   install.packages(c("fork", "rjson", "seqinr", "plotrix"))
   source("http://bioconductor.org/biocLite.R")
   biocLite("DESeq")

Finally,by typing the git command above, you downloaded specific R
packages provided with NucleoMiner2 that you now need to install:

   * cachecache https://forge.cbp.ens-
     lyon.fr/redmine/projects/cachecache

   * bot https://forge.cbp.ens-lyon.fr/redmine/projects/bot

   * nucleominer https://forge.cbp.ens-
     lyon.fr/redmine/projects/nucleominer

To do so, type the following command in your terminal:

   cd nucleominer
   R CMD INSTALL doc/Chuffart_NM2_workdir/deps/bot_0.14.tar.gz\
       doc/Chuffart_NM2_workdir/deps/cachecache_0.1.tar.gz\
       build/nucleominer_2.3.46.tar.gz

Tutorial
********

This tutorial describes steps allowing to perform quantitative
analysis of epigenetic marks on individual nucleosomes. We assume that
files are organised according to a given hierarchy and that all
command lines are launched from the project’s root directory.

This tutorial is divided into two main parts. The first part covers
the python script *wf.py* that aligns and converts short sequence
reads. The second part covers the R scripts that extracts nucleosome-
level information (nucleosome position and indicators) from the
dataset.

Experimental Dataset, Working Directory and Configuration File
==============================================================

Working Directory Organisation
------------------------------

After having installed NucleoMiner2 environment (Previous section), go
to the root working directory of the tutorial by typing the following
command in a terminal:

   cd doc/Chuffart_NM2_workdir/

Retrieving Experimental Dataset
-------------------------------

The MNase-seq and MN-ChIP-seq raw data are available at ArrayExpress
(http://www.ebi.ac.uk/arrayexpress/) under accession number
E-MTAB-2671.

$$$ TODO explain how organise Experimental Dataset into the *data*
directory of the working directory.

In this tutorial, we want to compare nucleosomes of 2 yeast strains:
BY and RM. For each strain Mnase-Seq was performed as well as ChIP-Seq
using an antibody recognizing the H3K14ac epigenetic mark. Illumina
sequencing was done in single-read of 50 bp long.

The dataset is composed of 55 files organised as follows:

   * 3 replicates for BY MNase Seq

     * sample 1 (5 fastq.gz files)

     * sample 2 (5 fastq.gz files)

     * sample 3 (4 fastq.gz files)

   * 3 replicates for RM MNase Seq

     * sample 4 (4 fastq.gz files)

     * sample 5 (4 fastq.gz files)

     * sample 6 (5 fastq.gz files)

   * 3 replicates for BY ChIP Seq H3K14ac

     * sample 36 (5 fastq.gz files)

     * sample 37 (5 fastq.gz files)

     * sample 53 (9 fastq.gz files)

   * 2 replicates for RM ChIP Seq H3K14ac

     * sample 38 (5 fastq.gz files)

     * sample 39 (4 fastq.gz files)

Python and R Common Configuration File
--------------------------------------

First, we need to define useful configuration variables that will be
passed to python and R scripts. These variables are contained in file
*configurator.py*. The execution of this python script dumps variables
into the *nucleominer_config.json* file that will then be used by both
R and python scripts.

The initialization of this variables is done in the configurator.py
file. If you need to adapt variable values (path, default
parameters...) you need to edit this file. Then, go to the root
directory of your project and run the following command to dump the
configuration file:

   python src/current/configurator.py

Preprocessing Illumina Fastq Reads for Each Sample
==================================================

Once variables and design have been specified, the script wf.py will
automatically run all the analysis. You don't need to do anything. To
run the full analysis, run the following command:

   python src/current/wf.py

The details of the steps performed by this script are explained below.
This preprocessing consists of 4 steps embedded in the *wf.py* script.
They are described bellow.  As a preamble, this script computes
*samples*, *samples_mnase* and *strains* that will be used along the 4
steps.

wf.samples = []

   List of samples where a sample is identified by an id (key: *id*)
   and a strain name (key *strain*).

wf.samples_mnase = []

   List of Mnase samples.

wf.strains = []

   List of reference strains.

Creating Bowtie Index from each Reference Genome
------------------------------------------------

For each strain, the script *wf.py* then creates bowtie index. Bowtie
index of a strain is a tree view of the genome of this strain. It will
be used by bowtie to align reads. The part of the script performing
this is the following:

     for strain in strains:
       per_strain_stats[strain] = create_bowtie_index(strain, 
         config["FASTA_REFERENCE_GENOME_FILES"][strain], config["INDEX_DIR"], 
         config["BOWTIE_BUILD_BIN"])

As an indication, the following table summarizes the file sizes and
process durations that we experienced when running this step on a
Linux server***.

+--------+------------------------+------------------------+------------------+
| strain | fasta genome file size | bowtie index file size | process duration |
+========+========================+========================+==================+
| BY     | 12 Mo                  | 25 Mo                  | 11 s.            |
+--------+------------------------+------------------------+------------------+
| RM     | 12 Mo                  | 24 Mo                  | 9 s.             |
+--------+------------------------+------------------------+------------------+

Aligning Reads to Reference Genome
----------------------------------

Next, the *wf.py* script launches bowtie to align reads to the
reference genome. It produces a *.sam* file that is converted into a
*.bed* file. Binaries for *bowtie*, *samtools* and *bedtools* are
wrapped using python *subprocess* class. This step is performed by the
following part of the script:

     for sample in samples:
       per_sample_align_stats["sample_%s" % sample["id"]] = align_reads(sample, 
         config["ALIGN_DIR"], config["LOG_DIR"], config["INDEX_DIR"], 
         config["ILLUMINA_OUTPUTFILE_PREFIX"], config["BOWTIE2_BIN"], 
         config["SAMTOOLS_BIN"], config["BEDTOOLS_BIN"])

Convert Aligned Reads into TemplateFilter Format
------------------------------------------------

TemplateFilter uses particular input formats for reads, so it is
necessary to convert the *.bed* files. TemplateFilter expect reads in
the following format: *chr*, *coord*, *strand* and *#read* where:

* *chr* is the number of the chromosome;

* *coord* is the coordinate of the reads;

* *strand* is *F* for forward and *R* for reverse;

* *#reads* the number of reads covering this position.

Each entry is *tab*-separated.

**WARNING** for reverse strands, bowtie returns the position of the
first nucleotide on the left hand side, whereas TemplateFilter expects
the first one on the right hand side.  This is taken into account in
NucleoMiner2 by adding the read length (in our case 50) to the reverse
reads coordinates.

This step is performed by the following part of the *wf.py* script:

     for sample in samples:
       per_sample_convert_stats["sample_%s" % sample["id"]] = split_fr_4_TF(sample, 
         config["ALIGN_DIR"], config["FASTA_INDEXES"], config["AREA_BLACK_LIST"], 
         config["READ_LENGTH"],config["MAPQ_THRES"])

The following table summarizes the number of reads, the involved file
sizes and process durations that we experienced when running the two
last steps. In our case, alignment process were multithreaded over 3
cores.

+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| id | Illumina reads | aligned and filtred reads | ratio  | *.bed* file size | TF input file size | process duration |
+====+================+===========================+========+==================+====================+==================+
| 1  | 16436138       | 10199695                  | 62,06% | 1064 Mo          | 60  Mo             | 383   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 2  | 16911132       | 12512727                  | 73,99% | 1298 Mo          | 64  Mo             | 437   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 3  | 15946902       | 12340426                  | 77,38% | 1280 Mo          | 65  Mo             | 423   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 4  | 13765584       | 10381903                  | 75,42% | 931  Mo          | 59  Mo             | 352   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 5  | 15168268       | 11502855                  | 75,83% | 1031 Mo          | 64  Mo             | 386   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 6  | 18850820       | 14024905                  | 74,40% | 1254 Mo          | 69  Mo             | 482   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 36 | 17715118       | 14092985                  | 79,55% | 1404 Mo          | 68  Mo             | 483   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 37 | 17288466       | 7402082                   | 42,82% | 741  Mo          | 48  Mo             | 339   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 38 | 16116394       | 13178457                  | 81,77% | 1101 Mo          | 63  Mo             | 420   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 39 | 14241106       | 10537228                  | 73,99% | 880  Mo          | 57  Mo             | 348   s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+
| 53 | 40876476       | 33780065                  | 82,64% | 3316 Mo          | 103 Mo             | 1165  s.         |
+----+----------------+---------------------------+--------+------------------+--------------------+------------------+

Run TemplateFilter on Mnase Samples
-----------------------------------

Finally, for each sample we perform TemplateFilter analysis.

**WARNING** TemplateFilter returns a list of nucleosomes. Each
nucleosome is defined by its center and its width. An odd width leads
us to consider non- integer lower and upper bound.

**WARNING** TemplateFilter was not designed to handle replicates. So
we recommend to keep a maximum of nucleosomes and filter the aberrant
ones afterwards using the benefits of having replicates. To do this,
we set a low correlation threshold parameter (0.5) and a particularly
high value of overlap (300%).

This step is performed by the following part of the *wf.py* script:

     for sample in samples_mnase:
       per_mnase_sample_stats["sample_%s" % sample["id"]] = template_filter(sample, 
         config["ALIGN_DIR"], config["LOG_DIR"], config["TF_BIN"], 
         config["TF_TEMPLATES_FILE"], config["TF_CORR"], config["TF_MINW"], 
         config["TF_MAXW"], config["TF_OL"])  

+----+--------+------------+---------------+------------------+
| id | strain | found nucs | nuc file size | process duration |
+====+========+============+===============+==================+
| 1  | BY     | 96214      | 68 Mo         | 1022 s.          |
+----+--------+------------+---------------+------------------+
| 2  | BY     | 91694      | 65 Mo         | 1038 s.          |
+----+--------+------------+---------------+------------------+
| 3  | BY     | 91205      | 65 Mo         | 1036 s.          |
+----+--------+------------+---------------+------------------+
| 4  | RM     | 88076      | 62 Mo         | 984 s.           |
+----+--------+------------+---------------+------------------+
| 5  | RM     | 90141      | 64 Mo         | 967 s.           |
+----+--------+------------+---------------+------------------+
| 6  | RM     | 87517      | 62 Mo         | 980 s.           |
+----+--------+------------+---------------+------------------+

Inferring Nucleosome Position and Extracting Read Counts
========================================================

The second part of the tutorial uses R
(http://http://www.r-project.org). NucleoMiner2 contains a set of R
scripts that will be sourced in R from a console launched at the root
of your project. These scripts are:

   * headers.R

   * extract_maps.R

   * translate_common_wp.R

   * split_samples.R

   * count_reads.R

   * get_size_factors

   * launch_deseq.R

The Script headers.R
--------------------

The script headers.R is included in all other R scripts. It is in
charge of:

   * launching libraries used in the scripts

   * launching configuration (design, strain, marker...)

   * computing and caching Common Uinterrupted Regions (CURs).
     Caching means storing the information in the computer's memory.

Note that you can customize the function “translate”. This function
allows you to use the alignments between genomes when performing
various tasks.

   * You may want to analyze data of a single strain (e.g.
     treatment/control, or only few mutations). In this case, the
     genome is identical across all samples and you do not need to
     define particular CURs (CURs are chromosomes). Simply use the
     default translate function which is neutral.

   * If you are analyzing data from two or more strains (as
     NucleoMiner2 was designed for), then you need to translate
     coordinates of one genome into the coordinates of another one.
     You must do this by aligning the two genomes, which will produce
     a .c2c file (see Appendice "Generate .c2c Files").  thenuse it to
     produce the list of regions and customise “translate”.

In our tutorial, we are in the second case and to perform all these
steps run the following command line in your R console:

   source("src/current/headers.R")

The Script extract_maps.R
-------------------------

This script is in charge of extracting Maps for well-positioned and
sensitive nucleosomes. First of all, this script computes intra and
inter-strain matches of nucleosome maps for each CUR. This step can be
executed in parallel on many cores using the BoT library. Next, it
collects results and produces maps of  well-positioned nucleosomes,
sensitive nucleosomes and Unaligned Nucleosomal Regions .

The map of well-positioned nucleosomes for BY is collected in the
result directory and is called *BY_wp.tab*. It is composed of
following columns:

   * chr, the number of the chromosome

   * lower_bound, the lower bound of the nucleosome

   * upper_bound, the upper bound of the nucleosome

   * cur_index, index of the CUR

   * index_nuc, the index of the nucleosome in the CUR

   * wp, 1 if it is a well positioned nucleosome, 0 otherwise

   * nb_reads, the number of reads that support this nucleosome

   * nb_nucs, the number of TemplateFilter nucleosome across
     replicates (= the number of replicates in which it is a well-
     positioned nucleosome)

   * llr_1, for a well-positioned nucleosome, it is the LLR1 (log-
     likelihood ratio) between the first and the second TemplateFilter
     nucleosome on the chain.

   * llr_2, for a well-positioned nucleosome, it is the LLR1 between
     the second and the third TemplateFilter nucleosome on the chain.

   * wp_llr, for a well-positioned nucleosome, it is the LLR2 that
     compares consistency of the positioning over all TemplateFilter
     nucleosomes.

   * wp_pval, for a well-positioned nucleosome, it is the p-value
     chi square test obtained from LLR2 (*1-pchisq(2.LLR2, df=4)*)

   * dyad_shift, for a well-positioned nucleosome, it is the shift
     between the two extreme TemplateFilter nucleosome dyad positions.

The sensitive map for BY is collected in the result directory and is
called *BY_fuzzy.tab*. It is composed of following columns:

   * chr, the number of the chromosome

   * lower_bound, the lower bound of the nucleosome

   * upper_bound, the upper bound of the nucleosome

   * cur_index, index of the CUR

The map of common well-positioned nucleosomes aligned between the BY
and RM strains is collected in the result directory and is called
*BY_RM_common_wp.tab*. It is composed of following columns:

   * cur_index, the index of the CUR

   * index_nuc_BY, the index of the BY nucleosome in the CUR

   * index_nuc_RM, the index of the RM nucleosome in the CUR

   * llr_score, , the LLR3 score that estimates conservation between
     the positions in BY and RM

   * common_wp_pval,  the p-value chi square test obtained from LLR3
     (*1-pchisq(2.LLR3, df=2)*)

   * diff, the dyads shift between the positions in the two strains
     (in bp)

The common UNR map for BY and RM strains is collected in the result
directory and is called *BY_RM_common_unr.tab*. It is composed of the
following columns:

   * cur_index, the index of the CUR

   * index_nuc_BY, the index of the BY nucleosome in the CUR

   * index_nuc_RM,the index of the RM nucleosome in the CUR

To execute this script, run the following command in your R console:

   source("src/current/extract_maps.R")

The Script translate_common_wp.R
--------------------------------

This script is used to translate common well-positioned nucleosome
positions from a strain to another strain and stores it into a table.

For example, the file *results/2014-04/RM_wp_tr_2_BY.tab* contains RM
well-positioned nucleosomes translated into the BY genome coordinates.
It is composed of following columns:

   * strain_ref, the reference genome (in which positioned are
     defined)

   * begin, the translated lower bound of the nucleosome

   * end, the translated upper bound of the nucleosome

   * chr, the number of chromosomes for the reference genome (in
     which positioned are defined)

   * length, the length of the nucleosome (could be negative)

   * cur_index, the index of the CUR

   * index_nuc, the index of the nucleosome in the CUR

To execute this script, run the following command in your R console:

   source("src/current/translate_common_wp.R")

The Script split_samples.R
--------------------------

To optimize memory space usage, we split and compress TemplateFilter
input files according to their corresponding  chromosome. for example,
*sample_1_TF.tab* will be split into :

   * sample_1_chr_1_splited_sample.tab.gz

   * sample_1_chr_2_splited_sample.tab.gz

   * ...

   * sample_1_chr_17_splited_sample.tab.gz

To execute this script, run the following command in your R console:

   source("src/current/split_samples.R")

The Script count_reads.R
------------------------

To associate a number of observations (read) to each nucleosome we run
the script *count_reads.R*. It produces the files
*BY_RM_H3K14ac_wp_and_nbreads.tab*,
*BY_RM_H3K14ac_unr_and_nbreads.tab*
*BY_RM_Mnase_Seq_wp_and_nbreads.tab* and
*BY_RM_Mnase_Seq_unr_and_nbreads.tab* for H3K14ac common well-
positioned nucleosomes, H3K14ac UNRs, Mnase common well-positioned
nucleosomes and Mnase UNRs respectively.

For example, the file *BY_RM_H3K14ac_unr_and_nbreads.tab* contains
counted reads for well-positioned nucleosomes with the experimental
condition ChIP H3K14ac. It is composed of the following columns:

   * chr_BY, the number of the chromosome for BY

   * lower_bound_BY, the lower bound of the nucleosome for BY

   * upper_bound_BY, the upper bound of the nucleosome  for BY

   * index_nuc_BY, the index of the BY nucleosome in the CUR for BY

   * chr_RM, the number of the chromosome for RM

   * lower_bound_RM, the lower bound of the nucleosome for RM

   * upper_bound_RM, the upper bound of the nucleosome  for RM

   * index_nuc_RM,the index of the RM nucleosome in the CUR for RM

   * cur_index, index of the CUR

   * BY_H3K14ac_36, the number of reads for the current nucleosome
     for the sample 36

   * BY_H3K14ac_37, #reads for sample 37

   * BY_H3K14ac_53, #reads for sample 53

   * RM_H3K14ac_38, #reads for sample 38

   * RM_H3K14ac_39, #reads for sample 39

To execute this script, run the following command in your R console:

   source("src/current/count_reads.R")

The Script get_size_factors.R
-----------------------------

This script uses the DESeq function *estimateSizeFactors* to compute
the size factor of each sample. It corresponds to normalisation of
read counts from sample to sample, as determined by DESeq. When a
sample has n reads for a nucleosome or a UNR, the normalised count is
n/f where f is the factor contained in this file. The script dumps
computed size factors into the file *size_factors.tab*. This file has
the form:

+-----------+---------+---------+---------+
| sample_id | wp      | unr     | wpunr   |
+===========+=========+=========+=========+
| 1         | 0.87396 | 0.88097 | 0.87584 |
+-----------+---------+---------+---------+
| 2         | 1.07890 | 1.07440 | 1.07760 |
+-----------+---------+---------+---------+
| 3         | 1.06400 | 1.05890 | 1.06250 |
+-----------+---------+---------+---------+
| 4         | 0.85782 | 0.87948 | 0.86305 |
+-----------+---------+---------+---------+
| 5         | 0.97577 | 0.96590 | 0.97307 |
+-----------+---------+---------+---------+
| 6         | 1.19630 | 1.18120 | 1.19190 |
+-----------+---------+---------+---------+
| 36        | 0.93318 | 0.92762 | 0.93166 |
+-----------+---------+---------+---------+
| 37        | 0.48315 | 0.48453 | 0.48350 |
+-----------+---------+---------+---------+
| 38        | 1.11240 | 1.11210 | 1.11230 |
+-----------+---------+---------+---------+
| 39        | 0.89897 | 0.89917 | 0.89903 |
+-----------+---------+---------+---------+
| 53        | 2.22650 | 2.22700 | 2.22660 |
+-----------+---------+---------+---------+

sample_id are given in file samples.csv

If you don't know which column to use for normalization, we recommend
using wpunr.

Here are the details of the factors produced:

   * unr: factor computed from data of UNR regions. These regions
     are defined for every pairs of aligned genomes (e.g. BY_RM)

   * wp: same, but for well-positioned nucleosomes.

   * wpunr: both types of regions.

To execute this script, run the following command in your R console:

   source("src/current/get_size_factors.R")

The Script launch_deseq.R
-------------------------

Finally, the script *launch_deseq.R* perform statistical analysis on
each nucleosome using *DESeq*. It produces files:

   * results/current/BY_RM_H3K14ac_wp_snep.tab

   * results/current/BY_RM_H3K14ac_unr_snep.tab

   * results/current/BY_RM_H3K14ac_wpunr_snep.tab

   * results/current/BY_RM_H3K14ac_wp_mnase.tab

   * results/current/BY_RM_H3K14ac_unr_mnase.tab

   * results/current/BY_RM_H3K14ac_wpunr_mnase.tab

These files are organised with the following columns (see file
*BY_RM_H3K14ac_wp_snep.tab* for an example):

   * chr_BY, the number of the chromosome for BY

   * lower_bound_BY, the lower bound of the nucleosome for BY

   * upper_bound_BY, the upper bound of the nucleosome  for BY

   * index_nuc_BY, the index of the BY nucleosome in the CUR for BY

   * chr_RM, the number of the chromosome for RM

   * lower_bound_RM, the lower bound of the nucleosome for RM

   * upper_bound_RM, the upper bound of the nucleosome  for RM

   * index_nuc_RM,the index of the RM nucleosome in the CUR for RM

   * cur_index, index of the CUR

   * form

   * BY_Mnase_Seq_1, the number of reads for the current nucleosome
     for the sample 1

Next columns concern indicators for each sample:

   * BY_Mnase_Seq_2, #reads for sample 2

   * BY_Mnase_Seq_3, #reads for sample 3

   * RM_Mnase_Seq_4, #reads for sample 4

   * RM_Mnase_Seq_5, #reads for sample 5

   * RM_Mnase_Seq_6, #reads for sample 6

   * BY_H3K14ac_36, #reads for sample 36

   * BY_H3K14ac_37, #reads for sample 37

   * BY_H3K14ac_53, #reads for sample 53

   * RM_H3K14ac_38, #reads for sample 38

   * RM_H3K14ac_39, #reads for sample 39

The 5 last columns concern DESeq analysis:

   * manip[a_manip] strain[a_strain]
     manip[a_strain]:strain[a_strain], the manip (marker) effect, the
     strain effect and the snep effect. These are the coefficients of
     the fitted generalized linear model.

   * pvalsGLM, the pvalue resulting from the comparison of the GLM
     model considering the interaction term *marker:strain* to the GLM
     model that does not consider it. This is the statsitcial
     significance of the interaction term and therefore the
     statistical significance of the SNEP.

   * snep_index, a boolean set to TRUE if the pvalueGLM value is
     under the threshold computed with FDR function with a rate set to
     0.0001.

To execute this script, run the following command in your R console:

   source("src/current/launch_deseq.R")

Results: Number of SNEPs
========================

Here are the number of computed SNEPs for each forms.

+-------+---------+-------+---------+
| form  | strains | #nucs | H3K14ac |
+=======+=========+=======+=========+
| wp    | BY-RM   | 30464 | 3549    |
+-------+---------+-------+---------+
| unr   | BY-RM   | 9497  | 1559    |
+-------+---------+-------+---------+
| wpunr | BY-RM   | 39961 | 5240    |
+-------+---------+-------+---------+

APPENDICE: Generate .c2c Files
==============================

The *.c2c* files is a simple table that describes how two genome
sequences are aligned. This file can be generated by using scripts
that were developed in NucleoMiner 1.0 (Nagarajan et al. PLoS Genetics
2010) and which we provide in this release of NucleoMiner2.

To use these scripts on your UNIX/LINUX computer you need first to
install MUMmer which is designed to rapidly align entire genomes,
whether in complete or draft form.

Installing MUMmer
-----------------

Get the last version of MUMmer archive on your computer
(MUMmer3.23.tar.gz is provided in the directory deps of your working
directory). Copy it in a dedicated directory. Install it locally into
the src folder of you working directory by typing (working directory):

tar -xvzf MUMmer3.23.tar.gz

   cd src
   tar xfvz ../deps/MUMmer3.23.tar.gz
   cd MUMmer3.23
   make check
   make install

Installing NucleoMiner 1.0 scripts
----------------------------------

Get the nucleominer-1.0.tar.gz archive on your computer (this archive
is provided in the directory deps of your working directory). Install
it locally into the src folder of you working directory by typing
(working directory):

   cd src
   tar xfvz ../deps/nucleominer-1.0.tar.gz
   cd ..

This creates a directory that contains  NucleoMiner 1.0 scripts
(src/nucleominer-1.0/scripts).

Generate .c2c Files
-------------------

To generate .c2c files you need to type the following command in a
terminal:

   export PATH=$PATH:src/MUMmer3.23:src/nucleominer-1.0/scripts
   export PERL5LIB=$PERL5LIB:src/nucleominer-1.0/scripts/
   NMgxcomp data/saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta \
     data/saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta \
     data/byxrm 2>NMgxcomp.log

After execution, the directory *data* will hold the .c2c files.

Membres

Manager: Florent Chuffart, Florent Chuffart, Stéphane Janczarski
Developer: Florent Chuffart
Reporter: Florent Chuffart

LBMC » NucleoMiner

Aperçu

Membres