Tutorial
========

This tutorial describes the steps for performing a quantitative analysis of epigenetic marks on individual nucleosomes. We assume that files are organised according to a given hierarchy and that all command lines are launched from the project’s root directory.

This tutorial is divided into two main parts. The first part covers the Python script `wf.py`, which aligns and converts short sequence reads. The second part covers the R scripts that extract information (nucleosome positions and indicators) from the dataset.


Experimental Dataset, Working Directory and Configuration File
--------------------------------------------------------------

Working Directory Organisation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After installing the NucleoMiner2 environment (see the previous section), go to the root working directory of the tutorial by typing the following command in a terminal:

.. code:: bash

  cd doc/Chuffart_NM2_workdir/


Retrieving Experimental Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The MNase-seq and MN-ChIP-seq raw data are available at ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) under accession number E-MTAB-2671.

$$$ TODO explain how to organise the experimental dataset into the `data` directory of the working directory.

We want to compare the nucleosomes of two yeast strains: BY and RM. For each strain, we performed MNase-seq and ChIP-seq using an antibody that recognises the H3K14ac epigenetic mark.

The dataset is composed of 55 files organised as follows:

  - 3 replicates for BY MNase Seq

    - sample 1 (5 fastq.gz files)
    - sample 2 (5 fastq.gz files)
    - sample 3 (4 fastq.gz files)

  - 3 replicates for RM MNase Seq

    - sample 4 (4 fastq.gz files)
    - sample 5 (4 fastq.gz files)
    - sample 6 (5 fastq.gz files)

  - 3 replicates for BY ChIP Seq H3K14ac

    - sample 36 (5 fastq.gz files)
    - sample 37 (5 fastq.gz files)
    - sample 53 (9 fastq.gz files)

  - 2 replicates for RM ChIP Seq H3K14ac

    - sample 38 (5 fastq.gz files)
    - sample 39 (4 fastq.gz files)


Python and R Common Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First, we define in a single place the configuration variables used by both the Python and R scripts. These variables are contained in the file `configurator.py`. Running this Python script dumps the variables into the `nucleominer_config.json` file, which is then read by both the R and Python scripts.

To do this, go to the root directory of your project and run the following command:

.. code:: bash

  python src/current/configurator.py

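
The shared-configuration pattern described above can be sketched as follows. This is a minimal illustration, not the content of `configurator.py`: the key names `INDEX_DIR`, `ALIGN_DIR` and `LOG_DIR` are configuration entries mentioned elsewhere in this documentation, but the values below are placeholders, and the helpers `dump_config` and `load_config` are hypothetical.

.. code:: python

  import json
  from pathlib import Path

  # Placeholder values: the real entries and paths are defined in configurator.py.
  CONFIG = {
      "INDEX_DIR": "data/index",
      "ALIGN_DIR": "data/align",
      "LOG_DIR": "log",
  }

  def dump_config(config, path):
      """Write the configuration as JSON so that both Python and R can read it."""
      Path(path).write_text(json.dumps(config, indent=2, sort_keys=True))

  def load_config(path):
      """Read the shared JSON configuration back into a dictionary."""
      return json.loads(Path(path).read_text())

On the R side, the same file could be read with a JSON package such as jsonlite (`jsonlite::fromJSON("nucleominer_config.json")`), which is what makes a single JSON file a convenient bridge between the two languages.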

Preprocessing Illumina Fastq Reads for Each Sample
--------------------------------------------------

This preprocessing step consists of 4 main steps embedded in the `wf.py` script, described below. As a preamble, this script computes `samples`, `samples_mnase` and `strains`, which are used throughout the 4 steps.


.. autodata:: wf.samples
    :noindex:

.. autodata:: wf.samples_mnase
    :noindex:

.. autodata:: wf.strains
    :noindex:

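
As an illustration of what these three variables contain, the following sketch rebuilds them from the experimental design described earlier. The `SAMPLE_SHEET` dictionary and the `derive_design` helper are hypothetical; `wf.py` derives these values from its own configuration, and its exact representation may differ.

.. code:: python

  # Hypothetical sample sheet for the design described above:
  # sample id -> (strain, marker, number of fastq.gz files).
  SAMPLE_SHEET = {
      1: ("BY", "Mnase_Seq", 5), 2: ("BY", "Mnase_Seq", 5), 3: ("BY", "Mnase_Seq", 4),
      4: ("RM", "Mnase_Seq", 4), 5: ("RM", "Mnase_Seq", 4), 6: ("RM", "Mnase_Seq", 5),
      36: ("BY", "H3K14ac", 5), 37: ("BY", "H3K14ac", 5), 53: ("BY", "H3K14ac", 9),
      38: ("RM", "H3K14ac", 5), 39: ("RM", "H3K14ac", 4),
  }

  def derive_design(sheet):
      """Return (samples, samples_mnase, strains) from a sample sheet."""
      samples = sorted(sheet)
      samples_mnase = [s for s in samples if sheet[s][1] == "Mnase_Seq"]
      strains = sorted({strain for strain, _, _ in sheet.values()})
      return samples, samples_mnase, strains

  samples, samples_mnase, strains = derive_design(SAMPLE_SHEET)
  total_fastq = sum(n_files for _, _, n_files in SAMPLE_SHEET.values())

With this design, `samples` lists the 11 sample identifiers, `samples_mnase` the 6 MNase samples, `strains` is `['BY', 'RM']`, and the fastq counts add up to the 55 files of the dataset.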

Creating Bowtie Index from each Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each strain, we need to create a bowtie index, i.e. a compressed, searchable index of the genome of this strain (bowtie uses an index based on the Burrows-Wheeler transform). Bowtie uses this index to align reads. This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_1
   :end-before: # _ENDOF_ step_1
   :language: python

The following table summarises the file sizes and process durations for this step.

======  ======================  ======================  ================
strain  fasta genome file size  bowtie index file size  process duration
======  ======================  ======================  ================
BY      12 MB                   25 MB                   11 s
RM      12 MB                   24 MB                   9 s
======  ======================  ======================  ================

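
For reference, the indexing command for one strain can be sketched as below. This sketch assumes bowtie2 (the configuration notes mention a `BOWTIE2_BIN` entry), whose aligner needs an index built by `bowtie2-build`; `bowtie-build` from bowtie 1 takes the same two positional arguments, a FASTA file and the basename of the index files. The helper names are hypothetical.

.. code:: python

  import subprocess

  def bowtie_build_command(fasta_file, index_prefix, binary="bowtie2-build"):
      """Build the command line that indexes one reference genome.

      The indexer takes the FASTA file, then the basename of the index files.
      """
      return [binary, fasta_file, index_prefix]

  def build_index(fasta_file, index_prefix, binary="bowtie2-build"):
      """Run the indexer; check=True raises CalledProcessError on failure."""
      subprocess.run(bowtie_build_command(fasta_file, index_prefix, binary), check=True)

For example, `build_index("saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta", "index/BY")` would index the BY genome, assuming that the hypothetical `index` directory exists.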

Aligning Reads to Reference Genome
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, we launch bowtie to align the reads to the reference genome. It produces a `.sam` file that we convert into a `.bed` file. The `bowtie`, `samtools` and `bedtools` binaries are wrapped using the Python `subprocess` module. This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_2
   :end-before: # _ENDOF_ step_2
   :language: python

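
The chain of external tools wrapped by this step can be sketched with `subprocess` as follows. This is a hypothetical illustration, not the code of `wf.py`: it assumes bowtie2 syntax, uses "keep only mapped reads" (`samtools view -F 4`) as the filtering criterion, and defaults to 3 threads, matching the 3 cores mentioned in this tutorial. The function names are hypothetical.

.. code:: python

  import subprocess

  def alignment_commands(index_prefix, fastq_files, out_prefix, threads=3):
      """Return (command, stdout_path) pairs that turn fastq reads into a .bed file.

      stdout_path is None when the tool writes its output file itself.
      """
      sam, bam, bed = out_prefix + ".sam", out_prefix + ".bam", out_prefix + ".bed"
      return [
          # bowtie2 accepts several unpaired read files as a comma-separated list.
          (["bowtie2", "-p", str(threads), "-x", index_prefix,
            "-U", ",".join(fastq_files), "-S", sam], None),
          # -b writes BAM; -F 4 drops reads flagged as unmapped.
          (["samtools", "view", "-b", "-F", "4", "-o", bam, sam], None),
          # bamtobed prints BED6 records (strand in column 6) on standard output.
          (["bedtools", "bamtobed", "-i", bam], bed),
      ]

  def run_pipeline(commands):
      """Run each command in turn, stopping at the first failure (check=True)."""
      for cmd, stdout_path in commands:
          if stdout_path is None:
              subprocess.run(cmd, check=True)
          else:
              with open(stdout_path, "w") as handle:
                  subprocess.run(cmd, check=True, stdout=handle)

Keeping the commands as plain lists separates what is run from how it is run, so the command lines can be logged or checked without launching the binaries.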

Convert Aligned Reads into TemplateFilter Format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TemplateFilter uses a specific input format for reads, so the `.bed` files must be converted. TemplateFilter expects reads as follows: `chr`, `coord`, `strand` and `#reads`, where:

- `chr` is the number of the chromosome;
- `coord` is the coordinate of the reads;
- `strand` is `F` for forward and `R` for reverse;
- `#reads` is the number of reads covering this position.

Each entry is *tab*-separated.

**WARNING** For reverse strands, bowtie returns the position of the first nucleotide on the left-hand side, whereas TemplateFilter expects the first one on the right-hand side. This step takes this into account by adding the read length (50 in our case) to the coordinates of reverse reads.

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_3
   :end-before: # _ENDOF_ step_3
   :language: python

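
The conversion described above can be sketched in Python. This is a minimal illustration under assumptions, not the code of `wf.py`: input lines are assumed to be BED6 records as written by `bedtools bamtobed` (strand in the sixth column), `chrom_index` stands for a hypothetical mapping from FASTA identifiers to chromosome numbers, and the BED start is used as the left-hand position mentioned in the warning (any 0-based/1-based adjustment made by `wf.py` is not reproduced).

.. code:: python

  from collections import Counter

  def bed_to_templatefilter(bed_lines, chrom_index, read_length=50):
      """Aggregate BED6 alignments into TemplateFilter rows: chr, coord, strand, #reads."""
      counts = Counter()
      for line in bed_lines:
          fields = line.rstrip("\n").split("\t")
          chrom, start, strand = fields[0], int(fields[1]), fields[5]
          if strand == "+":
              key = (chrom_index[chrom], start, "F")
          else:
              # Reverse reads: move from the left-hand end to the right-hand end.
              key = (chrom_index[chrom], start + read_length, "R")
          counts[key] += 1
      # One tab-separated line per (chr, coord, strand), sorted by position.
      return ["\t".join(str(v) for v in key + (n,)) for key, n in sorted(counts.items())]

Reads sharing the same chromosome, coordinate and strand are collapsed into a single row, which is why the TemplateFilter input files are much smaller than the `.bed` files.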

The following table summarises the number of reads, the file sizes and the process durations for the last two steps. In our case, the alignment process was multithreaded over 3 cores.

==  ==============  ==========================  ======  ================  ==================  ================
id  Illumina reads  aligned and filtered reads  ratio   `.bed` file size  TF input file size  process duration
==  ==============  ==========================  ======  ================  ==================  ================
1   16436138        10199695                    62.06%  1064 MB           60 MB               383 s
2   16911132        12512727                    73.99%  1298 MB           64 MB               437 s
3   15946902        12340426                    77.38%  1280 MB           65 MB               423 s
4   13765584        10381903                    75.42%  931 MB            59 MB               352 s
5   15168268        11502855                    75.83%  1031 MB           64 MB               386 s
6   18850820        14024905                    74.40%  1254 MB           69 MB               482 s
36  17715118        14092985                    79.55%  1404 MB           68 MB               483 s
37  17288466        7402082                     42.82%  741 MB            48 MB               339 s
38  16116394        13178457                    81.77%  1101 MB           63 MB               420 s
39  14241106        10537228                    73.99%  880 MB            57 MB               348 s
53  40876476        33780065                    82.64%  3316 MB           103 MB              1165 s
==  ==============  ==========================  ======  ================  ==================  ================

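
The ratio column is the number of aligned and filtered reads divided by the number of Illumina reads. The following check reproduces three rows of the table above (the `alignment_ratio` helper is hypothetical):

.. code:: python

  def alignment_ratio(illumina_reads, aligned_reads):
      """Percentage of Illumina reads kept after alignment and filtering."""
      return round(100 * aligned_reads / illumina_reads, 2)

  # (id, Illumina reads, aligned and filtered reads) taken from the table above.
  ROWS = [(1, 16436138, 10199695), (37, 17288466, 7402082), (53, 40876476, 33780065)]

  for sample_id, total, kept in ROWS:
      print(sample_id, f"{alignment_ratio(total, kept):.2f}%")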

Run TemplateFilter on Mnase Samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, we run the TemplateFilter analysis on each MNase sample.

**WARNING** TemplateFilter returns a list of nucleosomes, each defined by its center and its width. An odd width therefore leads to non-integer lower and upper bounds.

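
Assuming that the bounds are placed symmetrically around the center (an assumption of this sketch, not a statement about TemplateFilter internals), the half-integer bounds arise as follows; `nucleosome_bounds` is a hypothetical helper:

.. code:: python

  def nucleosome_bounds(center, width):
      """Lower and upper bounds of a nucleosome, assumed symmetric around its center."""
      half_width = width / 2  # true division: an odd width yields a half-integer
      return center - half_width, center + half_width

  print(nucleosome_bounds(1000, 147))  # (926.5, 1073.5)
  print(nucleosome_bounds(1000, 146))  # (927.0, 1073.0)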

**WARNING** TemplateFilter is not designed to deal with replicates. We therefore recommend keeping as many nucleosomes as possible and filtering out the aberrant ones afterwards, taking advantage of the replicates. To do this, we set a low correlation threshold (0.5) and a particularly high overlap value (300%).

This step is performed by the following part of the `wf.py` script:

.. literalinclude:: ../../../snep/src/current/wf.py
   :start-after: # _STARTOF_ step_4
   :end-before: # _ENDOF_ step_4
   :language: python

The following table summarises, for each MNase sample, the number of nucleosomes found, the size of the nucleosome file and the process duration.

==  ======  ==========  =============  ================
id  strain  found nucs  nuc file size  process duration
==  ======  ==========  =============  ================
1   BY      96214       68 MB          1022 s
2   BY      91694       65 MB          1038 s
3   BY      91205       65 MB          1036 s
4   RM      88076       62 MB          984 s
5   RM      90141       64 MB          967 s
6   RM      87517       62 MB          980 s
==  ======  ==========  =============  ================

..
..
.. - libcoverage.py
.. - wf.py
..
..
..
..
..
..
.. In order to simplify the design of experiment, we consider Mnase as a marker.
.. For each pair `(strain, marker)` we perform 3 replicates. So, theoretically
.. we should have `3 * (1 + 5) * 3 = 54` samples. In practice we only obtain 2
.. replicates for `(YJM, H3K4me1)`. Each one of the 53 samples is identified by a
.. unique identifier. The file `CSV_SAMPLE_FILE` sums up this information.
..
.. .. autodata:: configurator.CSV_SAMPLE_FILE
..     :noindex:
..
.. We use a convention to link samples and Illumina fastq outputs. Illumina output
.. files of the sample `ID` will be stored in the directory
.. `ILLUMINA_OUTPUTFILE_PREFIX` + `ID`. For example, sample 41 outputs will be
.. stored in the directory `data/2012-09-05/FASTQ/Sample_Yvert_Bq41/`.
..
.. .. autodata:: configurator.ILLUMINA_OUTPUTFILE_PREFIX
..     :noindex:
..
.. For BY (resp. RM and YJM) we use the following reference genome
.. `saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta`
.. (resp. `saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta` and
.. `saccharomyces_cerevisiae_YJM_789_screencontig.fasta`).
.. The index `FASTA_REFERENCE_GENOME_FILES` stores this information.
..
.. .. autodata:: configurator.FASTA_REFERENCE_GENOME_FILES
..     :noindex:
..
.. Each chromosome/contig is identified in the fasta file by an obscure identifier.
.. For example, BY chromosome I is identified by `gi|144228165|ref|NC_001133.7|` whereas
.. TemplateFilter expects an integer. So, we translate it. The index
.. `FASTA_INDEXES` stores this translation.
..
.. .. autodata:: configurator.FASTA_INDEXES
..     :noindex:
..
.. From a pragmatic point of view, we discard some parts of the genome (repeated
.. sequences, etc.). The list of blacklisted areas is explicitly detailed in
.. `AREA_BLACK_LIST`.
..
.. .. autodata:: configurator.AREA_BLACK_LIST
..     :noindex:
..
.. For BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we use the previously
.. computed .c2c file `data/2012-03_primarydata/BY_RM_gxcomp.c2c` (resp.
.. `BY_YJM_GComp_All.c2c` and `RM_YJM_gxcomp.c2c`). For more information about
.. .c2c files, please read section 5 of the manual of `NucleoMiner`, the old
.. version of `NucleoMiner2`
.. (http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).
..
.. .. autodata:: configurator.C2C_FILES
..     :noindex:
..
.. `nucleominer` uses specific working directories, which are described in
.. `INDEX_DIR`, `ALIGN_DIR` and `LOG_DIR`.
..
.. Finally, `nucleominer` uses external resources; the paths to these resources
.. are described in `BOWTIE_BUILD_BIN`, `BOWTIE2_BIN`, `SAMTOOLS_BIN`,
.. `BEDTOOLS_BIN`, `TF_BIN` and `TF_TEMPLATES_FILE`.
..
.. All paths, prefixes and indexes can be changed in the
.. `src/current/nucleominer_config.json` file.
..
.. .. autodata:: wf.json_conf_file
..     :noindex:
..

Inferring Nucleosome Position and Extracting Read Counts
--------------------------------------------------------

The second part of the tutorial uses R (http://www.r-project.org). It consists of a set of R scripts that are sourced in an R console launched at the root of your project. These scripts are:

  - headers.R
  - extract_maps.R
  - translate_common_wp.R
  - split_samples.R
  - count_reads.R
  - get_size_factors
  - launch_deseq.R

The Script headers.R
^^^^^^^^^^^^^^^^^^^^

The script headers.R is included in each of the other scripts. It is in charge of:

  - loading the libraries used in the scripts
  - loading the configuration (design, strain, marker...)
  - computing and caching CURs (caching means storing the information in the computer's memory)

Note that you can customise the function “translate”. This function lets you use the alignments between genomes when performing various tasks. You may be using NucleoMiner2 to analyse data from a single strain or from several strains:

  - All the data correspond to the same strain (e.g. treatment/control, or only a few mutations): in step 1), the regions to use are entire chromosomes, and in step 2), simply use the default translate function, which is neutral.

  - The data come from two or more strains: in this case, edit a list of regions and customise the translate function, which establishes the correspondence between the different genomes. How we did it: a .c2c file is obtained with NucleoMiner 1.0 (refer to the Appendix "Generate .c2c Files"), then used to produce the list of regions and to customise “translate”.

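
To make the role of “translate” concrete, here is a Python sketch of the two situations described above. It is not the R function from headers.R, and the aligned-block tuple used here is a hypothetical simplification, not the .c2c format. The neutral version returns coordinates unchanged; the block-based version maps a position through the first aligned block that contains it, counting backwards when the block is aligned on the opposite strand.

.. code:: python

  from typing import List, Optional, Tuple

  # Hypothetical aligned block: (src_chr, src_start, src_end, dst_chr, dst_start, dst_end).
  # dst_start > dst_end encodes a block aligned on the opposite strand.
  Block = Tuple[int, int, int, int, int, int]

  def neutral_translate(chrom: int, pos: int) -> Tuple[int, int]:
      """Single-strain case: coordinates are returned unchanged."""
      return chrom, pos

  def block_translate(chrom: int, pos: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
      """Multi-strain case: map a position through the first block containing it."""
      for src_chr, src_start, src_end, dst_chr, dst_start, dst_end in blocks:
          if src_chr == chrom and src_start <= pos <= src_end:
              offset = pos - src_start
              if dst_start <= dst_end:
                  return dst_chr, dst_start + offset
              return dst_chr, dst_start - offset
      return None  # position not covered by any aligned region

In the opposite-strand case, the translated lower and upper bounds of an interval come out swapped, which is one way a translated nucleosome could end up with a negative length.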

In your R console, run the following command:

.. code:: r

  source("src/current/headers.R")


The Script extract_maps.R
^^^^^^^^^^^^^^^^^^^^^^^^^

This script is in charge of extracting maps of well-positioned and fuzzy nucleosomes. First, it computes intra- and inter-strain nucleosome maps for each CUR. This step is executed in parallel on many cores using the BoT library. Next, it collects the results and produces the well-positioned, fuzzy and UNR maps.

The well-positioned map for BY is collected in the result directory and is called `BY_wp.tab`. It is composed of the following columns:

 - chr, the number of the chromosome
 - lower_bound, the lower bound of the nucleosome
 - upper_bound, the upper bound of the nucleosome
 - cur_index, the index of the CUR
 - index_nuc, the index of the nucleosome in the CUR
 - wp, 1 if it is a well-positioned nucleosome, 0 otherwise
 - nb_reads, the number of reads that support this nucleosome
 - nb_nucs, the number of TemplateFilter nucleosomes across replicates (= the number of replicates in which it is a well-positioned nucleosome)
 - llr_1, for a well-positioned nucleosome, the LLR1 (log-likelihood ratio) between the first and the second TemplateFilter nucleosomes of the chain
 - llr_2, for a well-positioned nucleosome, the LLR1 between the second and the third TemplateFilter nucleosomes of the chain
 - wp_llr, for a well-positioned nucleosome, the LLR2 that assesses the consistency of the positioning over all TemplateFilter nucleosomes
 - wp_pval, for a well-positioned nucleosome, the p-value of the chi-square test obtained from LLR2 (`1 - pchisq(2 * LLR2, df = 4)`)
 - dyad_shift, for a well-positioned nucleosome, the shift between the two extreme TemplateFilter nucleosome dyad positions

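
For an even number of degrees of freedom, the chi-square survival function has a closed form, so `wp_pval` can be checked without a statistics library. The sketch below is an illustration in Python, not code from extract_maps.R; with df = 4 it reduces to exp(-LLR2) * (1 + LLR2).

.. code:: python

  import math

  def chi2_sf_even_df(x, df):
      """P(X > x) for a chi-square variable with an even number of degrees of freedom.

      For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
      """
      if df <= 0 or df % 2:
          raise ValueError("df must be a positive even integer")
      half = x / 2.0
      term = total = 1.0
      for i in range(1, df // 2):
          term *= half / i
          total += term
      return math.exp(-half) * total

  def wp_pval(llr2):
      """Same quantity as 1 - pchisq(2 * LLR2, df = 4) in R."""
      return chi2_sf_even_df(2.0 * llr2, df=4)

For example, `wp_pval(1.0)` equals `2 * math.exp(-1)`, about 0.736.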

The fuzzy map for BY is collected in the result directory and is called `BY_fuzzy.tab`. It is composed of the following columns:

 - chr, the number of the chromosome
 - lower_bound, the lower bound of the nucleosome
 - upper_bound, the upper bound of the nucleosome
 - cur_index, the index of the CUR

The map of common well-positioned nucleosomes aligned between the BY and RM strains is collected in the result directory and is called `BY_RM_common_wp.tab`. It is composed of the following columns:

 - cur_index, the index of the CUR
 - index_nuc_BY, the index of the BY nucleosome in the CUR
 - index_nuc_RM, the index of the RM nucleosome in the CUR
 - llr_score, the LLR3 score that estimates the conservation between the positions in BY and RM
 - common_wp_pval, the p-value of the chi-square test obtained from LLR3 (`1 - pchisq(2 * LLR3, df = 2)`)
 - diff, the dyad shift between the positions in the two strains

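
With two degrees of freedom, the chi-square survival function is simply exp(-x/2), so `common_wp_pval` reduces to exp(-LLR3). The sketch below shows this in Python, together with one possible reading of `diff`; the `dyad_shift` helper, its use of the absolute value and the assumption that both nucleosomes are already expressed in the same genome coordinates are illustrative choices, not the definition used by extract_maps.R.

.. code:: python

  import math

  def common_wp_pval(llr3):
      """Same quantity as 1 - pchisq(2 * LLR3, df = 2) in R.

      With 2 degrees of freedom, P(X > x) = exp(-x / 2), so the p-value is exp(-LLR3).
      """
      return math.exp(-llr3)

  def dyad_shift(lower_a, upper_a, lower_b, upper_b):
      """Absolute distance between two dyads (nucleosome centres) in one coordinate system."""
      return abs((lower_a + upper_a) / 2 - (lower_b + upper_b) / 2)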

The common UNR map for the BY and RM strains is collected in the result directory and is called `BY_RM_common_unr.tab`. It is composed of the following columns:

 - cur_index, the index of the CUR
 - index_nuc_BY, the index of the BY nucleosome in the CUR
 - index_nuc_RM, the index of the RM nucleosome in the CUR

To execute this script, run the following command in your R console:

.. code:: r

  source("src/current/extract_maps.R")


The Script translate_common_wp.R
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This script translates common well-positioned nucleosome maps from one strain to another and stores the result in a table.

For example, the file `results/2014-04/RM_wp_tr_2_BY.tab` contains RM well-positioned nucleosomes translated into BY genome coordinates. It is composed of the following columns:

 - strain_ref, the reference genome (in which positions are defined)
 - begin, the translated lower bound of the nucleosome
 - end, the translated upper bound of the nucleosome
 - chr, the chromosome number in the reference genome (in which positions are defined)
 - length, the length of the nucleosome (can be negative)
 - cur_index, the index of the CUR
 - index_nuc, the index of the nucleosome in the CUR

To execute this script, run the following command in your R console:

.. code:: r

  source("src/current/translate_common_wp.R")


The Script split_samples.R
^^^^^^^^^^^^^^^^^^^^^^^^^^

To reduce memory usage, we split TemplateFilter input files by chromosome and compress them. For example, `sample_1_TF.tab` is split into:

  - sample_1_chr_1_splited_sample.tab.gz
  - sample_1_chr_2_splited_sample.tab.gz
  - ...
  - sample_1_chr_17_splited_sample.tab.gz
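
The splitting step can be sketched in base R as follows. This is not the code of `split_samples.R`: the column name `chr` and the toy values are assumptions made for illustration, and the files are written to a temporary directory. The output names follow the pattern listed above.

.. code:: r

  # Hypothetical TemplateFilter input; the "chr" column name is an assumption.
  tf <- data.frame(
    chr   = c(1, 1, 2, 17),
    pos   = c(100, 250, 80, 3000),
    score = c(0.9, 0.4, 0.7, 0.8)
  )

  out_dir <- tempdir()
  sample_name <- "sample_1"

  # Write one gzip-compressed table per chromosome.
  written <- character(0)
  for (chunk in split(tf, tf$chr)) {
    chr_id <- chunk$chr[1]
    out <- file.path(out_dir,
                     sprintf("%s_chr_%s_splited_sample.tab.gz", sample_name, chr_id))
    con <- gzfile(out, open = "w")
    write.table(chunk, con, sep = "\t", quote = FALSE, row.names = FALSE)
    close(con)
    written <- c(written, out)
  }
  print(basename(written))

  # Read one chunk back; read.table() decompresses .gz files transparently.
  chr1 <- read.table(written[1], header = TRUE, sep = "\t")

Because `read.table()` detects gzip compression by itself, each chunk can be loaded back without decompressing it first.
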


To execute this script, run the following command in your R console:

.. code:: r

  source("src/current/split_samples.R")


The Script count_reads.R
^^^^^^^^^^^^^^^^^^^^^^^^

The script `count_reads.R` associates a number of observations (reads) with each nucleosome. It produces the files `BY_RM_H3K14ac_wp_and_nbreads.tab`, `BY_RM_H3K14ac_unr_and_nbreads.tab`, `BY_RM_Mnase_Seq_wp_and_nbreads.tab` and `BY_RM_Mnase_Seq_unr_and_nbreads.tab` for H3K14ac common well-positioned nucleosomes, H3K14ac UNRs, MNase common well-positioned nucleosomes and MNase UNRs, respectively.

For example, the file `BY_RM_H3K14ac_wp_and_nbreads.tab` contains the read counts of the common well-positioned nucleosomes for the ChIP H3K14ac experimental condition. It has the following columns:

  - chr_BY, the chromosome number for BY
  - lower_bound_BY, the lower bound of the nucleosome for BY
  - upper_bound_BY, the upper bound of the nucleosome for BY
  - index_nuc_BY, the index of the BY nucleosome in the CUR for BY
  - chr_RM, the chromosome number for RM
  - lower_bound_RM, the lower bound of the nucleosome for RM
  - upper_bound_RM, the upper bound of the nucleosome for RM
  - index_nuc_RM, the index of the RM nucleosome in the CUR for RM
  - cur_index, the index of the CUR
  - BY_H3K14ac_36, the number of reads for the current nucleosome in sample 36
  - BY_H3K14ac_37, number of reads in sample 37
  - BY_H3K14ac_53, number of reads in sample 53
  - RM_H3K14ac_38, number of reads in sample 38
  - RM_H3K14ac_39, number of reads in sample 39
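
Conceptually, counting reads for a nucleosome amounts to counting the reads that fall between its bounds. The sketch below illustrates this on one chromosome with made-up positions. How `count_reads.R` actually represents a read (for instance its start or an estimated fragment centre) and whether its bounds are inclusive are assumptions here.

.. code:: r

  # Made-up nucleosome bounds and read positions on a single chromosome.
  nucs <- data.frame(
    lower_bound = c(100, 300, 520),
    upper_bound = c(246, 446, 666)
  )
  read_pos <- c(120, 130, 245, 250, 301, 446, 447, 600, 700)

  # Number of reads whose position lies in [lo, hi] (bounds taken as inclusive).
  count_in <- function(lo, hi, pos) sum(pos >= lo & pos <= hi)

  nucs$nb_reads <- mapply(count_in, nucs$lower_bound, nucs$upper_bound,
                          MoreArgs = list(pos = read_pos))
  print(nucs)
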

To execute this script, run the following command in your R console:

.. code:: r

  source("src/current/count_reads.R")


The Script get_size_factors.R
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This script uses the DESeq function `estimateSizeFactors` to compute the size factor of each sample, i.e. the sample-to-sample normalisation of read counts determined by DESeq. The computed size factors are written to the file `size_factors.tab`. When a sample has n reads for a nucleosome or a UNR, its normalised count is n/f, where f is the size factor of that sample in this file. The file has the following form:

========= ======= ======= =======
sample_id      wp     unr   wpunr
========= ======= ======= =======
        1 0.87396 0.88097 0.87584
        2 1.07890 1.07440 1.07760
        3 1.06400 1.05890 1.06250
        4 0.85782 0.87948 0.86305
        5 0.97577 0.96590 0.97307
        6 1.19630 1.18120 1.19190
       36 0.93318 0.92762 0.93166
       37 0.48315 0.48453 0.48350
       38 1.11240 1.11210 1.11230
       39 0.89897 0.89917 0.89903
       53 2.22650 2.22700 2.22660
========= ======= ======= =======

Sample identifiers (sample_id) are listed in the file `samples.csv`.

If you are unsure which column to use, we recommend wpunr.
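
Applying a size factor is then a simple division. The sketch below assumes `size_factors.tab` is whitespace-separated with a header row, as the table above suggests. It writes an excerpt of that table (samples 36, 37 and 53) to a temporary file and divides made-up raw counts of one nucleosome by the recommended wpunr factors.

.. code:: r

  # Excerpt of size_factors.tab (values from the table above); layout assumed.
  path <- tempfile(fileext = ".tab")
  writeLines(c("sample_id      wp     unr   wpunr",
               "       36 0.93318 0.92762 0.93166",
               "       37 0.48315 0.48453 0.48350",
               "       53 2.22650 2.22700 2.22660"), path)
  sf <- read.table(path, header = TRUE)

  # Made-up raw read counts of one nucleosome, named by sample_id.
  raw <- c("36" = 120, "37" = 60, "53" = 270)

  # Normalised count = n / f, using the wpunr column.
  f <- sf$wpunr[match(as.integer(names(raw)), sf$sample_id)]
  normalised <- raw / f
  print(round(normalised, 1))
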

The three columns correspond to the following factors computed by DESeq:

  - unr: factor computed from the read counts of UNR regions. These regions are defined for every pair of aligned genomes (e.g. BY_RM).
  - wp: the same, but for well-positioned nucleosomes.
  - wpunr: factor computed from both types of regions.
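
For reference, DESeq's `estimateSizeFactors` relies on the median-of-ratios method: each region's counts are divided by the geometric mean of that region across samples, and a sample's size factor is the median of these ratios (regions with a zero count in any sample are ignored). The base R sketch below re-implements the idea on a made-up count matrix. It is meant to explain the numbers in `size_factors.tab`, not to replace the DESeq call made by the script; running it on the UNR rows, the well-positioned nucleosome rows or both corresponds to the unr, wp and wpunr columns.

.. code:: r

  # Made-up count matrix: rows are regions (nucleosomes or UNRs),
  # columns are samples.
  counts <- matrix(c(10, 20,  40,
                     30, 60, 120,
                      5, 10,  20,
                      0,  4,   8),
                   ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, c("s1", "s2", "s3")))

  # Median-of-ratios size factors (the method behind estimateSizeFactors).
  median_of_ratios <- function(counts) {
    log_geo_means <- rowMeans(log(counts))  # -Inf for rows containing a zero
    keep <- is.finite(log_geo_means)        # such rows are ignored
    apply(counts, 2, function(col) exp(median(log(col[keep]) - log_geo_means[keep])))
  }

  sf <- median_of_ratios(counts)
  print(sf)

  # Normalised counts: each column divided by its size factor (n / f).
  normalised <- sweep(counts, 2, sf, "/")
  print(normalised)
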

To execute this script, run the following command in your R console:

.. code:: r

  source("src/current/get_size_factors.R")


The Script launch_deseq.R
^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, the script `launch_deseq.R` performs a statistical analysis of each nucleosome using DESeq. It produces the following files:

  - results/current/BY_RM_H3K14ac_wp_snep.tab
  - results/current/BY_RM_H3K14ac_unr_snep.tab
  - results/current/BY_RM_H3K14ac_wpunr_snep.tab
  - results/current/BY_RM_H3K14ac_wp_mnase.tab
  - results/current/BY_RM_H3K14ac_unr_mnase.tab
  - results/current/BY_RM_H3K14ac_wpunr_mnase.tab

These files have the following columns (see the file `BY_RM_H3K14ac_wp_snep.tab` for an example):

  - chr_BY, the chromosome number for BY
  - lower_bound_BY, the lower bound of the nucleosome for BY
  - upper_bound_BY, the upper bound of the nucleosome for BY
  - index_nuc_BY, the index of the BY nucleosome in the CUR for BY
  - chr_RM, the chromosome number for RM
  - lower_bound_RM, the lower bound of the nucleosome for RM
  - upper_bound_RM, the upper bound of the nucleosome for RM
  - index_nuc_RM, the index of the RM nucleosome in the CUR for RM
  - cur_index, the index of the CUR
  - form, the type of region (wp or unr)

The next columns give the number of reads of the current region in each sample:

  - BY_Mnase_Seq_1, number of reads in sample 1
  - BY_Mnase_Seq_2, number of reads in sample 2
  - BY_Mnase_Seq_3, number of reads in sample 3
  - RM_Mnase_Seq_4, number of reads in sample 4
  - RM_Mnase_Seq_5, number of reads in sample 5
  - RM_Mnase_Seq_6, number of reads in sample 6
  - BY_H3K14ac_36, number of reads in sample 36
  - BY_H3K14ac_37, number of reads in sample 37
  - BY_H3K14ac_53, number of reads in sample 53
  - RM_H3K14ac_38, number of reads in sample 38
  - RM_H3K14ac_39, number of reads in sample 39

The last 5 columns concern the DESeq analysis:

  - manip[a_manip], strain[a_strain] and manip[a_manip]:strain[a_strain], the manip (marker) effect, the strain effect and the SNEP (interaction) effect. These are the coefficients of the fitted generalised linear model (GLM).
  - pvalsGLM, the p-value obtained by comparing the GLMs with and without the marker:strain interaction term. It is the statistical significance of the interaction term, and therefore of the SNEP.
  - snep_index, a boolean set to TRUE if pvalsGLM is below the threshold computed by the FDR function with a rate of 0.0001.
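
To make the model comparison behind pvalsGLM concrete, the sketch below fits, for a single region, a model with the manip:strain interaction and one without it, and compares them with a likelihood-ratio test. It is a simplification and not what DESeq does: DESeq fits negative binomial GLMs with its own dispersion estimates, whereas this sketch uses a Poisson GLM from base R. The read counts are made up; the size factors are the wpunr values from `size_factors.tab`, entered as an offset.

.. code:: r

  # Made-up read counts of one region in the 11 samples of the tutorial:
  # MNase-seq BY (1, 2, 3), RM (4, 5, 6); H3K14ac BY (36, 37, 53), RM (38, 39).
  d <- data.frame(
    count  = c(175, 216, 212, 173, 195, 238, 279, 145, 668, 111, 90),
    manip  = factor(rep(c("Mnase_Seq", "H3K14ac"), c(6, 5)),
                    levels = c("Mnase_Seq", "H3K14ac")),
    strain = factor(c("BY", "BY", "BY", "RM", "RM", "RM",
                      "BY", "BY", "BY", "RM", "RM"), levels = c("BY", "RM")),
    # wpunr size factors of samples 1-6, 36, 37, 53, 38, 39.
    sf     = c(0.87584, 1.07760, 1.06250, 0.86305, 0.97307, 1.19190,
               0.93166, 0.48350, 2.22660, 1.11230, 0.89903)
  )

  # Full model (with interaction) and reduced model (without it);
  # log size factors enter as an offset.
  full    <- glm(count ~ manip * strain + offset(log(sf)), family = poisson, data = d)
  reduced <- glm(count ~ manip + strain + offset(log(sf)), family = poisson, data = d)

  # Coefficients, including the manip:strain (SNEP) term.
  print(coef(full))

  # Likelihood-ratio test of the interaction term.
  lrt <- anova(reduced, full, test = "Chisq")
  p_interaction <- lrt[2, "Pr(>Chi)"]
  print(p_interaction)

In this toy region the H3K14ac signal drops in RM while the MNase signal does not, so the interaction coefficient is negative and its test is significant.
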

To execute this script, run the following command in your R console:

.. code:: r

  source("src/current/launch_deseq.R")


Results: Number of SNEPs
------------------------

The table below gives the number of SNEPs detected for each form. The #nucs column is the number of regions (nucleosomes or UNRs) analysed, and the H3K14ac column the number of H3K14ac SNEPs among them.

===== ======= ===== =======
 form strains #nucs H3K14ac
===== ======= ===== =======
   wp   BY-RM 30464    3549
  unr   BY-RM  9497    1559
wpunr   BY-RM 39961    5240
===== ======= ===== =======
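
Each count in this table corresponds to the regions whose snep_index is TRUE. The tutorial does not detail the FDR function used by `launch_deseq.R`; the sketch below assumes the Benjamini-Hochberg procedure at the rate of 0.0001 and applies it to a few made-up p-values standing in for the pvalsGLM column. It computes the threshold explicitly, and the same flags can be obtained with R's `p.adjust(pvals, method = "BH") <= rate`.

.. code:: r

  # Made-up p-values standing in for the pvalsGLM column.
  pvals <- c(1e-9, 5e-8, 2e-6, 3e-5, 0.001, 0.02, 0.2, 0.5, 0.8, 0.99)
  rate  <- 1e-4

  # Benjamini-Hochberg threshold: the largest sorted p-value p_(k) such that
  # p_(k) <= k * rate / m; 0 when no p-value satisfies it.
  bh_threshold <- function(p, rate) {
    m <- length(p)
    sorted <- sort(p)
    ok <- which(sorted <= seq_len(m) * rate / m)
    if (length(ok) == 0) 0 else sorted[max(ok)]
  }

  threshold  <- bh_threshold(pvals, rate)
  snep_index <- pvals <= threshold
  print(sum(snep_index))   # number of SNEPs among these regions
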



APPENDIX: Generate .c2c Files
-----------------------------

.. note:: This appendix is still a work in progress and may not work as is. The commands below start from the root of the working directory.

A `.c2c` file is a simple table that describes how two genome sequences align with each other. We generate these files using NucleoMiner 1.0.

To install NucleoMiner 1.0 on a UNIX/Linux computer, you first need to install the Genetic Data analysis Library (GDL), a dynamic library of C functions derived from the GNU Scientific Library.

Installing the GDL library
^^^^^^^^^^^^^^^^^^^^^^^^^^

The gdl-1.0.tar.gz archive is provided in the deps directory of your working directory. The following commands create a dedicated directory, copy the archive into it and unpack it, which creates a directory called gdl-1.0. They then go into gdl-1.0, compile the library and return to the dedicated directory:

.. code:: bash

  mkdir tmp_c2c_workdir
  cd tmp_c2c_workdir
  cp ../deps/gdl-1.0.tar.gz .
  tar -xvzf gdl-1.0.tar.gz
  cd gdl-1.0
  ./configure
  make

  cd ..


Then install the library on your system from the gdl-1.0 directory. This requires root privileges:

.. code:: bash

  (cd gdl-1.0 && sudo make install)

Installing NucleoMiner 1.0
^^^^^^^^^^^^^^^^^^^^^^^^^^

From the same dedicated directory, the following commands copy the nucleominer-1.0.tar.gz archive (also provided in the deps directory of your working directory) and unpack it, which creates a directory called nucleominer-1.0. They then go into this directory, create a symbolic link to the GDL directory and compile NucleoMiner:

.. code:: bash

  cp ../deps/nucleominer-1.0.tar.gz .
  tar -xvzf nucleominer-1.0.tar.gz
  cd nucleominer-1.0
  ln -s ../gdl-1.0/gdl
  ./configure
  make

You can then use the binaries directly from this folder (preferably after adding its path to your PATH environment variable). If you want to install NucleoMiner system-wide (useful if multiple users need it), type, with root privileges:

.. code:: bash

  sudo make install

Generate .c2c Files
^^^^^^^^^^^^^^^^^^^

To generate the .c2c files, go back to the dedicated directory and type the following commands in a terminal (if NucleoMiner was not installed system-wide, NMgxcomp must be reachable through your PATH):

.. code:: bash

  cd ..
  mkdir dir_4_c2c
  NMgxcomp ../data/saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta \
           ../data/saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta \
           dir_4_c2c/BY_RM 2>dir_4_c2c/BY_RM.log

After execution, the directory `dir_4_c2c` holds the .c2c files.