root / README @ master
Historique | Voir | Annoter | Télécharger (30,72 ko)
1 | 935a568c | Florent Chuffart | |
---|---|---|---|
2 | 3c88abd0 | Florent Chuffart | Welcome to *NucleoMiner2* |
3 | 3c88abd0 | Florent Chuffart | ************************* |
4 | 3c88abd0 | Florent Chuffart | |
5 | 3c88abd0 | Florent Chuffart | * Readme / Documentation for *NucleoMiner2* |
6 | 3c88abd0 | Florent Chuffart | |
7 | 3c88abd0 | Florent Chuffart | * License |
8 | 3c88abd0 | Florent Chuffart | |
9 | 3c88abd0 | Florent Chuffart | * Installation Instructions |
10 | 3c88abd0 | Florent Chuffart | |
11 | 3c88abd0 | Florent Chuffart | * Tutorial |
12 | 3c88abd0 | Florent Chuffart | |
13 | 3c88abd0 | Florent Chuffart | * Experimental Dataset, Working Directory and Configuration File |
14 | 3c88abd0 | Florent Chuffart | |
15 | 3c88abd0 | Florent Chuffart | * Preprocessing Illumina Fastq Reads for Each Sample |
16 | 3c88abd0 | Florent Chuffart | |
17 | 3c88abd0 | Florent Chuffart | * Inferring Nucleosome Position and Extracting Read Counts |
18 | 3c88abd0 | Florent Chuffart | |
19 | 3c88abd0 | Florent Chuffart | * Results: Number of SNEPs |
20 | 3c88abd0 | Florent Chuffart | |
21 | 3c88abd0 | Florent Chuffart | * APPENDICE: Generate .c2c Files |
22 | 3c88abd0 | Florent Chuffart | |
23 | 3c88abd0 | Florent Chuffart | * References |
24 | 3c88abd0 | Florent Chuffart | |
25 | 3c88abd0 | Florent Chuffart | * Python Reference |
26 | 3c88abd0 | Florent Chuffart | |
27 | 3c88abd0 | Florent Chuffart | * R Reference |
28 | 3c88abd0 | Florent Chuffart | |
29 | 935a568c | Florent Chuffart | Readme / Documentation for *NucleoMiner2* |
30 | 935a568c | Florent Chuffart | ***************************************** |
31 | 935a568c | Florent Chuffart | |
32 | 935a568c | Florent Chuffart | *NucleoMiner2* offers Python API and R package allowing to perform |
33 | e5603c3f | Florent Chuffart | quantitative analysis of epigenetic marks on individual nucleosomes. |
34 | e5603c3f | Florent Chuffart | It was developed to detect natural Single-Nucleosome Epi-Polymorphisms |
35 | e5603c3f | Florent Chuffart | (SNEP) from MNase-seq and ChIP-seq data. |
36 | 935a568c | Florent Chuffart | |
37 | 935a568c | Florent Chuffart | |
38 | 935a568c | Florent Chuffart | License |
39 | 935a568c | Florent Chuffart | ======= |
40 | 935a568c | Florent Chuffart | |
41 | 935a568c | Florent Chuffart | Copyright CNRS 2012-2013 |
42 | 935a568c | Florent Chuffart | |
43 | 935a568c | Florent Chuffart | * Florent CHUFFART |
44 | 935a568c | Florent Chuffart | |
45 | 935a568c | Florent Chuffart | * Jean-Baptiste VEYRIERAS |
46 | 935a568c | Florent Chuffart | |
47 | 935a568c | Florent Chuffart | * Gael YVERT |
48 | 935a568c | Florent Chuffart | |
49 | 935a568c | Florent Chuffart | This software is a computer program which purpose is to perform |
50 | 935a568c | Florent Chuffart | quanti- tative analysis of epigenetic marks at single nucleosome |
51 | 935a568c | Florent Chuffart | resolution. |
52 | 935a568c | Florent Chuffart | |
53 | 935a568c | Florent Chuffart | This software is governed by the CeCILL license under French law and |
54 | 935a568c | Florent Chuffart | abiding by the rules of distribution of free software. You can use, |
55 | 935a568c | Florent Chuffart | modify and/ or redistribute the software under the terms of the CeCILL |
56 | 935a568c | Florent Chuffart | license as circulated by CEA, CNRS and INRIA at the following URL |
57 | 935a568c | Florent Chuffart | "http://www.cecill.info". |
58 | 935a568c | Florent Chuffart | |
59 | 935a568c | Florent Chuffart | As a counterpart to the access to the source code and rights to copy, |
60 | 935a568c | Florent Chuffart | modify and redistribute granted by the license, users are provided |
61 | 935a568c | Florent Chuffart | only with a limited warranty and the software's author, the holder |
62 | 935a568c | Florent Chuffart | of the economic rights, and the successive licensors have only |
63 | 935a568c | Florent Chuffart | limited liability. |
64 | 935a568c | Florent Chuffart | |
65 | e5603c3f | Florent Chuffart | This software is provided with absolutely NO WARRANTY. The authors can |
66 | e5603c3f | Florent Chuffart | not be held responsible, even partially, for any damage, loss, |
67 | e5603c3f | Florent Chuffart | financial loss or any other undesired facts resulting from the use of |
68 | e5603c3f | Florent Chuffart | the software. In this respect, the user's attention is drawn to the |
69 | e5603c3f | Florent Chuffart | risks associated with loading, using, modifying and/or developing or |
70 | e5603c3f | Florent Chuffart | reproducing the software by the user in light of its specific status |
71 | e5603c3f | Florent Chuffart | of free software, that may mean that it is complicated to manipulate, |
72 | e5603c3f | Florent Chuffart | and that also therefore means that it is reserved for developers |
73 | e5603c3f | Florent Chuffart | and experienced professionals having in-depth computer knowledge. |
74 | e5603c3f | Florent Chuffart | Users are therefore encouraged to load and test the software's |
75 | e5603c3f | Florent Chuffart | suitability as regards their requirements in conditions enabling the |
76 | e5603c3f | Florent Chuffart | security of their systems and/or data to be ensured and, more |
77 | e5603c3f | Florent Chuffart | generally, to use and operate it in the same conditions as regards |
78 | e5603c3f | Florent Chuffart | security. |
79 | 935a568c | Florent Chuffart | |
80 | 935a568c | Florent Chuffart | The fact that you are presently reading this means that you have had |
81 | 935a568c | Florent Chuffart | knowledge of the CeCILL license and that you accept its terms. |
82 | 935a568c | Florent Chuffart | |
83 | 935a568c | Florent Chuffart | |
84 | 935a568c | Florent Chuffart | Installation Instructions |
85 | 935a568c | Florent Chuffart | ========================= |
86 | 935a568c | Florent Chuffart | |
87 | 935a568c | Florent Chuffart | |
88 | 935a568c | Florent Chuffart | Links |
89 | 935a568c | Florent Chuffart | ----- |
90 | 935a568c | Florent Chuffart | |
91 | e5603c3f | Florent Chuffart | *NucleoMiner2* home page and documentation are available here: |
92 | 935a568c | Florent Chuffart | |
93 | e5603c3f | Florent Chuffart | * https://forge.cbp.ens-lyon.fr/redmine/projects/nucleominer |
94 | e5603c3f | Florent Chuffart | |
95 | e5603c3f | Florent Chuffart | The Yvert lab web page is accessible here: |
96 | e5603c3f | Florent Chuffart | |
97 | e5603c3f | Florent Chuffart | * http://www.ens-lyon.fr/LBMC/gisv/ |
98 | 935a568c | Florent Chuffart | |
99 | 935a568c | Florent Chuffart | |
100 | 935a568c | Florent Chuffart | Installation |
101 | 935a568c | Florent Chuffart | ------------ |
102 | 935a568c | Florent Chuffart | |
103 | 5badc2fd | Florent Chuffart | The first installation step is to retrieve the source code of |
104 | 3961deb6 | Florent Chuffart | NucleoMiner2. You can do this by typing the following command in a |
105 | 5badc2fd | Florent Chuffart | terminal. |
106 | 5badc2fd | Florent Chuffart | |
107 | 5badc2fd | Florent Chuffart | git clone http://forge.cbp.ens-lyon.fr/git/nucleominer |
108 | 5badc2fd | Florent Chuffart | |
109 | 935a568c | Florent Chuffart | |
110 | e5603c3f | Florent Chuffart | Prerequisites |
111 | e5603c3f | Florent Chuffart | ~~~~~~~~~~~~~ |
112 | e5603c3f | Florent Chuffart | |
113 | e5603c3f | Florent Chuffart | To work properly, NucleoMiner2 needs that the following free software |
114 | e5603c3f | Florent Chuffart | are installed and made available on your system: |
115 | e5603c3f | Florent Chuffart | |
116 | e5603c3f | Florent Chuffart | * Bowtie2 http://bowtie-bio.sourceforge.net/bowtie2 |
117 | 935a568c | Florent Chuffart | |
118 | e5603c3f | Florent Chuffart | * SAMtools http://samtools.sourceforge.net |
119 | 935a568c | Florent Chuffart | |
120 | e5603c3f | Florent Chuffart | * bedtools http://code.google.com/p/bedtools/ |
121 | 935a568c | Florent Chuffart | |
122 | e5603c3f | Florent Chuffart | * TemplateFilter |
123 | e5603c3f | Florent Chuffart | http://compbio.cs.huji.ac.il/NucPosition/TemplateFiltering |
124 | 935a568c | Florent Chuffart | |
125 | e5603c3f | Florent Chuffart | It also requires the following R packages to be installed on your |
126 | e5603c3f | Florent Chuffart | system: |
127 | dadb6a4d | Florent Chuffart | |
128 | dadb6a4d | Florent Chuffart | * fork |
129 | 935a568c | Florent Chuffart | |
130 | 935a568c | Florent Chuffart | * rjson |
131 | 935a568c | Florent Chuffart | |
132 | 935a568c | Florent Chuffart | * seqinr |
133 | 935a568c | Florent Chuffart | |
134 | 5badc2fd | Florent Chuffart | * plotrix |
135 | 5badc2fd | Florent Chuffart | |
136 | 5badc2fd | Florent Chuffart | * DESeq |
137 | 5badc2fd | Florent Chuffart | |
138 | 3961deb6 | Florent Chuffart | These packages can be installed by typing the following command in an |
139 | 3961deb6 | Florent Chuffart | R console: |
140 | 3961deb6 | Florent Chuffart | |
141 | 3961deb6 | Florent Chuffart | install.packages(c("fork", "rjson", "seqinr", "plotrix")) |
142 | 3961deb6 | Florent Chuffart | source("http://bioconductor.org/biocLite.R") |
143 | 3961deb6 | Florent Chuffart | biocLite("DESeq") |
144 | 3961deb6 | Florent Chuffart | |
145 | 3961deb6 | Florent Chuffart | Finally,by typing the git command above, you downloaded specific R |
146 | 3961deb6 | Florent Chuffart | packages provided with NucleoMiner2 that you now need to install: |
147 | 3961deb6 | Florent Chuffart | |
148 | e5603c3f | Florent Chuffart | * cachecache https://forge.cbp.ens- |
149 | e5603c3f | Florent Chuffart | lyon.fr/redmine/projects/cachecache |
150 | 935a568c | Florent Chuffart | |
151 | e5603c3f | Florent Chuffart | * bot https://forge.cbp.ens-lyon.fr/redmine/projects/bot |
152 | 935a568c | Florent Chuffart | |
153 | e5603c3f | Florent Chuffart | * nucleominer https://forge.cbp.ens- |
154 | e5603c3f | Florent Chuffart | lyon.fr/redmine/projects/nucleominer |
155 | 5badc2fd | Florent Chuffart | |
156 | 3961deb6 | Florent Chuffart | To do so, type the following command in your terminal: |
157 | 5badc2fd | Florent Chuffart | |
158 | 5badc2fd | Florent Chuffart | cd nucleominer |
159 | 5badc2fd | Florent Chuffart | R CMD INSTALL doc/Chuffart_NM2_workdir/deps/bot_0.14.tar.gz\ |
160 | 5badc2fd | Florent Chuffart | doc/Chuffart_NM2_workdir/deps/cachecache_0.1.tar.gz\ |
161 | 5badc2fd | Florent Chuffart | build/nucleominer_2.3.46.tar.gz |
162 | 3c88abd0 | Florent Chuffart | |
163 | 3c88abd0 | Florent Chuffart | Tutorial |
164 | 3c88abd0 | Florent Chuffart | ******** |
165 | 3c88abd0 | Florent Chuffart | |
166 | 3961deb6 | Florent Chuffart | This tutorial describes steps allowing to perform quantitative |
167 | 3c88abd0 | Florent Chuffart | analysis of epigenetic marks on individual nucleosomes. We assume that |
168 | 3c88abd0 | Florent Chuffart | files are organised according to a given hierarchy and that all |
169 | 3c88abd0 | Florent Chuffart | command lines are launched from the project’s root directory. |
170 | 3c88abd0 | Florent Chuffart | |
171 | 3c88abd0 | Florent Chuffart | This tutorial is divided into two main parts. The first part covers |
172 | 3c88abd0 | Florent Chuffart | the python script *wf.py* that aligns and converts short sequence |
173 | 3961deb6 | Florent Chuffart | reads. The second part covers the R scripts that extracts nucleosome- |
174 | 3961deb6 | Florent Chuffart | level information (nucleosome position and indicators) from the |
175 | 3961deb6 | Florent Chuffart | dataset. |
176 | 3c88abd0 | Florent Chuffart | |
177 | 3c88abd0 | Florent Chuffart | |
178 | 3c88abd0 | Florent Chuffart | Experimental Dataset, Working Directory and Configuration File |
179 | 3c88abd0 | Florent Chuffart | ============================================================== |
180 | 3c88abd0 | Florent Chuffart | |
181 | 3c88abd0 | Florent Chuffart | |
182 | 3c88abd0 | Florent Chuffart | Working Directory Organisation |
183 | 3c88abd0 | Florent Chuffart | ------------------------------ |
184 | 3c88abd0 | Florent Chuffart | |
185 | 3961deb6 | Florent Chuffart | After having installed NucleoMiner2 environment (Previous section), go |
186 | 3c88abd0 | Florent Chuffart | to the root working directory of the tutorial by typing the following |
187 | 3c88abd0 | Florent Chuffart | command in a terminal: |
188 | 3c88abd0 | Florent Chuffart | |
189 | 3c88abd0 | Florent Chuffart | cd doc/Chuffart_NM2_workdir/ |
190 | 3c88abd0 | Florent Chuffart | |
191 | 3c88abd0 | Florent Chuffart | |
192 | 3c88abd0 | Florent Chuffart | Retrieving Experimental Dataset |
193 | 3c88abd0 | Florent Chuffart | ------------------------------- |
194 | 3c88abd0 | Florent Chuffart | |
195 | 3c88abd0 | Florent Chuffart | The MNase-seq and MN-ChIP-seq raw data are available at ArrayExpress |
196 | 3c88abd0 | Florent Chuffart | (http://www.ebi.ac.uk/arrayexpress/) under accession number |
197 | 3c88abd0 | Florent Chuffart | E-MTAB-2671. |
198 | 3c88abd0 | Florent Chuffart | |
199 | 3c88abd0 | Florent Chuffart | $$$ TODO explain how organise Experimental Dataset into the *data* |
200 | 3c88abd0 | Florent Chuffart | directory of the working directory. |
201 | 3c88abd0 | Florent Chuffart | |
202 | 3961deb6 | Florent Chuffart | In this tutorial, we want to compare nucleosomes of 2 yeast strains: |
203 | 3961deb6 | Florent Chuffart | BY and RM. For each strain Mnase-Seq was performed as well as ChIP-Seq |
204 | 3961deb6 | Florent Chuffart | using an antibody recognizing the H3K14ac epigenetic mark. Illumina |
205 | 3961deb6 | Florent Chuffart | sequencing was done in single-read of 50 bp long. |
206 | 3c88abd0 | Florent Chuffart | |
207 | 3c88abd0 | Florent Chuffart | The dataset is composed of 55 files organised as follows: |
208 | 3c88abd0 | Florent Chuffart | |
209 | 3c88abd0 | Florent Chuffart | * 3 replicates for BY MNase Seq |
210 | 3c88abd0 | Florent Chuffart | |
211 | 3c88abd0 | Florent Chuffart | * sample 1 (5 fastq.gz files) |
212 | 3c88abd0 | Florent Chuffart | |
213 | 3c88abd0 | Florent Chuffart | * sample 2 (5 fastq.gz files) |
214 | 3c88abd0 | Florent Chuffart | |
215 | 3c88abd0 | Florent Chuffart | * sample 3 (4 fastq.gz files) |
216 | 3c88abd0 | Florent Chuffart | |
217 | 3c88abd0 | Florent Chuffart | * 3 replicates for RM MNase Seq |
218 | 3c88abd0 | Florent Chuffart | |
219 | 3c88abd0 | Florent Chuffart | * sample 4 (4 fastq.gz files) |
220 | 3c88abd0 | Florent Chuffart | |
221 | 3c88abd0 | Florent Chuffart | * sample 5 (4 fastq.gz files) |
222 | 3c88abd0 | Florent Chuffart | |
223 | 3c88abd0 | Florent Chuffart | * sample 6 (5 fastq.gz files) |
224 | 3c88abd0 | Florent Chuffart | |
225 | 3c88abd0 | Florent Chuffart | * 3 replicates for BY ChIP Seq H3K14ac |
226 | 3c88abd0 | Florent Chuffart | |
227 | 3c88abd0 | Florent Chuffart | * sample 36 (5 fastq.gz files) |
228 | 3c88abd0 | Florent Chuffart | |
229 | 3c88abd0 | Florent Chuffart | * sample 37 (5 fastq.gz files) |
230 | 3c88abd0 | Florent Chuffart | |
231 | 3c88abd0 | Florent Chuffart | * sample 53 (9 fastq.gz files) |
232 | 3c88abd0 | Florent Chuffart | |
233 | 3c88abd0 | Florent Chuffart | * 2 replicates for RM ChIP Seq H3K14ac |
234 | 3c88abd0 | Florent Chuffart | |
235 | 3c88abd0 | Florent Chuffart | * sample 38 (5 fastq.gz files) |
236 | 3c88abd0 | Florent Chuffart | |
237 | 3c88abd0 | Florent Chuffart | * sample 39 (4 fastq.gz files) |
238 | 3c88abd0 | Florent Chuffart | |
239 | 3c88abd0 | Florent Chuffart | |
240 | 3c88abd0 | Florent Chuffart | Python and R Common Configuration File |
241 | 3c88abd0 | Florent Chuffart | -------------------------------------- |
242 | 3c88abd0 | Florent Chuffart | |
243 | 3961deb6 | Florent Chuffart | First, we need to define useful configuration variables that will be |
244 | 3961deb6 | Florent Chuffart | passed to python and R scripts. These variables are contained in file |
245 | 3961deb6 | Florent Chuffart | *configurator.py*. The execution of this python script dumps variables |
246 | 3961deb6 | Florent Chuffart | into the *nucleominer_config.json* file that will then be used by both |
247 | 3961deb6 | Florent Chuffart | R and python scripts. |
248 | 3c88abd0 | Florent Chuffart | |
249 | 3961deb6 | Florent Chuffart | The initialization of this variables is done in the configurator.py |
250 | 3961deb6 | Florent Chuffart | file. If you need to adapt variable values (path, default |
251 | 3961deb6 | Florent Chuffart | parameters...) you need to edit this file. Then, go to the root |
252 | 3961deb6 | Florent Chuffart | directory of your project and run the following command to dump the |
253 | 3961deb6 | Florent Chuffart | configuration file: |
254 | 3c88abd0 | Florent Chuffart | |
255 | 3c88abd0 | Florent Chuffart | python src/current/configurator.py |
256 | 3c88abd0 | Florent Chuffart | |
257 | 3c88abd0 | Florent Chuffart | |
258 | 3c88abd0 | Florent Chuffart | Preprocessing Illumina Fastq Reads for Each Sample |
259 | 3c88abd0 | Florent Chuffart | ================================================== |
260 | 3c88abd0 | Florent Chuffart | |
261 | 3961deb6 | Florent Chuffart | Once variables and design have been specified, the script wf.py will |
262 | 3961deb6 | Florent Chuffart | automatically run all the analysis. You don't need to do anything. To |
263 | 3961deb6 | Florent Chuffart | run the full analysis, run the following command: |
264 | 3961deb6 | Florent Chuffart | |
265 | 3961deb6 | Florent Chuffart | python src/current/wf.py |
266 | 3961deb6 | Florent Chuffart | |
267 | 3961deb6 | Florent Chuffart | The details of the steps performed by this script are explained below. |
268 | 3961deb6 | Florent Chuffart | This preprocessing consists of 4 steps embedded in the *wf.py* script. |
269 | 3961deb6 | Florent Chuffart | They are described bellow. As a preamble, this script computes |
270 | 3961deb6 | Florent Chuffart | *samples*, *samples_mnase* and *strains* that will be used along the 4 |
271 | 3961deb6 | Florent Chuffart | steps. |
272 | 3c88abd0 | Florent Chuffart | |
273 | 3c88abd0 | Florent Chuffart | wf.samples = [] |
274 | 3c88abd0 | Florent Chuffart | |
275 | 3c88abd0 | Florent Chuffart | List of samples where a sample is identified by an id (key: *id*) |
276 | 3c88abd0 | Florent Chuffart | and a strain name (key *strain*). |
277 | 3c88abd0 | Florent Chuffart | |
278 | 3c88abd0 | Florent Chuffart | wf.samples_mnase = [] |
279 | 3c88abd0 | Florent Chuffart | |
280 | 3c88abd0 | Florent Chuffart | List of Mnase samples. |
281 | 3c88abd0 | Florent Chuffart | |
282 | 3c88abd0 | Florent Chuffart | wf.strains = [] |
283 | 3c88abd0 | Florent Chuffart | |
284 | 3c88abd0 | Florent Chuffart | List of reference strains. |
285 | 3c88abd0 | Florent Chuffart | |
286 | 3c88abd0 | Florent Chuffart | |
287 | 3c88abd0 | Florent Chuffart | Creating Bowtie Index from each Reference Genome |
288 | 3c88abd0 | Florent Chuffart | ------------------------------------------------ |
289 | 3c88abd0 | Florent Chuffart | |
290 | 3961deb6 | Florent Chuffart | For each strain, the script *wf.py* then creates bowtie index. Bowtie |
291 | 3961deb6 | Florent Chuffart | index of a strain is a tree view of the genome of this strain. It will |
292 | 3961deb6 | Florent Chuffart | be used by bowtie to align reads. The part of the script performing |
293 | 3961deb6 | Florent Chuffart | this is the following: |
294 | 3c88abd0 | Florent Chuffart | |
295 | 3c88abd0 | Florent Chuffart | for strain in strains: |
296 | 3c88abd0 | Florent Chuffart | per_strain_stats[strain] = create_bowtie_index(strain, |
297 | 3c88abd0 | Florent Chuffart | config["FASTA_REFERENCE_GENOME_FILES"][strain], config["INDEX_DIR"], |
298 | 3c88abd0 | Florent Chuffart | config["BOWTIE_BUILD_BIN"]) |
299 | 3c88abd0 | Florent Chuffart | |
300 | 3961deb6 | Florent Chuffart | As an indication, the following table summarizes the file sizes and |
301 | 3961deb6 | Florent Chuffart | process durations that we experienced when running this step on a |
302 | 3961deb6 | Florent Chuffart | Linux server***. |
303 | 3c88abd0 | Florent Chuffart | |
304 | 3c88abd0 | Florent Chuffart | +--------+------------------------+------------------------+------------------+ |
305 | 3c88abd0 | Florent Chuffart | | strain | fasta genome file size | bowtie index file size | process duration | |
306 | 3c88abd0 | Florent Chuffart | +========+========================+========================+==================+ |
307 | 3c88abd0 | Florent Chuffart | | BY | 12 Mo | 25 Mo | 11 s. | |
308 | 3c88abd0 | Florent Chuffart | +--------+------------------------+------------------------+------------------+ |
309 | 3c88abd0 | Florent Chuffart | | RM | 12 Mo | 24 Mo | 9 s. | |
310 | 3c88abd0 | Florent Chuffart | +--------+------------------------+------------------------+------------------+ |
311 | 3c88abd0 | Florent Chuffart | |
312 | 3c88abd0 | Florent Chuffart | |
313 | 3c88abd0 | Florent Chuffart | Aligning Reads to Reference Genome |
314 | 3c88abd0 | Florent Chuffart | ---------------------------------- |
315 | 3c88abd0 | Florent Chuffart | |
316 | 3961deb6 | Florent Chuffart | Next, the *wf.py* script launches bowtie to align reads to the |
317 | 3961deb6 | Florent Chuffart | reference genome. It produces a *.sam* file that is converted into a |
318 | 3961deb6 | Florent Chuffart | *.bed* file. Binaries for *bowtie*, *samtools* and *bedtools* are |
319 | 3961deb6 | Florent Chuffart | wrapped using python *subprocess* class. This step is performed by the |
320 | 3961deb6 | Florent Chuffart | following part of the script: |
321 | 3c88abd0 | Florent Chuffart | |
322 | 3c88abd0 | Florent Chuffart | for sample in samples: |
323 | 3c88abd0 | Florent Chuffart | per_sample_align_stats["sample_%s" % sample["id"]] = align_reads(sample, |
324 | 3c88abd0 | Florent Chuffart | config["ALIGN_DIR"], config["LOG_DIR"], config["INDEX_DIR"], |
325 | 3c88abd0 | Florent Chuffart | config["ILLUMINA_OUTPUTFILE_PREFIX"], config["BOWTIE2_BIN"], |
326 | 3c88abd0 | Florent Chuffart | config["SAMTOOLS_BIN"], config["BEDTOOLS_BIN"]) |
327 | 3c88abd0 | Florent Chuffart | |
328 | 3c88abd0 | Florent Chuffart | |
329 | 3c88abd0 | Florent Chuffart | Convert Aligned Reads into TemplateFilter Format |
330 | 3c88abd0 | Florent Chuffart | ------------------------------------------------ |
331 | 3c88abd0 | Florent Chuffart | |
332 | 3c88abd0 | Florent Chuffart | TemplateFilter uses particular input formats for reads, so it is |
333 | 3961deb6 | Florent Chuffart | necessary to convert the *.bed* files. TemplateFilter expect reads in |
334 | 3961deb6 | Florent Chuffart | the following format: *chr*, *coord*, *strand* and *#read* where: |
335 | 3c88abd0 | Florent Chuffart | |
336 | 3c88abd0 | Florent Chuffart | * *chr* is the number of the chromosome; |
337 | 3c88abd0 | Florent Chuffart | |
338 | 3c88abd0 | Florent Chuffart | * *coord* is the coordinate of the reads; |
339 | 3c88abd0 | Florent Chuffart | |
340 | 3c88abd0 | Florent Chuffart | * *strand* is *F* for forward and *R* for reverse; |
341 | 3c88abd0 | Florent Chuffart | |
342 | 3c88abd0 | Florent Chuffart | * *#reads* the number of reads covering this position. |
343 | 3c88abd0 | Florent Chuffart | |
344 | 3c88abd0 | Florent Chuffart | Each entry is *tab*-separated. |
345 | 3c88abd0 | Florent Chuffart | |
346 | 3c88abd0 | Florent Chuffart | **WARNING** for reverse strands, bowtie returns the position of the |
347 | 3c88abd0 | Florent Chuffart | first nucleotide on the left hand side, whereas TemplateFilter expects |
348 | 3961deb6 | Florent Chuffart | the first one on the right hand side. This is taken into account in |
349 | 3961deb6 | Florent Chuffart | NucleoMiner2 by adding the read length (in our case 50) to the reverse |
350 | 3c88abd0 | Florent Chuffart | reads coordinates. |
351 | 3c88abd0 | Florent Chuffart | |
352 | 3c88abd0 | Florent Chuffart | This step is performed by the following part of the *wf.py* script: |
353 | 3c88abd0 | Florent Chuffart | |
354 | 3c88abd0 | Florent Chuffart | for sample in samples: |
355 | 3c88abd0 | Florent Chuffart | per_sample_convert_stats["sample_%s" % sample["id"]] = split_fr_4_TF(sample, |
356 | 3c88abd0 | Florent Chuffart | config["ALIGN_DIR"], config["FASTA_INDEXES"], config["AREA_BLACK_LIST"], |
357 | 3c88abd0 | Florent Chuffart | config["READ_LENGTH"],config["MAPQ_THRES"]) |
358 | 3c88abd0 | Florent Chuffart | |
359 | 3961deb6 | Florent Chuffart | The following table summarizes the number of reads, the involved file |
360 | 3961deb6 | Florent Chuffart | sizes and process durations that we experienced when running the two |
361 | 3961deb6 | Florent Chuffart | last steps. In our case, alignment process were multithreaded over 3 |
362 | 3961deb6 | Florent Chuffart | cores. |
363 | 3c88abd0 | Florent Chuffart | |
364 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
365 | 3c88abd0 | Florent Chuffart | | id | Illumina reads | aligned and filtred reads | ratio | *.bed* file size | TF input file size | process duration | |
366 | 3c88abd0 | Florent Chuffart | +====+================+===========================+========+==================+====================+==================+ |
367 | 3c88abd0 | Florent Chuffart | | 1 | 16436138 | 10199695 | 62,06% | 1064 Mo | 60 Mo | 383 s. | |
368 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
369 | 3c88abd0 | Florent Chuffart | | 2 | 16911132 | 12512727 | 73,99% | 1298 Mo | 64 Mo | 437 s. | |
370 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
371 | 3c88abd0 | Florent Chuffart | | 3 | 15946902 | 12340426 | 77,38% | 1280 Mo | 65 Mo | 423 s. | |
372 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
373 | 3c88abd0 | Florent Chuffart | | 4 | 13765584 | 10381903 | 75,42% | 931 Mo | 59 Mo | 352 s. | |
374 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
375 | 3c88abd0 | Florent Chuffart | | 5 | 15168268 | 11502855 | 75,83% | 1031 Mo | 64 Mo | 386 s. | |
376 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
377 | 3c88abd0 | Florent Chuffart | | 6 | 18850820 | 14024905 | 74,40% | 1254 Mo | 69 Mo | 482 s. | |
378 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
379 | 3c88abd0 | Florent Chuffart | | 36 | 17715118 | 14092985 | 79,55% | 1404 Mo | 68 Mo | 483 s. | |
380 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
381 | 3c88abd0 | Florent Chuffart | | 37 | 17288466 | 7402082 | 42,82% | 741 Mo | 48 Mo | 339 s. | |
382 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
383 | 3c88abd0 | Florent Chuffart | | 38 | 16116394 | 13178457 | 81,77% | 1101 Mo | 63 Mo | 420 s. | |
384 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
385 | 3c88abd0 | Florent Chuffart | | 39 | 14241106 | 10537228 | 73,99% | 880 Mo | 57 Mo | 348 s. | |
386 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
387 | 3c88abd0 | Florent Chuffart | | 53 | 40876476 | 33780065 | 82,64% | 3316 Mo | 103 Mo | 1165 s. | |
388 | 3c88abd0 | Florent Chuffart | +----+----------------+---------------------------+--------+------------------+--------------------+------------------+ |
389 | 3c88abd0 | Florent Chuffart | |
390 | 3c88abd0 | Florent Chuffart | |
391 | 3c88abd0 | Florent Chuffart | Run TemplateFilter on Mnase Samples |
392 | 3c88abd0 | Florent Chuffart | ----------------------------------- |
393 | 3c88abd0 | Florent Chuffart | |
394 | 3c88abd0 | Florent Chuffart | Finally, for each sample we perform TemplateFilter analysis. |
395 | 3c88abd0 | Florent Chuffart | |
396 | 3c88abd0 | Florent Chuffart | **WARNING** TemplateFilter returns a list of nucleosomes. Each |
397 | 3961deb6 | Florent Chuffart | nucleosome is defined by its center and its width. An odd width leads |
398 | 3c88abd0 | Florent Chuffart | us to consider non- integer lower and upper bound. |
399 | 3c88abd0 | Florent Chuffart | |
400 | 3961deb6 | Florent Chuffart | **WARNING** TemplateFilter was not designed to handle replicates. So |
401 | 3c88abd0 | Florent Chuffart | we recommend to keep a maximum of nucleosomes and filter the aberrant |
402 | 3c88abd0 | Florent Chuffart | ones afterwards using the benefits of having replicates. To do this, |
403 | 3c88abd0 | Florent Chuffart | we set a low correlation threshold parameter (0.5) and a particularly |
404 | 3c88abd0 | Florent Chuffart | high value of overlap (300%). |
405 | 3c88abd0 | Florent Chuffart | |
406 | 3c88abd0 | Florent Chuffart | This step is performed by the following part of the *wf.py* script: |
407 | 3c88abd0 | Florent Chuffart | |
408 | 3c88abd0 | Florent Chuffart | for sample in samples_mnase: |
409 | 3c88abd0 | Florent Chuffart | per_mnase_sample_stats["sample_%s" % sample["id"]] = template_filter(sample, |
410 | 3c88abd0 | Florent Chuffart | config["ALIGN_DIR"], config["LOG_DIR"], config["TF_BIN"], |
411 | 3c88abd0 | Florent Chuffart | config["TF_TEMPLATES_FILE"], config["TF_CORR"], config["TF_MINW"], |
412 | 3c88abd0 | Florent Chuffart | config["TF_MAXW"], config["TF_OL"]) |
413 | 3c88abd0 | Florent Chuffart | |
414 | 3c88abd0 | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
415 | 3c88abd0 | Florent Chuffart | | id | strain | found nucs | nuc file size | process duration | |
416 | 3c88abd0 | Florent Chuffart | +====+========+============+===============+==================+ |
417 | 3c88abd0 | Florent Chuffart | | 1 | BY | 96214 | 68 Mo | 1022 s. | |
418 | 3c88abd0 | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
419 | 3c88abd0 | Florent Chuffart | | 2 | BY | 91694 | 65 Mo | 1038 s. | |
420 | 3c88abd0 | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
421 | 3c88abd0 | Florent Chuffart | | 3 | BY | 91205 | 65 Mo | 1036 s. | |
422 | 3c88abd0 | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
423 | 3c88abd0 | Florent Chuffart | | 4 | RM | 88076 | 62 Mo | 984 s. | |
424 | 3c88abd0 | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
425 | 3c88abd0 | Florent Chuffart | | 5 | RM | 90141 | 64 Mo | 967 s. | |
426 | 3c88abd0 | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
427 | 3c88abd0 | Florent Chuffart | | 6 | RM | 87517 | 62 Mo | 980 s. | |
428 | 3c88abd0 | Florent Chuffart | +----+--------+------------+---------------+------------------+ |
429 | 3c88abd0 | Florent Chuffart | |
430 | 3c88abd0 | Florent Chuffart | |
431 | 3c88abd0 | Florent Chuffart | Inferring Nucleosome Position and Extracting Read Counts |
432 | 3c88abd0 | Florent Chuffart | ======================================================== |
433 | 3c88abd0 | Florent Chuffart | |
434 | 3c88abd0 | Florent Chuffart | The second part of the tutorial uses R |
435 | 3961deb6 | Florent Chuffart | (http://http://www.r-project.org). NucleoMiner2 contains a set of R |
436 | 3961deb6 | Florent Chuffart | scripts that will be sourced in R from a console launched at the root |
437 | 3961deb6 | Florent Chuffart | of your project. These scripts are: |
438 | 3c88abd0 | Florent Chuffart | |
439 | 3c88abd0 | Florent Chuffart | * headers.R |
440 | 3c88abd0 | Florent Chuffart | |
441 | 3c88abd0 | Florent Chuffart | * extract_maps.R |
442 | 3c88abd0 | Florent Chuffart | |
443 | 3c88abd0 | Florent Chuffart | * translate_common_wp.R |
444 | 3c88abd0 | Florent Chuffart | |
445 | 3c88abd0 | Florent Chuffart | * split_samples.R |
446 | 3c88abd0 | Florent Chuffart | |
447 | 3c88abd0 | Florent Chuffart | * count_reads.R |
448 | 3c88abd0 | Florent Chuffart | |
449 | 3c88abd0 | Florent Chuffart | * get_size_factors |
450 | 3c88abd0 | Florent Chuffart | |
451 | 3c88abd0 | Florent Chuffart | * launch_deseq.R |
452 | 3c88abd0 | Florent Chuffart | |
453 | 3c88abd0 | Florent Chuffart | |
454 | 3c88abd0 | Florent Chuffart | The Script headers.R |
455 | 3c88abd0 | Florent Chuffart | -------------------- |
456 | 3c88abd0 | Florent Chuffart | |
457 | 3961deb6 | Florent Chuffart | The script headers.R is included in all other R scripts. It is in |
458 | 3c88abd0 | Florent Chuffart | charge of: |
459 | 3c88abd0 | Florent Chuffart | |
460 | 3c88abd0 | Florent Chuffart | * launching libraries used in the scripts |
461 | 3c88abd0 | Florent Chuffart | |
462 | 3c88abd0 | Florent Chuffart | * launching configuration (design, strain, marker...) |
463 | 3c88abd0 | Florent Chuffart | |
464 | 3961deb6 | Florent Chuffart | * computing and caching Common Uinterrupted Regions (CURs). |
465 | 3961deb6 | Florent Chuffart | Caching means storing the information in the computer's memory. |
466 | 3c88abd0 | Florent Chuffart | |
467 | 3c88abd0 | Florent Chuffart | Note that you can customize the function “translate”. This function |
468 | 3c88abd0 | Florent Chuffart | allows you to use the alignments between genomes when performing |
469 | 3961deb6 | Florent Chuffart | various tasks. |
470 | 3c88abd0 | Florent Chuffart | |
471 | 3961deb6 | Florent Chuffart | * You may want to analyze data of a single strain (e.g. |
472 | 3961deb6 | Florent Chuffart | treatment/control, or only few mutations). In this case, the |
473 | 3961deb6 | Florent Chuffart | genome is identical across all samples and you do not need to |
474 | 3961deb6 | Florent Chuffart | define particular CURs (CURs are chromosomes). Simply use the |
475 | 3c88abd0 | Florent Chuffart | default translate function which is neutral. |
476 | 3c88abd0 | Florent Chuffart | |
477 | 3961deb6 | Florent Chuffart | * If you are analyzing data from two or more strains (as |
478 | 3961deb6 | Florent Chuffart | NucleoMiner2 was designed for), then you need to translate |
479 | 3961deb6 | Florent Chuffart | coordinates of one genome into the coordinates of another one. |
480 | 3961deb6 | Florent Chuffart | You must do this by aligning the two genomes, which will produce |
481 | 3961deb6 | Florent Chuffart | a .c2c file (see Appendice "Generate .c2c Files"). thenuse it to |
482 | 3961deb6 | Florent Chuffart | produce the list of regions and customise “translate”. |
483 | 3c88abd0 | Florent Chuffart | |
484 | 3961deb6 | Florent Chuffart | In our tutorial, we are in the second case and to perform all these |
485 | 3961deb6 | Florent Chuffart | steps run the following command line in your R console: |
486 | 3c88abd0 | Florent Chuffart | |
487 | 3c88abd0 | Florent Chuffart | source("src/current/headers.R") |
488 | 3c88abd0 | Florent Chuffart | |
489 | 3c88abd0 | Florent Chuffart | |
490 | 3c88abd0 | Florent Chuffart | The Script extract_maps.R |
491 | 3c88abd0 | Florent Chuffart | ------------------------- |
492 | 3c88abd0 | Florent Chuffart | |
493 | 3c88abd0 | Florent Chuffart | This script is in charge of extracting Maps for well-positioned and |
494 | 3961deb6 | Florent Chuffart | sensitive nucleosomes. First of all, this script computes intra and |
495 | 3961deb6 | Florent Chuffart | inter-strain matches of nucleosome maps for each CUR. This step can be |
496 | 3961deb6 | Florent Chuffart | executed in parallel on many cores using the BoT library. Next, it |
497 | 3961deb6 | Florent Chuffart | collects results and produces maps of well-positioned nucleosomes, |
498 | 3961deb6 | Florent Chuffart | sensitive nucleosomes and Unaligned Nucleosomal Regions . |
499 | 3961deb6 | Florent Chuffart | |
500 | 3961deb6 | Florent Chuffart | The map of well-positioned nucleosomes for BY is collected in the |
501 | 3961deb6 | Florent Chuffart | result directory and is called *BY_wp.tab*. It is composed of |
502 | 3961deb6 | Florent Chuffart | following columns: |
503 | 3c88abd0 | Florent Chuffart | |
504 | 3c88abd0 | Florent Chuffart | * chr, the number of the chromosome |
505 | 3c88abd0 | Florent Chuffart | |
506 | 3c88abd0 | Florent Chuffart | * lower_bound, the lower bound of the nucleosome |
507 | 3c88abd0 | Florent Chuffart | |
508 | 3c88abd0 | Florent Chuffart | * upper_bound, the upper bound of the nucleosome |
509 | 3c88abd0 | Florent Chuffart | |
510 | 3c88abd0 | Florent Chuffart | * cur_index, index of the CUR |
511 | 3c88abd0 | Florent Chuffart | |
512 | 3c88abd0 | Florent Chuffart | * index_nuc, the index of the nucleosome in the CUR |
513 | 3c88abd0 | Florent Chuffart | |
514 | 3c88abd0 | Florent Chuffart | * wp, 1 if it is a well positioned nucleosome, 0 otherwise |
515 | 3c88abd0 | Florent Chuffart | |
516 | 3c88abd0 | Florent Chuffart | * nb_reads, the number of reads that support this nucleosome |
517 | 3c88abd0 | Florent Chuffart | |
518 | 3c88abd0 | Florent Chuffart | * nb_nucs, the number of TemplateFilter nucleosome across |
519 | 3c88abd0 | Florent Chuffart | replicates (= the number of replicates in which it is a well- |
520 | 3c88abd0 | Florent Chuffart | positioned nucleosome) |
521 | 3c88abd0 | Florent Chuffart | |
522 | 3c88abd0 | Florent Chuffart | * llr_1, for a well-positioned nucleosome, it is the LLR1 (log- |
523 | 3c88abd0 | Florent Chuffart | likelihood ratio) between the first and the second TemplateFilter |
524 | 3c88abd0 | Florent Chuffart | nucleosome on the chain. |
525 | 3c88abd0 | Florent Chuffart | |
526 | 3c88abd0 | Florent Chuffart | * llr_2, for a well-positioned nucleosome, it is the LLR1 between |
527 | 3c88abd0 | Florent Chuffart | the second and the third TemplateFilter nucleosome on the chain. |
528 | 3c88abd0 | Florent Chuffart | |
529 | 3c88abd0 | Florent Chuffart | * wp_llr, for a well-positioned nucleosome, it is the LLR2 that |
530 | 3c88abd0 | Florent Chuffart | compares consistency of the positioning over all TemplateFilter |
531 | 3c88abd0 | Florent Chuffart | nucleosomes. |
532 | 3c88abd0 | Florent Chuffart | |
533 | 3c88abd0 | Florent Chuffart | * wp_pval, for a well-positioned nucleosome, it is the p-value |
534 | 3961deb6 | Florent Chuffart | chi square test obtained from LLR2 (*1-pchisq(2.LLR2, df=4)*) |
535 | 3c88abd0 | Florent Chuffart | |
536 | 3c88abd0 | Florent Chuffart | * dyad_shift, for a well-positioned nucleosome, it is the shift |
537 | 3c88abd0 | Florent Chuffart | between the two extreme TemplateFilter nucleosome dyad positions. |
538 | 3c88abd0 | Florent Chuffart | |
539 | 3961deb6 | Florent Chuffart | The sensitive map for BY is collected in the result directory and is |
540 | 3c88abd0 | Florent Chuffart | called *BY_fuzzy.tab*. It is composed of following columns: |
541 | 3c88abd0 | Florent Chuffart | |
542 | 3c88abd0 | Florent Chuffart | * chr, the number of the chromosome |
543 | 3c88abd0 | Florent Chuffart | |
544 | 3c88abd0 | Florent Chuffart | * lower_bound, the lower bound of the nucleosome |
545 | 3c88abd0 | Florent Chuffart | |
546 | 3c88abd0 | Florent Chuffart | * upper_bound, the upper bound of the nucleosome |
547 | 3c88abd0 | Florent Chuffart | |
548 | 3c88abd0 | Florent Chuffart | * cur_index, index of the CUR |
549 | 3c88abd0 | Florent Chuffart | |
550 | 3c88abd0 | Florent Chuffart | The map of common well-positioned nucleosomes aligned between the BY |
551 | 3c88abd0 | Florent Chuffart | and RM strains is collected in the result directory and is called |
552 | 3c88abd0 | Florent Chuffart | *BY_RM_common_wp.tab*. It is composed of following columns: |
553 | 3c88abd0 | Florent Chuffart | |
554 | 3c88abd0 | Florent Chuffart | * cur_index, the index of the CUR |
555 | 3c88abd0 | Florent Chuffart | |
556 | 3c88abd0 | Florent Chuffart | * index_nuc_BY, the index of the BY nucleosome in the CUR |
557 | 3c88abd0 | Florent Chuffart | |
558 | 3c88abd0 | Florent Chuffart | * index_nuc_RM, the index of the RM nucleosome in the CUR |
559 | 3c88abd0 | Florent Chuffart | |
560 | 3c88abd0 | Florent Chuffart | * llr_score, , the LLR3 score that estimates conservation between |
561 | 3c88abd0 | Florent Chuffart | the positions in BY and RM |
562 | 3c88abd0 | Florent Chuffart | |
563 | 3c88abd0 | Florent Chuffart | * common_wp_pval, the p-value chi square test obtained from LLR3 |
564 | 3c88abd0 | Florent Chuffart | (*1-pchisq(2.LLR3, df=2)*) |
565 | 3c88abd0 | Florent Chuffart | |
566 | 3c88abd0 | Florent Chuffart | * diff, the dyads shift between the positions in the two strains |
567 | 3961deb6 | Florent Chuffart | (in bp) |
568 | 3c88abd0 | Florent Chuffart | |
569 | 3c88abd0 | Florent Chuffart | The common UNR map for BY and RM strains is collected in the result |
570 | 3c88abd0 | Florent Chuffart | directory and is called *BY_RM_common_unr.tab*. It is composed of the |
571 | 3c88abd0 | Florent Chuffart | following columns: |
572 | 3c88abd0 | Florent Chuffart | |
573 | 3c88abd0 | Florent Chuffart | * cur_index, the index of the CUR |
574 | 3c88abd0 | Florent Chuffart | |
575 | 3c88abd0 | Florent Chuffart | * index_nuc_BY, the index of the BY nucleosome in the CUR |
576 | 3c88abd0 | Florent Chuffart | |
577 | 3c88abd0 | Florent Chuffart | * index_nuc_RM,the index of the RM nucleosome in the CUR |
578 | 3c88abd0 | Florent Chuffart | |
579 | 3c88abd0 | Florent Chuffart | To execute this script, run the following command in your R console: |
580 | 3c88abd0 | Florent Chuffart | |
581 | 3c88abd0 | Florent Chuffart | source("src/current/extract_maps.R") |
582 | 3c88abd0 | Florent Chuffart | |
583 | 3c88abd0 | Florent Chuffart | |
584 | 3c88abd0 | Florent Chuffart | The Script translate_common_wp.R |
585 | 3c88abd0 | Florent Chuffart | -------------------------------- |
586 | 3c88abd0 | Florent Chuffart | |
587 | 3c88abd0 | Florent Chuffart | This script is used to translate common well-positioned nucleosome |
588 | 3961deb6 | Florent Chuffart | positions from a strain to another strain and stores it into a table. |
589 | 3c88abd0 | Florent Chuffart | |
590 | 3c88abd0 | Florent Chuffart | For example, the file *results/2014-04/RM_wp_tr_2_BY.tab* contains RM |
591 | 3961deb6 | Florent Chuffart | well-positioned nucleosomes translated into the BY genome coordinates. |
592 | 3c88abd0 | Florent Chuffart | It is composed of following columns: |
593 | 3c88abd0 | Florent Chuffart | |
594 | 3c88abd0 | Florent Chuffart | * strain_ref, the reference genome (in which positioned are |
595 | 3c88abd0 | Florent Chuffart | defined) |
596 | 3c88abd0 | Florent Chuffart | |
597 | 3c88abd0 | Florent Chuffart | * begin, the translated lower bound of the nucleosome |
598 | 3c88abd0 | Florent Chuffart | |
599 | 3c88abd0 | Florent Chuffart | * end, the translated upper bound of the nucleosome |
600 | 3c88abd0 | Florent Chuffart | |
601 | 3c88abd0 | Florent Chuffart | * chr, the number of chromosomes for the reference genome (in |
602 | 3c88abd0 | Florent Chuffart | which positioned are defined) |
603 | 3c88abd0 | Florent Chuffart | |
604 | 3c88abd0 | Florent Chuffart | * length, the length of the nucleosome (could be negative) |
605 | 3c88abd0 | Florent Chuffart | |
606 | 3c88abd0 | Florent Chuffart | * cur_index, the index of the CUR |
607 | 3c88abd0 | Florent Chuffart | |
608 | 3c88abd0 | Florent Chuffart | * index_nuc, the index of the nucleosome in the CUR |
609 | 3c88abd0 | Florent Chuffart | |
610 | 3c88abd0 | Florent Chuffart | To execute this script, run the following command in your R console: |
611 | 3c88abd0 | Florent Chuffart | |
612 | 3c88abd0 | Florent Chuffart | source("src/current/translate_common_wp.R") |
613 | 3c88abd0 | Florent Chuffart | |
614 | 3c88abd0 | Florent Chuffart | |
615 | 3c88abd0 | Florent Chuffart | The Script split_samples.R |
616 | 3c88abd0 | Florent Chuffart | -------------------------- |
617 | 3c88abd0 | Florent Chuffart | |
618 | 3961deb6 | Florent Chuffart | To optimize memory space usage, we split and compress TemplateFilter |
619 | 3c88abd0 | Florent Chuffart | input files according to their corresponding chromosome. for example, |
620 | 3c88abd0 | Florent Chuffart | *sample_1_TF.tab* will be split into : |
621 | 3c88abd0 | Florent Chuffart | |
622 | 3c88abd0 | Florent Chuffart | * sample_1_chr_1_splited_sample.tab.gz |
623 | 3c88abd0 | Florent Chuffart | |
624 | 3c88abd0 | Florent Chuffart | * sample_1_chr_2_splited_sample.tab.gz |
625 | 3c88abd0 | Florent Chuffart | |
626 | 3c88abd0 | Florent Chuffart | * ... |
627 | 3c88abd0 | Florent Chuffart | |
628 | 3c88abd0 | Florent Chuffart | * sample_1_chr_17_splited_sample.tab.gz |
629 | 3c88abd0 | Florent Chuffart | |
630 | 3c88abd0 | Florent Chuffart | To execute this script, run the following command in your R console: |
631 | 3c88abd0 | Florent Chuffart | |
632 | 3c88abd0 | Florent Chuffart | source("src/current/split_samples.R") |
633 | 3c88abd0 | Florent Chuffart | |
634 | 3c88abd0 | Florent Chuffart | |
635 | 3c88abd0 | Florent Chuffart | The Script count_reads.R |
636 | 3c88abd0 | Florent Chuffart | ------------------------ |
637 | 3c88abd0 | Florent Chuffart | |
638 | 3c88abd0 | Florent Chuffart | To associate a number of observations (read) to each nucleosome we run |
639 | 3c88abd0 | Florent Chuffart | the script *count_reads.R*. It produces the files |
640 | 3c88abd0 | Florent Chuffart | *BY_RM_H3K14ac_wp_and_nbreads.tab*, |
641 | 3c88abd0 | Florent Chuffart | *BY_RM_H3K14ac_unr_and_nbreads.tab* |
642 | 3c88abd0 | Florent Chuffart | *BY_RM_Mnase_Seq_wp_and_nbreads.tab* and |
643 | 3c88abd0 | Florent Chuffart | *BY_RM_Mnase_Seq_unr_and_nbreads.tab* for H3K14ac common well- |
644 | 3c88abd0 | Florent Chuffart | positioned nucleosomes, H3K14ac UNRs, Mnase common well-positioned |
645 | 3c88abd0 | Florent Chuffart | nucleosomes and Mnase UNRs respectively. |
646 | 3c88abd0 | Florent Chuffart | |
647 | 3c88abd0 | Florent Chuffart | For example, the file *BY_RM_H3K14ac_unr_and_nbreads.tab* contains |
648 | 3c88abd0 | Florent Chuffart | counted reads for well-positioned nucleosomes with the experimental |
649 | 3c88abd0 | Florent Chuffart | condition ChIP H3K14ac. It is composed of the following columns: |
650 | 3c88abd0 | Florent Chuffart | |
651 | 3c88abd0 | Florent Chuffart | * chr_BY, the number of the chromosome for BY |
652 | 3c88abd0 | Florent Chuffart | |
653 | 3c88abd0 | Florent Chuffart | * lower_bound_BY, the lower bound of the nucleosome for BY |
654 | 3c88abd0 | Florent Chuffart | |
655 | 3c88abd0 | Florent Chuffart | * upper_bound_BY, the upper bound of the nucleosome for BY |
656 | 3c88abd0 | Florent Chuffart | |
657 | 3c88abd0 | Florent Chuffart | * index_nuc_BY, the index of the BY nucleosome in the CUR for BY |
658 | 3c88abd0 | Florent Chuffart | |
659 | 3c88abd0 | Florent Chuffart | * chr_RM, the number of the chromosome for RM |
660 | 3c88abd0 | Florent Chuffart | |
661 | 3c88abd0 | Florent Chuffart | * lower_bound_RM, the lower bound of the nucleosome for RM |
662 | 3c88abd0 | Florent Chuffart | |
663 | 3c88abd0 | Florent Chuffart | * upper_bound_RM, the upper bound of the nucleosome for RM |
664 | 3c88abd0 | Florent Chuffart | |
665 | 3c88abd0 | Florent Chuffart | * index_nuc_RM,the index of the RM nucleosome in the CUR for RM |
666 | 3c88abd0 | Florent Chuffart | |
667 | 3c88abd0 | Florent Chuffart | * cur_index, index of the CUR |
668 | 3c88abd0 | Florent Chuffart | |
669 | 3c88abd0 | Florent Chuffart | * BY_H3K14ac_36, the number of reads for the current nucleosome |
670 | 3c88abd0 | Florent Chuffart | for the sample 36 |
671 | 3c88abd0 | Florent Chuffart | |
672 | 3c88abd0 | Florent Chuffart | * BY_H3K14ac_37, #reads for sample 37 |
673 | 3c88abd0 | Florent Chuffart | |
674 | 3c88abd0 | Florent Chuffart | * BY_H3K14ac_53, #reads for sample 53 |
675 | 3c88abd0 | Florent Chuffart | |
676 | 3c88abd0 | Florent Chuffart | * RM_H3K14ac_38, #reads for sample 38 |
677 | 3c88abd0 | Florent Chuffart | |
678 | 3c88abd0 | Florent Chuffart | * RM_H3K14ac_39, #reads for sample 39 |
679 | 3c88abd0 | Florent Chuffart | |
680 | 3c88abd0 | Florent Chuffart | To execute this script, run the following command in your R console: |
681 | 3c88abd0 | Florent Chuffart | |
682 | 3c88abd0 | Florent Chuffart | source("src/current/count_reads.R") |
683 | 3c88abd0 | Florent Chuffart | |
684 | 3c88abd0 | Florent Chuffart | |
685 | 3c88abd0 | Florent Chuffart | The Script get_size_factors.R |
686 | 3c88abd0 | Florent Chuffart | ----------------------------- |
687 | 3c88abd0 | Florent Chuffart | |
688 | 3c88abd0 | Florent Chuffart | This script uses the DESeq function *estimateSizeFactors* to compute |
689 | 3c88abd0 | Florent Chuffart | the size factor of each sample. It corresponds to normalisation of |
690 | 3c88abd0 | Florent Chuffart | read counts from sample to sample, as determined by DESeq. When a |
691 | 3c88abd0 | Florent Chuffart | sample has n reads for a nucleosome or a UNR, the normalised count is |
692 | 3c88abd0 | Florent Chuffart | n/f where f is the factor contained in this file. The script dumps |
693 | 3c88abd0 | Florent Chuffart | computed size factors into the file *size_factors.tab*. This file has |
694 | 3c88abd0 | Florent Chuffart | the form: |
695 | 3c88abd0 | Florent Chuffart | |
696 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
697 | 3c88abd0 | Florent Chuffart | | sample_id | wp | unr | wpunr | |
698 | 3c88abd0 | Florent Chuffart | +===========+=========+=========+=========+ |
699 | 3c88abd0 | Florent Chuffart | | 1 | 0.87396 | 0.88097 | 0.87584 | |
700 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
701 | 3c88abd0 | Florent Chuffart | | 2 | 1.07890 | 1.07440 | 1.07760 | |
702 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
703 | 3c88abd0 | Florent Chuffart | | 3 | 1.06400 | 1.05890 | 1.06250 | |
704 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
705 | 3c88abd0 | Florent Chuffart | | 4 | 0.85782 | 0.87948 | 0.86305 | |
706 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
707 | 3c88abd0 | Florent Chuffart | | 5 | 0.97577 | 0.96590 | 0.97307 | |
708 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
709 | 3c88abd0 | Florent Chuffart | | 6 | 1.19630 | 1.18120 | 1.19190 | |
710 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
711 | 3c88abd0 | Florent Chuffart | | 36 | 0.93318 | 0.92762 | 0.93166 | |
712 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
713 | 3c88abd0 | Florent Chuffart | | 37 | 0.48315 | 0.48453 | 0.48350 | |
714 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
715 | 3c88abd0 | Florent Chuffart | | 38 | 1.11240 | 1.11210 | 1.11230 | |
716 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
717 | 3c88abd0 | Florent Chuffart | | 39 | 0.89897 | 0.89917 | 0.89903 | |
718 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
719 | 3c88abd0 | Florent Chuffart | | 53 | 2.22650 | 2.22700 | 2.22660 | |
720 | 3c88abd0 | Florent Chuffart | +-----------+---------+---------+---------+ |
721 | 3c88abd0 | Florent Chuffart | |
722 | 3c88abd0 | Florent Chuffart | sample_id are given in file samples.csv |
723 | 3c88abd0 | Florent Chuffart | |
724 | 3961deb6 | Florent Chuffart | If you don't know which column to use for normalization, we recommend |
725 | 3961deb6 | Florent Chuffart | using wpunr. |
726 | 3c88abd0 | Florent Chuffart | |
727 | 3961deb6 | Florent Chuffart | Here are the details of the factors produced: |
728 | 3c88abd0 | Florent Chuffart | |
729 | 3c88abd0 | Florent Chuffart | * unr: factor computed from data of UNR regions. These regions |
730 | 3c88abd0 | Florent Chuffart | are defined for every pairs of aligned genomes (e.g. BY_RM) |
731 | 3c88abd0 | Florent Chuffart | |
732 | 3c88abd0 | Florent Chuffart | * wp: same, but for well-positioned nucleosomes. |
733 | 3c88abd0 | Florent Chuffart | |
734 | 3c88abd0 | Florent Chuffart | * wpunr: both types of regions. |
735 | 3c88abd0 | Florent Chuffart | |
736 | 3c88abd0 | Florent Chuffart | To execute this script, run the following command in your R console: |
737 | 3c88abd0 | Florent Chuffart | |
738 | 3c88abd0 | Florent Chuffart | source("src/current/get_size_factors.R") |
739 | 3c88abd0 | Florent Chuffart | |
740 | 3c88abd0 | Florent Chuffart | |
741 | 3c88abd0 | Florent Chuffart | The Script launch_deseq.R |
742 | 3c88abd0 | Florent Chuffart | ------------------------- |
743 | 3c88abd0 | Florent Chuffart | |
744 | 3c88abd0 | Florent Chuffart | Finally, the script *launch_deseq.R* perform statistical analysis on |
745 | 3c88abd0 | Florent Chuffart | each nucleosome using *DESeq*. It produces files: |
746 | 3c88abd0 | Florent Chuffart | |
747 | 3c88abd0 | Florent Chuffart | * results/current/BY_RM_H3K14ac_wp_snep.tab |
748 | 3c88abd0 | Florent Chuffart | |
749 | 3c88abd0 | Florent Chuffart | * results/current/BY_RM_H3K14ac_unr_snep.tab |
750 | 3c88abd0 | Florent Chuffart | |
751 | 3c88abd0 | Florent Chuffart | * results/current/BY_RM_H3K14ac_wpunr_snep.tab |
752 | 3c88abd0 | Florent Chuffart | |
753 | 3c88abd0 | Florent Chuffart | * results/current/BY_RM_H3K14ac_wp_mnase.tab |
754 | 3c88abd0 | Florent Chuffart | |
755 | 3c88abd0 | Florent Chuffart | * results/current/BY_RM_H3K14ac_unr_mnase.tab |
756 | 3c88abd0 | Florent Chuffart | |
757 | 3c88abd0 | Florent Chuffart | * results/current/BY_RM_H3K14ac_wpunr_mnase.tab |
758 | 3c88abd0 | Florent Chuffart | |
759 | 3c88abd0 | Florent Chuffart | These files are organised with the following columns (see file |
760 | 3c88abd0 | Florent Chuffart | *BY_RM_H3K14ac_wp_snep.tab* for an example): |
761 | 3c88abd0 | Florent Chuffart | |
762 | 3c88abd0 | Florent Chuffart | * chr_BY, the number of the chromosome for BY |
763 | 3c88abd0 | Florent Chuffart | |
764 | 3c88abd0 | Florent Chuffart | * lower_bound_BY, the lower bound of the nucleosome for BY |
765 | 3c88abd0 | Florent Chuffart | |
766 | 3c88abd0 | Florent Chuffart | * upper_bound_BY, the upper bound of the nucleosome for BY |
767 | 3c88abd0 | Florent Chuffart | |
768 | 3c88abd0 | Florent Chuffart | * index_nuc_BY, the index of the BY nucleosome in the CUR for BY |
769 | 3c88abd0 | Florent Chuffart | |
770 | 3c88abd0 | Florent Chuffart | * chr_RM, the number of the chromosome for RM |
771 | 3c88abd0 | Florent Chuffart | |
772 | 3c88abd0 | Florent Chuffart | * lower_bound_RM, the lower bound of the nucleosome for RM |
773 | 3c88abd0 | Florent Chuffart | |
774 | 3c88abd0 | Florent Chuffart | * upper_bound_RM, the upper bound of the nucleosome for RM |
775 | 3c88abd0 | Florent Chuffart | |
776 | 3c88abd0 | Florent Chuffart | * index_nuc_RM,the index of the RM nucleosome in the CUR for RM |
777 | 3c88abd0 | Florent Chuffart | |
778 | 3c88abd0 | Florent Chuffart | * cur_index, index of the CUR |
779 | 3c88abd0 | Florent Chuffart | |
780 | 3c88abd0 | Florent Chuffart | * form |
781 | 3c88abd0 | Florent Chuffart | |
782 | 3c88abd0 | Florent Chuffart | * BY_Mnase_Seq_1, the number of reads for the current nucleosome |
783 | 3c88abd0 | Florent Chuffart | for the sample 1 |
784 | 3c88abd0 | Florent Chuffart | |
785 | 3c88abd0 | Florent Chuffart | Next columns concern indicators for each sample: |
786 | 3c88abd0 | Florent Chuffart | |
787 | 3c88abd0 | Florent Chuffart | * BY_Mnase_Seq_2, #reads for sample 2 |
788 | 3c88abd0 | Florent Chuffart | |
789 | 3c88abd0 | Florent Chuffart | * BY_Mnase_Seq_3, #reads for sample 3 |
790 | 3c88abd0 | Florent Chuffart | |
791 | 3c88abd0 | Florent Chuffart | * RM_Mnase_Seq_4, #reads for sample 4 |
792 | 3c88abd0 | Florent Chuffart | |
793 | 3c88abd0 | Florent Chuffart | * RM_Mnase_Seq_5, #reads for sample 5 |
794 | 3c88abd0 | Florent Chuffart | |
795 | 3c88abd0 | Florent Chuffart | * RM_Mnase_Seq_6, #reads for sample 6 |
796 | 3c88abd0 | Florent Chuffart | |
797 | 3c88abd0 | Florent Chuffart | * BY_H3K14ac_36, #reads for sample 36 |
798 | 3c88abd0 | Florent Chuffart | |
799 | 3c88abd0 | Florent Chuffart | * BY_H3K14ac_37, #reads for sample 37 |
800 | 3c88abd0 | Florent Chuffart | |
801 | 3c88abd0 | Florent Chuffart | * BY_H3K14ac_53, #reads for sample 53 |
802 | 3c88abd0 | Florent Chuffart | |
803 | 3c88abd0 | Florent Chuffart | * RM_H3K14ac_38, #reads for sample 38 |
804 | 3c88abd0 | Florent Chuffart | |
805 | 3c88abd0 | Florent Chuffart | * RM_H3K14ac_39, #reads for sample 39 |
806 | 3c88abd0 | Florent Chuffart | |
807 | 3c88abd0 | Florent Chuffart | The 5 last columns concern DESeq analysis: |
808 | 3c88abd0 | Florent Chuffart | |
809 | 3c88abd0 | Florent Chuffart | * manip[a_manip] strain[a_strain] |
810 | 3c88abd0 | Florent Chuffart | manip[a_strain]:strain[a_strain], the manip (marker) effect, the |
811 | 3c88abd0 | Florent Chuffart | strain effect and the snep effect. These are the coefficients of |
812 | 3c88abd0 | Florent Chuffart | the fitted generalized linear model. |
813 | 3c88abd0 | Florent Chuffart | |
814 | 3961deb6 | Florent Chuffart | * pvalsGLM, the pvalue resulting from the comparison of the GLM |
815 | 3961deb6 | Florent Chuffart | model considering the interaction term *marker:strain* to the GLM |
816 | 3961deb6 | Florent Chuffart | model that does not consider it. This is the statsitcial |
817 | 3961deb6 | Florent Chuffart | significance of the interaction term and therefore the |
818 | 3961deb6 | Florent Chuffart | statistical significance of the SNEP. |
819 | 3c88abd0 | Florent Chuffart | |
820 | 3c88abd0 | Florent Chuffart | * snep_index, a boolean set to TRUE if the pvalueGLM value is |
821 | 3c88abd0 | Florent Chuffart | under the threshold computed with FDR function with a rate set to |
822 | 3c88abd0 | Florent Chuffart | 0.0001. |
823 | 3c88abd0 | Florent Chuffart | |
824 | 3c88abd0 | Florent Chuffart | To execute this script, run the following command in your R console: |
825 | 3c88abd0 | Florent Chuffart | |
826 | 3c88abd0 | Florent Chuffart | source("src/current/launch_deseq.R") |
827 | 3c88abd0 | Florent Chuffart | |
828 | 3c88abd0 | Florent Chuffart | |
829 | 3c88abd0 | Florent Chuffart | Results: Number of SNEPs |
830 | 3c88abd0 | Florent Chuffart | ======================== |
831 | 3c88abd0 | Florent Chuffart | |
832 | 3c88abd0 | Florent Chuffart | Here are the number of computed SNEPs for each forms. |
833 | 3c88abd0 | Florent Chuffart | |
834 | 3c88abd0 | Florent Chuffart | +-------+---------+-------+---------+ |
835 | 3c88abd0 | Florent Chuffart | | form | strains | #nucs | H3K14ac | |
836 | 3c88abd0 | Florent Chuffart | +=======+=========+=======+=========+ |
837 | 3c88abd0 | Florent Chuffart | | wp | BY-RM | 30464 | 3549 | |
838 | 3c88abd0 | Florent Chuffart | +-------+---------+-------+---------+ |
839 | 3c88abd0 | Florent Chuffart | | unr | BY-RM | 9497 | 1559 | |
840 | 3c88abd0 | Florent Chuffart | +-------+---------+-------+---------+ |
841 | 3c88abd0 | Florent Chuffart | | wpunr | BY-RM | 39961 | 5240 | |
842 | 3c88abd0 | Florent Chuffart | +-------+---------+-------+---------+ |
843 | 3c88abd0 | Florent Chuffart | |
844 | 3c88abd0 | Florent Chuffart | |
845 | 3c88abd0 | Florent Chuffart | APPENDICE: Generate .c2c Files |
846 | 3c88abd0 | Florent Chuffart | ============================== |
847 | 3c88abd0 | Florent Chuffart | |
848 | 3961deb6 | Florent Chuffart | The *.c2c* files is a simple table that describes how two genome |
849 | 3961deb6 | Florent Chuffart | sequences are aligned. This file can be generated by using scripts |
850 | 3961deb6 | Florent Chuffart | that were developed in NucleoMiner 1.0 (Nagarajan et al. PLoS Genetics |
851 | 3961deb6 | Florent Chuffart | 2010) and which we provide in this release of NucleoMiner2. |
852 | 3c88abd0 | Florent Chuffart | |
853 | 3961deb6 | Florent Chuffart | To use these scripts on your UNIX/LINUX computer you need first to |
854 | 3961deb6 | Florent Chuffart | install MUMmer which is designed to rapidly align entire genomes, |
855 | 3961deb6 | Florent Chuffart | whether in complete or draft form. |
856 | 3c88abd0 | Florent Chuffart | |
857 | 3c88abd0 | Florent Chuffart | |
858 | 3961deb6 | Florent Chuffart | Installing MUMmer |
859 | 3961deb6 | Florent Chuffart | ----------------- |
860 | 3c88abd0 | Florent Chuffart | |
861 | c25275e2 | Florent Chuffart | Get the last version of MUMmer archive on your computer |
862 | 3961deb6 | Florent Chuffart | (MUMmer3.23.tar.gz is provided in the directory deps of your working |
863 | c25275e2 | Florent Chuffart | directory). Copy it in a dedicated directory. Install it locally into |
864 | c25275e2 | Florent Chuffart | the src folder of you working directory by typing (working directory): |
865 | 3c88abd0 | Florent Chuffart | |
866 | 3961deb6 | Florent Chuffart | tar -xvzf MUMmer3.23.tar.gz |
867 | 3c88abd0 | Florent Chuffart | |
868 | c25275e2 | Florent Chuffart | cd src |
869 | c25275e2 | Florent Chuffart | tar xfvz ../deps/MUMmer3.23.tar.gz |
870 | c25275e2 | Florent Chuffart | cd MUMmer3.23 |
871 | c25275e2 | Florent Chuffart | make check |
872 | c25275e2 | Florent Chuffart | make install |
873 | 3c88abd0 | Florent Chuffart | |
874 | 3c88abd0 | Florent Chuffart | |
875 | c25275e2 | Florent Chuffart | Installing NucleoMiner 1.0 scripts |
876 | c25275e2 | Florent Chuffart | ---------------------------------- |
877 | 3c88abd0 | Florent Chuffart | |
878 | 3961deb6 | Florent Chuffart | Get the nucleominer-1.0.tar.gz archive on your computer (this archive |
879 | 3961deb6 | Florent Chuffart | is provided in the directory deps of your working directory). Install |
880 | 3961deb6 | Florent Chuffart | it locally into the src folder of you working directory by typing |
881 | 3961deb6 | Florent Chuffart | (working directory): |
882 | 3c88abd0 | Florent Chuffart | |
883 | c25275e2 | Florent Chuffart | cd src |
884 | c25275e2 | Florent Chuffart | tar xfvz ../deps/nucleominer-1.0.tar.gz |
885 | c25275e2 | Florent Chuffart | cd .. |
886 | 3c88abd0 | Florent Chuffart | |
887 | 3961deb6 | Florent Chuffart | This creates a directory that contains NucleoMiner 1.0 scripts |
888 | c25275e2 | Florent Chuffart | (src/nucleominer-1.0/scripts). |
889 | 3c88abd0 | Florent Chuffart | |
890 | 3c88abd0 | Florent Chuffart | |
891 | 3c88abd0 | Florent Chuffart | Generate .c2c Files |
892 | 3c88abd0 | Florent Chuffart | ------------------- |
893 | 3c88abd0 | Florent Chuffart | |
894 | 3c88abd0 | Florent Chuffart | To generate .c2c files you need to type the following command in a |
895 | 3c88abd0 | Florent Chuffart | terminal: |
896 | 3c88abd0 | Florent Chuffart | |
897 | c25275e2 | Florent Chuffart | export PATH=$PATH:src/MUMmer3.23:src/nucleominer-1.0/scripts |
898 | c25275e2 | Florent Chuffart | export PERL5LIB=$PERL5LIB:src/nucleominer-1.0/scripts/ |
899 | c25275e2 | Florent Chuffart | NMgxcomp data/saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta \ |
900 | c25275e2 | Florent Chuffart | data/saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta \ |
901 | c25275e2 | Florent Chuffart | data/byxrm 2>NMgxcomp.log |
902 | 3c88abd0 | Florent Chuffart | |
903 | c25275e2 | Florent Chuffart | After execution, the directory *data* will hold the .c2c files. |