Revision e5603c3f

b/README
3 3
*****************************************
4 4

  
5 5
*NucleoMiner2* offers a Python API and an R package to perform
6
quantitative analysis of nucleosomal epigenome. It is especially well
7
suited for scripting to extract natural Single-Nucleosome Epi-
8
Polymorphisms (SNEP) from ChIP-Seq data.
6
quantitative analysis of epigenetic marks on individual nucleosomes.
7
It was developed to detect natural Single-Nucleosome Epi-Polymorphisms
8
(SNEP) from MNase-seq and ChIP-seq data.
9 9

  
10 10

  
11 11
License
......
35 35
of the economic rights,  and the successive licensors  have only
36 36
limited liability.
37 37

  
38
In this respect, the user's attention is drawn to the risks associated
39
with loading,  using,  modifying and/or developing or reproducing the
40
software by the user in light of its specific status of free software,
41
that may mean  that it is complicated to manipulate,  and  that  also
42
therefore means  that it is reserved for developers  and  experienced
43
professionals having in-depth computer knowledge. Users are therefore
44
encouraged to load and test the software's suitability as regards
45
their requirements in conditions enabling the security of their
46
systems and/or data to be ensured and,  more generally, to use and
47
operate it in the same conditions as regards security.
38
This software is provided with absolutely NO WARRANTY. The authors can
39
not be held responsible, even partially, for any damage, loss,
40
financial loss or any other undesired facts resulting from the use of
41
the software. In this respect, the user's attention is drawn to the
42
risks associated with loading,  using,  modifying and/or developing or
43
reproducing the software by the user in light of its specific status
44
of free software, that may mean  that it is complicated to manipulate,
45
and  that  also therefore means  that it is reserved for developers
46
and  experienced professionals having in-depth computer knowledge.
47
Users are therefore encouraged to load and test the software's
48
suitability as regards their requirements in conditions enabling the
49
security of their systems and/or data to be ensured and,  more
50
generally, to use and operate it in the same conditions as regards
51
security.
48 52

  
49 53
The fact that you are presently reading this means that you have had
50 54
knowledge of the CeCILL license and that you accept its terms.
......
57 61
Links
58 62
-----
59 63

  
60
*NucleoMiner2* home page and documentation: https://forge.cbp.ens-
61
lyon.fr/redmine/projects/nucleominer
64
*NucleoMiner2* home page and documentation are available here:
62 65

  
63
Gael Yvert lab page: http://www.ens-lyon.fr/LBMC/gisv/
66
   * https://forge.cbp.ens-lyon.fr/redmine/projects/nucleominer
67

  
68
The Yvert lab web page is accessible here:
69

  
70
   * http://www.ens-lyon.fr/LBMC/gisv/
64 71

  
65 72

  
66 73
Installation
67 74
------------
68 75

  
69
   * Download archive
70 76

  
71
   * Compile bowtie2
77
Prerequisites
78
~~~~~~~~~~~~~
79

  
80
To work properly, NucleoMiner2 requires the following free software
81
to be installed and available on your system:
82

  
83
   * Bowtie2 http://bowtie-bio.sourceforge.net/bowtie2
72 84

  
73
   * Compile samtools
85
   * SAMtools http://samtools.sourceforge.net
74 86

  
75
   * Compile bedtools
87
   * bedtools http://code.google.com/p/bedtools/
76 88

  
77
   * Compile TemplateFilter
89
   * TemplateFilter
90
     http://compbio.cs.huji.ac.il/NucPosition/TemplateFiltering
78 91

  
79
Required R packages:
80
   * bot
92
It also requires the following R packages to be installed on your
93
system:
81 94

  
82 95
   * fork
83 96

  
......
85 98

  
86 99
   * seqinr
87 100

  
88
   * cachecache
89

  
90
   cd src/r_packages/
91
         tar xfvz R-latest.tar.gz
92
         cd R-patched
93
         ./configure --with-x=no PDFLATEX="ls"
94
         make
95
   cd ../../..
96
   R_BIN=src/r_packages/R-patched/bin/R
97
         $R_BIN CMD INSTALL src/r_packages/rjson_0.2.12.tar.gz
98
         $R_BIN CMD INSTALL src/r_packages/seqinr_3.0-7.tar.gz
99
         $R_BIN CMD INSTALL src/r_packages/plotrix_3.4-5.tar.gz
100
         $R_BIN CMD INSTALL src/r_packages/nm_2.0.tar.gz
101
         $R_BIN CMD INSTALL src/r_packages/fork_1.2.4.tar.gz
102
         $R_BIN CMD INSTALL src/r_packages/bot_0.9.tar.gz
103
         $R_BIN CMD INSTALL src/r_packages/DESeq_1.14.0.tar.gz
104

  
105
...
106

  
101
   * cachecache https://forge.cbp.ens-
102
     lyon.fr/redmine/projects/cachecache
107 103

  
108
usage
109
=====
104
   * bot https://forge.cbp.ens-lyon.fr/redmine/projects/bot
110 105

  
111
See html documentation for *NucleoMiner2*: http://www.ens-
112
lyon.fr/LBMC/gisv/
106
   * nucleominer https://forge.cbp.ens-
107
     lyon.fr/redmine/projects/nucleominer
/dev/null
1

  
2
Welcome to *NucleoMiner2*
3
*************************
4

  
5
* Readme / Documentation for *NucleoMiner2*
6

  
7
  * License
8

  
9
  * Installation Instructions
10

  
11
  * usage
12

  
13
* Tutorial
14

  
15
  * Python and R Common Configuration File
16

  
17
  * Dataset and Configuration Variables
18

  
19
  * Preprocessing Illumina Fastq Reads for Each Sample
20

  
21
  * Inferring Nucleosome Position and Extracting Read Counts
22

  
23
  * Results
24

  
25
* References
26

  
27
  * Python Reference
28

  
29
  * R Reference
30

  
31

  
32
Indices and tables
33
******************
34

  
35
* *Index*
36

  
37
* *Search Page*
/dev/null
1

  
2
Readme / Documentation for *NucleoMiner2*
3
*****************************************
4

  
5
*NucleoMiner2* offers a Python API and an R package to perform
6
quantitative analysis of nucleosomal epigenome. It is especially well
7
suited for scripting to extract natural Single-Nucleosome Epi-
8
Polymorphisms (SNEP) from ChIP-Seq data.
9

  
10

  
11
License
12
=======
13

  
14
Copyright CNRS 2012-2013
15

  
16
* Florent CHUFFART
17

  
18
* Jean-Baptiste VEYRIERAS
19

  
20
* Gael YVERT
21

  
22
This software is a computer program whose purpose is to perform
23
quantitative analysis of epigenetic marks at single nucleosome
24
resolution.
25

  
26
This software is governed by the CeCILL license under French law and
27
abiding by the rules of distribution of free software.  You can  use,
28
modify and/or redistribute the software under the terms of the CeCILL
29
license as circulated by CEA, CNRS and INRIA at the following URL
30
"http://www.cecill.info".
31

  
32
As a counterpart to the access to the source code and  rights to copy,
33
modify and redistribute granted by the license, users are provided
34
only with a limited warranty  and the software's author,  the holder
35
of the economic rights,  and the successive licensors  have only
36
limited liability.
37

  
38
In this respect, the user's attention is drawn to the risks associated
39
with loading,  using,  modifying and/or developing or reproducing the
40
software by the user in light of its specific status of free software,
41
that may mean  that it is complicated to manipulate,  and  that  also
42
therefore means  that it is reserved for developers  and  experienced
43
professionals having in-depth computer knowledge. Users are therefore
44
encouraged to load and test the software's suitability as regards
45
their requirements in conditions enabling the security of their
46
systems and/or data to be ensured and,  more generally, to use and
47
operate it in the same conditions as regards security.
48

  
49
The fact that you are presently reading this means that you have had
50
knowledge of the CeCILL license and that you accept its terms.
51

  
52

  
53
Installation Instructions
54
=========================
55

  
56

  
57
Links
58
-----
59

  
60
*NucleoMiner2* home page and documentation: https://forge.cbp.ens-
61
lyon.fr/redmine/projects/nucleominer
62

  
63
Gael Yvert lab page: http://www.ens-lyon.fr/LBMC/gisv/
64

  
65

  
66
Installation
67
------------
68

  
69
   * Download archive
70

  
71
   * Compile bowtie2
72

  
73
   * Compile samtools
74

  
75
   * Compile bedtools
76

  
77
   * Compile TemplateFilter
78

  
79
Required R packages:
80
   * bot
81

  
82
   * fork
83

  
84
   * rjson
85

  
86
   * seqinr
87

  
88
   * cachecache
89

  
90
   cd src/r_packages/
91
         tar xfvz R-latest.tar.gz
92
         cd R-patched
93
         ./configure --with-x=no PDFLATEX="ls"
94
         make
95
   cd ../../..
96
   R_BIN=src/r_packages/R-patched/bin/R
97
         $R_BIN CMD INSTALL src/r_packages/rjson_0.2.12.tar.gz
98
         $R_BIN CMD INSTALL src/r_packages/seqinr_3.0-7.tar.gz
99
         $R_BIN CMD INSTALL src/r_packages/plotrix_3.4-5.tar.gz
100
         $R_BIN CMD INSTALL src/r_packages/nm_2.0.tar.gz
101
         $R_BIN CMD INSTALL src/r_packages/fork_1.2.4.tar.gz
102
         $R_BIN CMD INSTALL src/r_packages/bot_0.9.tar.gz
103
         $R_BIN CMD INSTALL src/r_packages/DESeq_1.14.0.tar.gz
104

  
105
...
106

  
107

  
108
usage
109
=====
110

  
111
See html documentation for *NucleoMiner2*: http://www.ens-
112
lyon.fr/LBMC/gisv/
/dev/null
1

  
2
References
3
**********
4

  
5

  
6
Python Reference
7
================
8

  
9
configurator.CSV_SAMPLE_FILE = None
10

  
11
   Path to the CSV file that contains sample information.
12

  
13
configurator.BOWTIE_BUILD_BIN = None
14

  
15
   Path to the bowtie2-build binary.
16

  
17
configurator.BOWTIE2_BIN = None
18

  
19
   Path to the bowtie2 binary.
20

  
21
configurator.SAMTOOLS_BIN = None
22

  
23
   Path to the samtools binary.
24

  
25
configurator.BEDTOOLS_BIN = None
26

  
27
   Path to the bedtools binary.
28

  
29
configurator.TF_BIN = None
30

  
31
   Path to the TemplateFilter binary.
32

  
33
configurator.TF_TEMPLATES_FILE = None
34

  
35
   Path to the TemplateFilter templates file.
36

  
37
configurator.ILLUMINA_OUTPUTFILE_PREFIX = None
38

  
39
   Prefix for Illumina fastq output files.
40

  
41
configurator.INDEX_DIR = None
42

  
43
   Path to the index directory.
44

  
45
configurator.ALIGN_DIR = None
46

  
47
   Path to the alignment directory.
48

  
49
configurator.LOG_DIR = None
50

  
51
   Path to the log directory.
52

  
53
configurator.CACHE_DIR = None
54

  
55
   Path to the cache directory.
56

  
57
configurator.RESULTS_DIR = None
58

  
59
   Path to the results directory.
60

  
61
configurator.FASTA_REFERENCE_GENOME_FILES = None
62

  
63
   Dictionary in which each FASTA reference genome file is indexed by
64
   the reference strain it corresponds to.
65

  
66
configurator.AREA_BLACK_LIST = None
67

  
68
   Dictionary where keys are strains and values are the blacklisted
69
   genome regions.
70

  
71
configurator.FASTA_INDEXES = None
72

  
73
   Dictionary indexed by strain. Each value is a dictionary whose keys
74
   are chromosome references from the Fastq file and whose values are
75
   the corresponding TemplateFilter chromosome names.
76

  
77
configurator.C2C_FILES = None
78

  
79
   Dictionary where each strain combination indexes a genome alignment.
80

  
81
configurator.READ_LENGTH = None
82

  
83
   Length of Illumina reads.
84

  
85
configurator.MAPQ_THRES = None
86

  
87
   Alignment quality threshold.
88

  
89
configurator.TF_CORR = None
90

  
91
   TemplateFilter Template correlation threshold.
92

  
93
configurator.TF_MINW = None
94

  
95
   TemplateFilter minimum width of a nucleosome.
96

  
97
configurator.TF_MAXW = None
98

  
99
   TemplateFilter maximum width of a nucleosome.
100

  
101
configurator.TF_OL = None
102

  
103
   TemplateFilter maximum allowed overlap for two nucleosomes.
104

  
105
wf.json_conf_file = 'src/current/nucleominer_config.json'
106

  
107
   Path to the json configuration file.
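
A minimal sketch of how such a file could be read from Python, assuming the JSON file stores the configurator variables listed above as top-level keys. The function name `load_config` and the `REQUIRED_KEYS` subset are hypothetical and are not part of the NucleoMiner2 API.

```python
import json

# Hypothetical subset of keys; the names come from the configurator
# variables documented above.
REQUIRED_KEYS = ("CSV_SAMPLE_FILE", "READ_LENGTH", "MAPQ_THRES")


def load_config(path, required=REQUIRED_KEYS):
    """Load a JSON configuration file and check that required keys exist."""
    with open(path) as handle:
        config = json.load(handle)
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError("missing configuration keys: " + ", ".join(missing))
    return config
```

Failing early on a missing key is a design choice of this sketch; it avoids a later step running with an undefined path or threshold.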
108

  
109
wf.samples = []
110

  
111
   List of samples, where each sample is identified by an id (key: *id*) and
112
   a strain name (key *strain*).
113

  
114
wf.samples_mnase = []
115

  
116
   List of MNase samples.
117

  
118
wf.strains = []
119

  
120
   List of reference strains.
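
As an illustration only, the three lists could relate to each other as sketched below. The sketch assumes each sample is a dict carrying a `marker` key whose value is `"Mnase_Seq"` for MNase samples (the value used in the R examples of this documentation). The helper `split_samples` is hypothetical.

```python
def split_samples(samples, mnase_marker="Mnase_Seq"):
    """Return (samples_mnase, strains) derived from a list of sample dicts.

    Assumption: every sample has the keys "id" and "strain", and MNase
    samples are flagged by sample["marker"] == mnase_marker.
    """
    samples_mnase = [s for s in samples if s.get("marker") == mnase_marker]
    strains = sorted({s["strain"] for s in samples})
    return samples_mnase, strains
```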
121

  
122
libcoverage.create_bowtie_index(strain, strain_fasta_ref, index_dir, bowtie_build_bin)
123

  
124
   Creates bowtie index for a strain *strain*.
125

  
126
   Parameters:
127
      * **strain** -- the strain reference.
128

  
129
      * **strain_fasta_ref** -- fasta reference genome.
130

  
131
      * **index_dir** -- directory in which to put the bowtie index.
132

  
133
      * **bowtie_build_bin** -- bowtie2 build binary.
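
A hedged sketch of the kind of command this step amounts to, assuming the index base name is the strain name inside `index_dir` (an assumption; the actual naming is not documented here). It relies only on the standard `bowtie2-build <reference_in> <bt2_base>` calling convention. The helper `bowtie_build_command` is hypothetical.

```python
import os


def bowtie_build_command(strain, strain_fasta_ref, index_dir, bowtie_build_bin):
    """Build the argument list for a bowtie2-build call.

    Assumption: the index base name is <index_dir>/<strain>.
    """
    index_base = os.path.join(index_dir, strain)
    return [bowtie_build_bin, strain_fasta_ref, index_base]
```

The returned list can be passed to `subprocess.check_call` to actually run the indexing.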
134

  
135
libcoverage.align_reads(sample, align_dir, log_dir, index_dir, illumina_outputfile_prefix, bowtie2_bin, samtools_bin, bedtools_bin)
136

  
137
   Aligns reads to reference genomes. It produces .sam files, that are
138
   converted to .bam, that are converted to .bed.
139

  
140
   Parameters:
141
      * **sample** -- a dict that describes a sample.
142

  
143
      * **align_dir** -- directory where aligned reads will be
144
        stored.
145

  
146
      * **log_dir** -- directory where logs will be stored.
147

  
148
      * **illumina_outputfile_prefix** -- prefix of Illumina
149
        sequencer fastq.gz output files.
150

  
151
      * **bowtie2_bin** -- bowtie2 binary.
152

  
153
      * **samtools_bin** -- samtools binary.
154

  
155
      * **bedtools_bin** -- bedtools binary.
156

  
157
      * **index_dir** -- bowtie index directory.
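
The .sam, .bam, .bed chain described above can be sketched as three commands. This is an illustration under assumptions, not the library's implementation: the file naming (`sample_<id>`) and the helper `alignment_commands` are hypothetical, and only widely documented options are used (`bowtie2 -x/-U/-S`, `samtools view -bS -o`, `bedtools bamtobed -i`).

```python
import os


def alignment_commands(sample_id, fastq, index_base, align_dir,
                       bowtie2_bin="bowtie2", samtools_bin="samtools",
                       bedtools_bin="bedtools"):
    """Return a list of (argv, stdout_path) pairs for the three steps.

    stdout_path is None when the tool writes its own output file, and a
    file path when the tool's standard output must be redirected to it.
    """
    base = os.path.join(align_dir, "sample_%s" % sample_id)  # hypothetical naming
    sam, bam, bed = base + ".sam", base + ".bam", base + ".bed"
    return [
        # 1. align unpaired reads against the bowtie2 index, write SAM
        ([bowtie2_bin, "-x", index_base, "-U", fastq, "-S", sam], None),
        # 2. convert SAM to BAM
        ([samtools_bin, "view", "-bS", "-o", bam, sam], None),
        # 3. convert BAM to BED (bedtools writes to standard output)
        ([bedtools_bin, "bamtobed", "-i", bam], bed),
    ]
```

Each pair can be run with `subprocess.check_call(argv, stdout=...)`, opening `stdout_path` for writing when it is not `None`.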
158

  
159
libcoverage.split_fr_4_TF(sample, align_dir, fasta_indexes, area_black_list, read_length, mapq_thres)
160

  
161
   Creates TemplateFilter input files from bed files. This function
162
   proceeds in two steps. First, it collects reads from bed files and
163
   feeds a data structure.
164

  
165
   Parameters:
166
      * **sample** -- a dict that describes a sample.
167

  
168
      * **align_dir** -- directory where aligned reads will be
169
        stored.
170

  
171
      * **fasta_indexes** -- the chr references from the Illumina
172
        output file.
173

  
174
      * **area_black_list** -- the description of the genome regions that
175
        will be omitted.
176

  
177
      * **read_length** -- Length of Illumina reads.
178

  
179
      * **mapq_thres** -- mapping quality criterion threshold, see
180
        MAPQ in BED/BAM file format.
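
A minimal sketch of the read-collection step, under stated assumptions: the bed files are six-column BED as produced by `bedtools bamtobed` (chrom, start, end, name, MAPQ score, strand), forward reads are keyed by their start and reverse reads by their end, and strands are recoded as `F`/`R` as in the R examples of this documentation. Blacklist handling is left out, and `count_reads` is a hypothetical helper.

```python
from collections import Counter


def count_reads(bed_lines, mapq_thres):
    """Count reads per (chrom, position, strand), dropping low-MAPQ reads."""
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or truncated lines
        chrom, start, end, _name, score, strand = fields[:6]
        if int(score) < mapq_thres:
            continue  # mapping quality below the threshold
        if strand == "+":
            counts[(chrom, int(start), "F")] += 1
        else:
            counts[(chrom, int(end), "R")] += 1
    return counts
```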
181

  
182
libcoverage.template_filter(sample, align_dir, log_dir, tf_bin, tf_templates_file, corr, minw, maxw, ol)
183

  
184
   Runs TemplateFilter on a specific sample. It produces a .tab file.
185

  
186
   Parameters:
187
      * **sample** -- a dict that describes a sample.
188

  
189
      * **align_dir** -- directory where aligned reads will be
190
        stored.
191

  
192
      * **log_dir** -- directory where logs will be stored.
193

  
194
      * **tf_bin** -- path to the TemplateFilter binary.
195

  
196
      * **tf_templates_file** -- path to the TemplateFilter
197
        templates file.
198

  
199
      * **corr** -- correlation threshold, passed to
200
        TemplateFilter.
201

  
202
      * **minw** -- minimum width of a nucleosome, passed to
203
        TemplateFilter.
204

  
205
      * **maxw** -- maximum width of a nucleosome, passed to
206
        TemplateFilter.
207

  
208
      * **ol** -- maximum overlap between two nucleosomes, passed to
209
        TemplateFilter.
210

  
211

  
212
R Reference
213
===========
214

  
215

  
216
Arabic to Roman pair list.
217
--------------------------
218

  
219

  
220
Description
221
~~~~~~~~~~~
222

  
223
Utility to convert Arabic to Roman numerals.
224

  
225

  
226
Usage
227
~~~~~
228

  
229
   ARAB2ROM()
230

  
231

  
232
Author(s)
233
~~~~~~~~~
234

  
235
Florent Chuffart
236

  
237
R: False Discovery Rate
238

  
239

  
240
False Discovery Rate
241
--------------------
242

  
243

  
244
Description
245
~~~~~~~~~~~
246

  
247
From a vector x of independent p-values, extract the cutoff
248
corresponding to the specified FDR. See Benjamini & Hochberg 1995
249
paper.
250

  
251

  
252
Usage
253
~~~~~
254

  
255
   FDR(x, FDR)
256

  
257

  
258
Arguments
259
~~~~~~~~~
260

  
261
"x"
262

  
263
A vector x of independent p-values.
264

  
265
"FDR"
266

  
267
The specified FDR.
268

  
269

  
270
Value
271
~~~~~
272

  
273
Returns the corresponding cutoff.
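
For readers who prefer code, the Benjamini & Hochberg step-up rule can be sketched in Python as follows. This is an illustration of the published procedure, not a transcription of the R function: the name `fdr_cutoff` is hypothetical, and returning `None` when no p-value passes is an assumption of this sketch.

```python
def fdr_cutoff(pvalues, fdr):
    """Return the largest p-value p_(k) such that p_(k) <= fdr * k / m.

    pvalues is a sequence of independent p-values, m is its length and
    k is the 1-based rank of a p-value in ascending order. Returns None
    if no p-value satisfies the criterion.
    """
    ordered = sorted(pvalues)
    m = len(ordered)
    cutoff = None
    for rank, p in enumerate(ordered, start=1):
        if p <= fdr * rank / m:
            cutoff = p  # keep the largest rank that passes
    return cutoff
```

P-values less than or equal to the returned cutoff are then declared significant at the requested FDR.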
274

  
275

  
276
Author(s)
277
~~~~~~~~~
278

  
279
Gael Yvert, Florent Chuffart
280

  
281

  
282
Examples
283
~~~~~~~~
284

  
285
   print("example")
286

  
287
R: Roman to Arabic pair list.
288

  
289

  
290
Roman to Arabic pair list.
291
--------------------------
292

  
293

  
294
Description
295
~~~~~~~~~~~
296

  
297
Utility to convert Roman to Arabic numerals.
298

  
299

  
300
Usage
301
~~~~~
302

  
303
   ROM2ARAB()
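
A Python sketch of the same idea, given as an illustration: two lookup tables that translate chromosome numbers between Arabic and Roman notation. The range 1 to 20 and the helper `to_roman` are assumptions of this sketch; the range covered by the R functions is not stated here.

```python
def to_roman(number):
    """Convert a positive integer to its Roman numeral."""
    symbols = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = ""
    for value, symbol in symbols:
        while number >= value:
            result += symbol
            number -= value
    return result


# Pair lists in both directions (hypothetical range 1..20).
ARAB2ROM = {n: to_roman(n) for n in range(1, 21)}
ROM2ARAB = {roman: n for n, roman in ARAB2ROM.items()}
```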
304

  
305

  
306
Author(s)
307
~~~~~~~~~
308

  
309
Florent Chuffart
310

  
311
R: Aggregate replicated sample's nucleosomes.
312

  
313

  
314
Aggregate replicated sample's nucleosomes.
315
------------------------------------------
316

  
317

  
318
Description
319
~~~~~~~~~~~
320

  
321
This function aggregates nucleosomes for replicated samples. It uses
322
the TemplateFilter output of each sample as a replicate. Each sample owns a
323
set of nucleosomes computed using TemplateFilter and ordered by the
324
position of their center. Adjacent nucleosomes are compared two by
325
two. Comparison is based on a log likelihood ratio score. The outcome of
326
each comparison is to merge or separate adjacent nucleosomes. Finally the
327
function returns a list of clusters and all computed *llr_scores*.
328
Each cluster owns an attribute *wp* for "well positioned". This
329
attribute is set to *TRUE* if the cluster is composed of exactly one
330
nucleosome from each sample.
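
The *wp* criterion described above can be expressed as a short check. The Python sketch below is an illustration only: it assumes each nucleosome in a cluster is tagged with the id of the sample it comes from, and `is_well_positioned` is a hypothetical name.

```python
from collections import Counter


def is_well_positioned(cluster_sample_ids, all_sample_ids):
    """True if the cluster holds exactly one nucleosome from each sample.

    cluster_sample_ids lists the sample id of every nucleosome in the
    cluster; all_sample_ids lists the ids of the replicated samples.
    """
    counts = Counter(cluster_sample_ids)
    if set(counts) != set(all_sample_ids):
        return False  # a sample is missing, or an unknown sample is present
    return all(n == 1 for n in counts.values())
```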
331

  
332

  
333
Usage
334
~~~~~
335

  
336
   aggregate_intra_strain_nucs(samples, llr_thres = 20, coord_max = 2e+07)
337

  
338

  
339
Arguments
340
~~~~~~~~~
341

  
342
"samples"
343

  
344
A list of samples. Each sample is a list like *sample = list(id=...,
345
marker=..., strain=..., roi=..., inputs=..., outputs=...)* with *roi =
346
list(name=..., begin=..., end=..., chr=..., genome=...)*.
347

  
348
"llr_thres"
349

  
350
Log likelihood ratio threshold.
351

  
352
"coord_max"
353

  
354
A value too large to be the coordinate of a nucleosome lower bound.
355

  
356

  
357
Value
358
~~~~~
359

  
360
Returns a list of clusterized nucleosomes, and all computed llr
361
scores.
362

  
363

  
364
Author(s)
365
~~~~~~~~~
366

  
367
Florent Chuffart
368

  
369

  
370
Examples
371
~~~~~~~~
372

  
373
   # Dealing with a region of interest
374
   roi =list(name="example", begin=1000,  end=1300, chr="1", genome=rep("A",301))
375
   samples = list()
376
   for (i in 1:3) {
377
       # Create TF output
378
       tf_nuc = list("chr"=paste("chr", roi$chr, sep=""), "center"=(roi$end + roi$begin)/2, "width"= 150, "correlation.score"= 0.9)
379
       outputs = dfadd(NULL,tf_nuc)
380
       outputs = filter_tf_outputs(outputs, roi$chr, roi$begin, roi$end)
381
       # Generate corresponding reads
382
       nb_reads = round(runif(1,170,230))
383
       reads = round(rnorm(nb_reads, tf_nuc$center,20))
384
       u_reads = sort(unique(reads))
385
       strands = sample(c(rep("R",ceiling(length(u_reads)/2)),rep("F",floor(length(u_reads)/2))))
386
       counts = apply(t(u_reads), 2, function(r) { sum(reads == r)})
387
       shifts = apply(t(strands), 2, function(s) { if (s == "F") return(-tf_nuc$width/2) else return(tf_nuc$width/2)})
388
       u_reads = u_reads + shifts
389
       inputs = data.frame(list("V1" = rep(roi$chr, length(u_reads)),
390
                                "V2" = u_reads,
391
                                "V3" = strands,
392
                                "V4" = counts), stringsAsFactors=FALSE)
393
       samples[[length(samples) + 1]] = list(id=1, marker="Mnase_Seq", strain="strain_ex", total_reads = 10000000, roi=roi, inputs=inputs, outputs=outputs)
394
   }
395
   print(aggregate_intra_strain_nucs(samples))
396

  
397
R: Aligns nucleosomes between 2 strains.
398

  
399

  
400
Aligns nucleosomes between 2 strains.
401
-------------------------------------
402

  
403

  
404
Description
405
~~~~~~~~~~~
406

  
407
This function aligns nucs between two strains for a given genome
408
region.
409

  
410

  
411
Usage
412
~~~~~
413

  
414
   align_inter_strain_nucs(replicates, wp_nucs_strain_ref1 = NULL,
415
       wp_nucs_strain_ref2 = NULL, corr_thres = 0.5, llr_thres = 100,
416
       config = NULL, ...)
417

  
418

  
419
Arguments
420
~~~~~~~~~
421

  
422
"replicates"
423

  
424
Set of replicates, ideally 3 per strain.
425

  
426
"wp_nucs_strain_ref1"
427

  
428
List of aggregated nucleosomes for strain 1. If it is NULL, this list
429
will be computed.
430

  
431
"wp_nucs_strain_ref2"
432

  
433
List of aggregated nucleosomes for strain 2. If it is NULL, this list
434
will be computed.
435

  
436
"corr_thres"
437

  
438
Correlation threshold.
439

  
440
"llr_thres"
441

  
442
LOD cut off.
443

  
444
"config"
445

  
446
GLOBAL config variable
447

  
448
"..."
449

  
450
A list of parameters that will be passed to
451
*aggregate_intra_strain_nucs* if needed.
452

  
453

  
454
Value
455
~~~~~
456

  
457
Returns a list of clusterized nucleosomes, and all computed llr
458
scores.
459

  
460

  
461
Author(s)
462
~~~~~~~~~
463

  
464
Florent Chuffart
465

  
466

  
467
Examples
468
~~~~~~~~
469

  
470
       # Define new translate_cur function...
471
       translate_cur = function(roi, strain2, big_cur=NULL, config=NULL) {
472
         return(roi)
473
       }
474
       # Bind it by uncommenting the following lines.
475
       unlockBinding("translate_cur", as.environment("package:nucleominer"))
476
       unlockBinding("translate_cur", getNamespace("nucleominer"))
477
       assign("translate_cur", translate_cur, "package:nucleominer")
478
       assign("translate_cur", translate_cur, getNamespace("nucleominer"))
479
       lockBinding("translate_cur", getNamespace("nucleominer"))
480
       lockBinding("translate_cur", as.environment("package:nucleominer"))
481

  
482
   # Dealing with a region of interest
483
   roi =list(name="example", begin=1000,  end=1300, chr="1", genome=rep("A",301), strain_ref1 = "STRAINREF1")
484
   roi2 = translate_cur(roi, roi$strain_ref1)
485
   replicates = list()
486
   for (j in 1:2) {
487
       samples = list()
488
       for (i in 1:3) {
489
           # Create TF output
490
           tf_nuc = list("chr"=paste("chr", roi$chr, sep=""), "center"=(roi$end + roi$begin)/2, "width"= 150, "correlation.score"= 0.9)
491
           outputs = dfadd(NULL,tf_nuc)
492
           outputs = filter_tf_outputs(outputs, roi$chr, roi$begin, roi$end)
493
           # Generate corresponding reads
494
           nb_reads = round(runif(1,170,230))
495
           reads = round(rnorm(nb_reads, tf_nuc$center,20))
496
           u_reads = sort(unique(reads))
497
           strands = sample(c(rep("R",ceiling(length(u_reads)/2)),rep("F",floor(length(u_reads)/2))))
498
           counts = apply(t(u_reads), 2, function(r) { sum(reads == r)})
499
           shifts = apply(t(strands), 2, function(s) { if (s == "F") return(-tf_nuc$width/2) else return(tf_nuc$width/2)})
500
           u_reads = u_reads + shifts
501
           inputs = data.frame(list("V1" = rep(roi$chr, length(u_reads)),
502
                                    "V2" = u_reads,
503
                                    "V3" = strands,
504
                                    "V4" = counts), stringsAsFactors=FALSE)
505
           samples[[length(samples) + 1]] = list(id=1, marker="Mnase_Seq", strain=paste("strain_ex",j,sep=""), total_reads = 10000000, roi=roi, inputs=inputs, outputs=outputs)
506
       }
507
       replicates[[length(replicates) + 1]] = samples
508
   }
509
   print(align_inter_strain_nucs(replicates))
510

  
511
R: Launch deseq methods.
512

  
513

  
514
Launch deseq methods.
515
---------------------
516

  
517

  
518
Description
519
~~~~~~~~~~~
520

  
521
This function is based on the DESeq example. It normalizes data, fits data
522
to a GLM model with and without the interaction term and compares the two
523
models.
524

  
525

  
526
Usage
527
~~~~~
528

  
529
   analyse_design(snep_design, reads)
530

  
531

  
532
Arguments
533
~~~~~~~~~
534

  
535
"snep_design"
536

  
537
The design to consider.
538

  
539
"reads"
540

  
541
The data to consider.
542

  
543

  
544
Author(s)
545
~~~~~~~~~
546

  
547
Florent Chuffart
548

  
549
R: Stage replicates data
550

  
551

  
552
Stage replicates data
553
---------------------
554

  
555

  
556
Description
557
~~~~~~~~~~~
558

  
559
This function loads in memory data corresponding to the given
560
experiments.
561

  
562

  
563
Usage
564
~~~~~
565

  
566
   build_replicates(expe, roi, only_fetch = FALSE, get_genome = FALSE,
567
       all_samples, config = NULL)
568

  
569

  
570
Arguments
571
~~~~~~~~~
572

  
573
"expe"
574

  
575
A list of vectors, each corresponding to a vector of replicates.
576

  
577
"roi"
578

  
579
the region that we are interested in.
580

  
581
"only_fetch"
582

  
583
If TRUE, inputs are only fetched and not filtered.
584

  
585
"get_genome"
586

  
587
Whether or not to load the corresponding genome.
588

  
589
"all_samples"
590

  
591
Global list of samples.
592

  
593
"config"
594

  
595
GLOBAL config variable.
596

  
597

  
598
Author(s)
599
~~~~~~~~~
600

  
601
Florent Chuffart
602

  
603

  
604
Examples
605
~~~~~~~~
606

  
607
   # library(rjson)
608
   # library(nucleominer)
609
   #
610
   # # Read config file
611
   # json_conf_file = "nucleo_miner_config.json"
612
   # config = fromJSON(paste(readLines(json_conf_file), collapse=""))
613
   # # Read sample file
614
   # all_samples = get_content(config$CSV_SAMPLE_FILE, "cvs", sep=";", head=TRUE, stringsAsFactors=FALSE)
615
   # # here are the sample ids in a list
616
   # expes = list(c(1))
617
   # # here is the region whose coverage we want to see
618
   # cur = list(chr="8", begin=472000, end=474000, strain_ref="BY")
619
   # # it displays the coverage
620
   # replicates = build_replicates(expes, cur, all_samples=all_samples, config=config)
621
   # out = watch_samples(replicates, config$READ_LENGTH,
622
   #       plot_coverage = TRUE,
623
   #       plot_squared_reads = FALSE,
624
   #       plot_ref_genome = FALSE,
625
   #       plot_arrow_raw_reads = FALSE,
626
   #       plot_arrow_nuc_reads = FALSE,
627
   #       plot_gaussian_reads = FALSE,
628
   #       plot_gaussian_unified_reads = FALSE,
629
   #       plot_ellipse_nucs = FALSE,
630
   #       plot_wp_nucs = FALSE,
631
   #       plot_wp_nuc_model = FALSE,
632
   #       plot_common_nucs = FALSE,
633
   #       height = 50)
634

  
635
R: Extract a sub part of the corresponding c2c file
636

  
637

  
638
Extract a sub part of the corresponding c2c file
639
------------------------------------------------
640

  
641

  
642
Description
643
~~~~~~~~~~~
644

  
645
This function gives access to a specific part of the c2c file.
646

  
647

  
648
Usage
649
~~~~~
650

  
651
   c2c_extraction(strain1, strain2, chr = NULL, lower_bound = NULL,
652
       upper_bound = NULL, config = NULL)
653

  
654

  
655
Arguments
656
~~~~~~~~~
657

  
658
"strain1"
659

  
660
the key strain
661

  
662
"strain2"
663

  
664
the target strain
665

  
666
"chr"
667

  
668
if defined, the c2c will be filtered according to the chromosome value
669

  
670
"lower_bound"
671

  
672
if defined, the c2c will be filtered for the part of the genome above
673
lower_bound
674

  
675
"upper_bound"
676

  
677
if defined, the c2c will be filtered for the part of the genome below
678
upper_bound
679

  
680
"config"
681

  
682
GLOBAL config variable
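
The three optional filters can be pictured with the Python sketch below. It is purely illustrative: the layout of a c2c file is not described here, so the rows are modelled as dicts with hypothetical keys `chr` and `pos`, bounds are treated as inclusive, and `filter_c2c` is not part of the package.

```python
def filter_c2c(rows, chrom=None, lower_bound=None, upper_bound=None):
    """Keep the rows matching every filter that is defined (not None).

    Assumption: each row is a dict with a chromosome ("chr") and a
    coordinate ("pos"); both key names are hypothetical.
    """
    kept = []
    for row in rows:
        if chrom is not None and row["chr"] != chrom:
            continue
        if lower_bound is not None and row["pos"] < lower_bound:
            continue
        if upper_bound is not None and row["pos"] > upper_bound:
            continue
        kept.append(row)
    return kept
```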
683

  
684

  
685
Author(s)
686
~~~~~~~~~
687

  
688
Florent Chuffart
689

  
690
R: reformat an "apply manipulated" list of regions
691

  
692

  
693
reformat an "apply manipulated" list of regions
694
-----------------------------------------------
695

  
696

  
697
Description
698
~~~~~~~~~~~
699

  
700
Utils to reformat an "apply manipulated" list of regions
701

  
702

  
703
Usage
704
~~~~~
705

  
706
   collapse_regions(regions)
707

  
708

  
709
Arguments
710
~~~~~~~~~
711

  
712
+-----------------+------+
713
+-----------------+------+
714

  
715

  
716
Author(s)
717
~~~~~~~~~
718

  
719
Florent Chuffart
720

  
721
R: Compute Common Uninterrupted Regions (CUR)
722

  
723

  
724
Compute Common Uninterrupted Regions (CUR)
725
------------------------------------------
726

  
727

  
728
Description
729
~~~~~~~~~~~
730

  
731
CURs are regions that can be aligned between the genomes
732

  
733

  
734
Usage
735
~~~~~
736

  
737
   compute_inter_all_strain_curs(diff_allowed = 30, min_cur_width = 4000,
738
       config = NULL)
739

  
740

  
741
Arguments
742
~~~~~~~~~
743

  
744
"diff_allowed"
745

  
746
The maximum indel width allowed in a CUR
747

  
748
"min_cur_width"
749

  
750
The minimum width of a CUR
751

  
752
"config"
753

  
754
GLOBAL config variable
755

  
756

  
757
Author(s)
758
~~~~~~~~~
759

  
760
Florent Chuffart
761

  
762
R: Crop bound of regions according to region of interest bound
763

  
764

  
765
Crop bound of regions according to region of interest bound
766
-----------------------------------------------------------
767

  
768

  
769
Description
770
~~~~~~~~~~~
771

  
772
This function is no longer necessary since the "big_cur" bug was removed
773
from the translate_cur function.
774

  
775

  
776
Usage
777
~~~~~
778

  
779
   crop_fuzzy(tmp_fuzzy_nucs, roi, strain, config = NULL)
780

  
781

  
782
Arguments
783
~~~~~~~~~
784

  
785
"tmp_fuzzy_nucs"
786

  
787
The regions to be cropped.
788

  
789
"roi"
790

  
791
The region of interest.
792

  
793
"strain"
794

  
795
The strain to consider.
796

  
797
"config"
798

  
799
GLOBAL config variable
800

  
801

  
802
Author(s)
803
~~~~~~~~~
804

  
805
Florent Chuffart
806

  
807
R: Adding list to a dataframe.
808

  
809

  
810
Adding list to a dataframe.
811
---------------------------
812

  
813

  
814
Description
815
~~~~~~~~~~~
816

  
817
Add a list *l* to a dataframe *df*. Create it if *df* is *NULL*.
818
Return the dataframe *df*.
819

  
820

  
821
Usage
822
~~~~~
823

  
824
   dfadd(df, l)
825

  
826

  
827
Arguments
828
~~~~~~~~~
829

  
830
"df"
831

  
832
A dataframe
833

  
834
"l"
835

  
836
A list
837

  
838

  
839
Value
840
~~~~~
841

  
842
Return the dataframe *df*.
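
The same create-or-append pattern, sketched in Python for readers coming from the Python API. This is an analogue and not the R implementation: the table is modelled as a plain list of row dicts, which is an assumption of this sketch.

```python
def dfadd(df, row):
    """Append the dict row to the table df, creating df if it is None.

    The table is a list of dicts (one dict per row). The same list is
    returned so calls can be chained as df = dfadd(df, row).
    """
    if df is None:
        df = []
    df.append(dict(row))  # copy so later changes to row do not leak in
    return df
```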
843

  
844

  
845
Author(s)
846
~~~~~~~~~
847

  
848
Florent Chuffart
849

  
850

  
851
Examples
852
~~~~~~~~
853

  
854
   ## Here dataframe is NULL
855
   df = NULL
856
   print(df)
857

  
858
   # Initialize df
859
   df = dfadd(df, list(key1 = "value1", key2 = "value2"))
860
   print(df)
861

  
862
   # Adding elements to df
863
   df = dfadd(df, list(key1 = "value1'", key2 = "value2'"))
864
   print(df)
865

  
866
R: Prefetch data
867

  
868

  
869
Prefetch data
870
-------------
871

  
872

  
873
Description
874
~~~~~~~~~~~
875

  
876
Fetch and filter inputs and outputs per region of interest. Organize
877
them per replicate.
878

  
879

  
880
Usage
881
~~~~~
882

  
883
   fetch_mnase_replicates(strain, roi, all_samples, config = NULL,
884
       only_fetch = FALSE, get_genome = FALSE, get_ouputs = TRUE)
885

  
886

  
887
Arguments
888
~~~~~~~~~
889

  
890
"strain"
891

  
892
The strain for which we want MNase replicates. List of replicates. Each
893
replicate is a vector of sample ids.
894

  
895
"roi"
896

  
897
Region of interest.
898

  
899
"all_samples"
900

  
901
Global list of samples.
902

  
903
"config"
904

  
905
GLOBAL config variable
906

  
907
"only_fetch"
908

  
909
If TRUE, only fetch without filtering. It is used to load sample
910
files into memory before forking.
911

  
912
"get_genome"
913

  
914
If TRUE, load corresponding genome sequence.
915

  
916
"get_ouputs"
917

  
918
If TRUE, also get the corresponding TF output files.
919

  
920

  
921
Author(s)
922
~~~~~~~~~
923

  
924
Florent Chuffart
925

  
926
R: Filter TemplateFilter inputs
927

  
928

  
929
Filter TemplateFilter inputs
930
----------------------------
931

  
932

  
933
Description
934
~~~~~~~~~~~
935

  
936
This function filters TemplateFilter inputs according to the properties
937
of the observed genome area. It takes into account reads that lie at the
938
boundary of this area and the strand of these reads.
939

  
940

  
941
Usage
942
~~~~~
943

  
944
   filter_tf_inputs(inputs, chr, x_min, x_max, nuc_width = 160,
945
       only_f = FALSE, only_r = FALSE, filter_for_coverage = FALSE)
946

  
947

  
948
Arguments
949
~~~~~~~~~
950

  
951
"inputs"
952

  
953
TF inputs to be filtered.
954

  
955
"chr"
956

  
957
Chromosome observed, here chr is an integer.
958

  
959
"x_min"
960

  
961
Coordinate of the first bp observed.
962

  
963
"x_max"
964

  
965
Coordinate of the last bp observed.
966

  
967
"nuc_width"
968

  
969
Nucleosome width.
970

  
971
"only_f"
972

  
973
Filter only F reads.
974

  
975
"only_r"
976

  
977
Filter only R reads.
978

  
979
"filter_for_coverage"
980

  
981
Whether to filter for plot coverage.
982

  
983

  
984
Value
985
~~~~~
986

  
987
Returns filtered inputs.
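
One way to picture the handling of reads at the boundary of the area is the sketch below. It is an assumption-laden illustration, not the package implementation: the function name `filter_reads_sketch` and the columns `chr`, `pos` and `strand` are hypothetical, and it assumes that an F read extends *nuc_width* bp downstream of its position while an R read extends *nuc_width* bp upstream. The *only_f*, *only_r* and *filter_for_coverage* options are left out.

```r
# Hypothetical sketch; column names (chr, pos, strand) are assumptions.
filter_reads_sketch <- function(inputs, chr, x_min, x_max, nuc_width = 160) {
  on_chr <- inputs$chr == chr
  # An F read starting up to nuc_width bp before x_min may still cover
  # the area, so the lower bound is relaxed for F reads.
  keep_f <- on_chr & inputs$strand == "F" &
    inputs$pos >= x_min - nuc_width & inputs$pos <= x_max
  # An R read starting up to nuc_width bp after x_max may still cover
  # the area, so the upper bound is relaxed for R reads.
  keep_r <- on_chr & inputs$strand == "R" &
    inputs$pos >= x_min & inputs$pos <= x_max + nuc_width
  inputs[keep_f | keep_r, , drop = FALSE]
}

reads <- data.frame(
  chr = c(1, 1, 1, 1, 2),
  pos = c(900, 1050, 2100, 2300, 1500),
  strand = c("F", "R", "R", "F", "F"),
  stringsAsFactors = FALSE
)
kept <- filter_reads_sketch(reads, chr = 1, x_min = 1000, x_max = 2000)
print(kept)
```

In this toy data the F read at 900 and the R read at 2100 lie outside the area but are kept because they may still overlap it.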
988

  
989

  
990
Author(s)
991
~~~~~~~~~
992

  
993
Florent Chuffart
994

  
995
R: Filter TemplateFilter outputs
996

  
997

  
998
Filter TemplateFilter outputs
999
-----------------------------
1000

  
1001

  
1002
Description
1003
~~~~~~~~~~~
1004

  
1005
This function filters TemplateFilter outputs according not only to
1006
the properties of the observed genome area, but also to correlation and
1007
overlap thresholds.
1008

  
1009

  
1010
Usage
1011
~~~~~
1012

  
1013
   filter_tf_outputs(tf_outputs, chr, x_min, x_max, nuc_width = 160,
1014
       ol_bp = 59, corr_thres = 0.5)
1015

  
1016

  
1017
Arguments
1018
~~~~~~~~~
1019

  
1020
"tf_outputs"
1021

  
1022
TemplateFilter outputs.
1023

  
1024
"chr"
1025

  
1026
Chromosome observed, here chr is an integer.
1027

  
1028
"x_min"
1029

  
1030
Coordinate of the first bp observed.
1031

  
1032
"x_max"
1033

  
1034
Coordinate of the last bp observed.
1035

  
1036
"nuc_width"
1037

  
1038
Nucleosome width.
1039

  
1040
"ol_bp"
1041

  
1042
Overlap Threshold.
1043

  
1044
"corr_thres"
1045

  
1046
Correlation threshold.
1047

  
1048

  
1049
Value
1050
~~~~~
1051

  
1052
Returns filtered TemplateFilter outputs.
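
The combination of area, correlation and overlap criteria could look like the sketch below. Everything specific in it is an assumption made for illustration: the name `filter_outputs_sketch`, the columns `chr`, `center` and `correlation`, and the greedy rule that keeps the best-correlated nucleosome when two candidates overlap by more than *ol_bp* bp. The package may resolve overlaps differently.

```r
# Hypothetical sketch; columns (chr, center, correlation) and the greedy
# overlap rule are assumptions, not the package implementation.
filter_outputs_sketch <- function(tf_outputs, chr, x_min, x_max,
                                  nuc_width = 160, ol_bp = 59,
                                  corr_thres = 0.5) {
  in_area <- tf_outputs$chr == chr &
    tf_outputs$center >= x_min & tf_outputs$center <= x_max
  cand <- tf_outputs[in_area & tf_outputs$correlation >= corr_thres, ,
                     drop = FALSE]
  # Visit candidates from best to worst correlation.
  cand <- cand[order(-cand$correlation), , drop = FALSE]
  kept <- integer(0)
  for (i in seq_len(nrow(cand))) {
    # Overlap in bp between candidate i and each nucleosome already kept,
    # assuming all nucleosomes have the same width.
    overlap <- nuc_width - abs(cand$center[i] - cand$center[kept])
    if (all(overlap <= ol_bp)) kept <- c(kept, i)
  }
  out <- cand[kept, , drop = FALSE]
  out[order(out$center), , drop = FALSE]
}

nucs <- data.frame(
  chr = c(1, 1, 1, 1),
  center = c(1100, 1150, 1400, 1700),
  correlation = c(0.9, 0.7, 0.3, 0.8)
)
res <- filter_outputs_sketch(nucs, chr = 1, x_min = 1000, x_max = 2000)
print(res)
```

Here the nucleosome at 1400 fails the correlation threshold, and the one at 1150 is dropped because it overlaps the better-correlated one at 1100 by 110 bp.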
1053

  
1054

  
1055
Author(s)
1056
~~~~~~~~~
1057

  
1058
Florent Chuffart
1059

  
1060
R: to flat aggregate_intra_strain_nucs function output
1061

  
1062

  
1063
to flat aggregate_intra_strain_nucs function output
1064
---------------------------------------------------
1065

  
1066

  
1067
Description
1068
~~~~~~~~~~~
1069

  
1070
This function builds a dataframe of all clusters obtained from the
1071
aggregate_intra_strain_nucs function.
1072

  
1073

  
1074
Usage
1075
~~~~~
1076

  
1077
   flat_aggregated_intra_strain_nucs(partial_strain_maps, cur_index)
1078

  
1079

  
1080
Arguments
1081
~~~~~~~~~
1082

  
1083
"partial_strain_maps"
1084

  
1085
The output of the aggregate_intra_strain_nucs function.
1086

  
1087
"cur_index"
1088

  
1089
The index of the ROI involved.
1090

  
1091

  
1092
Value
1093
~~~~~
1094

  
1095
Returns a dataframe of all clusters obtained from the
1096
aggregate_intra_strain_nucs function.
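
Flattening a list of clusters into one dataframe can be sketched as below. The structure of the clusters is not documented here, so the fields `lower_bound`, `upper_bound` and `nb_reads`, the `roi_index` column and the name `flatten_clusters_sketch` are all hypothetical placeholders.

```r
# Hypothetical sketch; the cluster fields are placeholders, not the real
# structure returned by aggregate_intra_strain_nucs.
flatten_clusters_sketch <- function(partial_strain_maps, cur_index) {
  # Build a one-row dataframe per cluster, tagged with the ROI index.
  rows <- lapply(partial_strain_maps, function(cl) {
    data.frame(roi_index = cur_index,
               lower_bound = cl$lower_bound,
               upper_bound = cl$upper_bound,
               nb_reads = cl$nb_reads)
  })
  # Stack all rows into a single dataframe.
  do.call(rbind, rows)
}

clusters <- list(
  list(lower_bound = 1000, upper_bound = 1160, nb_reads = 42),
  list(lower_bound = 1300, upper_bound = 1460, nb_reads = 17)
)
flat <- flatten_clusters_sketch(clusters, cur_index = 3)
print(flat)
```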
1097

  
1098

  
1099
Author(s)
1100
~~~~~~~~~
1101

  
1102
Florent Chuffart
1103

  
1104
R: flat reads
1105

  
1106

  
1107
flat reads
1108
----------
1109

  
1110

  
1111
Description
1112
~~~~~~~~~~~
1113

  
1114
Extract read coordinates from TemplateFilter input sequence.
1115

  
1116

  
1117
Usage
1118
~~~~~
1119

  
1120
   flat_reads(reads, nuc_width)
1121

  
1122

  
1123
Arguments
1124
~~~~~~~~~
1125

  
1126
"reads"
1127

  
1128
TemplateFilter input reads
1129

  
1130
"nuc_width"
1131

  
1132
Width used to shift F and R reads.
1133

  
1134

  
1135
Value
1136
~~~~~
1137

  
1138
Returns a list of F reads, R reads and joint/shifted F and R reads.
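
The shape of the returned list can be illustrated with the sketch below. It assumes a hypothetical input with `pos` and `strand` columns and assumes that F and R reads are each shifted by half of *nuc_width* toward the presumed nucleosome centre; the actual shift used by the package may differ. The name `flat_reads_sketch` and the list element names are also hypothetical.

```r
# Hypothetical sketch; input columns and the half-width shift are assumptions.
flat_reads_sketch <- function(reads, nuc_width) {
  f <- reads$pos[reads$strand == "F"]
  r <- reads$pos[reads$strand == "R"]
  # Shift F reads downstream and R reads upstream so that both strands
  # point at the same presumed nucleosome centre, then pool them.
  shifted <- sort(c(f + nuc_width / 2, r - nuc_width / 2))
  list(f_reads = f, r_reads = r, fr_reads = shifted)
}

reads <- data.frame(
  pos = c(1000, 1010, 1160, 1170),
  strand = c("F", "F", "R", "R"),
  stringsAsFactors = FALSE
)
flat_r <- flat_reads_sketch(reads, nuc_width = 160)
print(flat_r$fr_reads)
```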
1139

  
1140

  
1141
Author(s)
1142
~~~~~~~~~
1143

  
1144
Florent Chuffart
1145

  
1146
R: Retrieve Reads
1147

  
1148

  
1149
Retrieve Reads
1150
--------------
1151

  
1152

  
1153
Description
1154
~~~~~~~~~~~
1155

  
1156
Retrieve reads for a given marker, combi, form.
1157

  
1158

  
1159
Usage
1160
~~~~~
1161

  
1162
   get_all_reads(marker, combi, form = "wp", config = NULL)
1163

  
1164

  
1165
Arguments
1166
~~~~~~~~~
1167

  
1168
"marker"
1169

  
1170
The marker to consider.
1171

  
1172
"combi"
1173

  
... This diff was truncated because it exceeds the maximum size that can be displayed.
