Revision 21b8928f doc/sphinx_doc/build/text/tuto.txt

b/doc/sphinx_doc/build/text/tuto.txt
44 44
the 53 samples is identified by a unique identifier. The file
45 45
*CSV_SAMPLE_FILE* sums up this information.
46 46

  
47
configurator.CSV_SAMPLE_FILE = None
48

  
49
   Path to the CSV file that contains sample information.
50

  
51 47
We use a convention to link sample and Illumina fastq outputs.
52 48
Illumina output files of the sample *ID* will be stored in the
53 49
directory *ILLUMINA_OUTPUTFILE_PREFIX* + *ID*. For example, sample 41
54 50
outputs will be stored in the directory
55 51
*data/2012-09-05/FASTQ/Sample_Yvert_Bq41/*.
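The naming convention above can be sketched in a few lines of Python. This is only an illustration: the prefix value is inferred from the sample 41 example, and the helper name `illumina_output_dir` is hypothetical, not part of *nucleominer*.

```python
# Assumed prefix value, inferred from the sample 41 example in the text.
ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"


def illumina_output_dir(sample_id, prefix=ILLUMINA_OUTPUTFILE_PREFIX):
    """Return the directory holding the Illumina outputs of one sample.

    Hypothetical helper: it only concatenates the prefix and the sample id.
    """
    return "%s%s/" % (prefix, sample_id)


print(illumina_output_dir(41))
```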
56 52

  
57
configurator.ILLUMINA_OUTPUTFILE_PREFIX = None
58

  
59
   Prefix for Illumina fastq output files.
60

  
61 53
For BY (resp. RM and YJM) we use the following reference genome
62 54
*saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta* (resp.
63 55
*saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta* and
64 56
*saccharomyces_cerevisiae_YJM_789_screencontig.fasta*). The index
65 57
*FASTA_REFERENCE_GENOME_FILES* stores this information.
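As an illustration, such an index could look like the dictionary below. The file names come from the paragraph above; whether the real values also carry a directory path is not stated here, and the lookup helper `reference_genome_for` is a hypothetical name.

```python
# Strain name -> fasta reference genome file (file names taken from the text).
FASTA_REFERENCE_GENOME_FILES = {
    "BY": "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
    "RM": "saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta",
    "YJM": "saccharomyces_cerevisiae_YJM_789_screencontig.fasta",
}


def reference_genome_for(strain, index=FASTA_REFERENCE_GENOME_FILES):
    """Return the fasta file of a strain, with a readable error if unknown."""
    try:
        return index[strain]
    except KeyError:
        raise KeyError("no reference genome configured for strain %r" % strain)


print(reference_genome_for("RM"))
```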
66 58

  
67
configurator.FASTA_REFERENCE_GENOME_FILES = None
68

  
69
   Dictionary where each fasta reference genome is indexed by
70
   the reference strain it corresponds to.
71

  
72 59
Each chromosome/contig is identified in the fasta file by an obscure
73 60
identifier. For example, BY chromosome I is identified by
74 61
*gi|144228165|ref|NC_001133.7|* whereas TemplateFilter expects an
75 62
integer. So, we translate it. The index *FASTA_INDEXES* stores this
76 63
translation.
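A minimal sketch of this translation, assuming the nested layout strain -> fasta identifier -> integer. The mapping of chromosome I to the integer 1 is an assumption made for the example, and `translate_chromosome` is a hypothetical helper name.

```python
# Assumed layout: strain -> {fasta identifier -> integer for TemplateFilter}.
# The value 1 for BY chromosome I is an assumption, not taken from the config.
FASTA_INDEXES = {
    "BY": {
        "gi|144228165|ref|NC_001133.7|": 1,
    },
}


def translate_chromosome(strain, fasta_id, indexes=FASTA_INDEXES):
    """Translate a fasta chromosome/contig identifier into its integer index."""
    return indexes[strain][fasta_id]


print(translate_chromosome("BY", "gi|144228165|ref|NC_001133.7|"))
```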
77 64

  
78
configurator.FASTA_INDEXES = None
79

  
80
   Dictionary indexed by strain; each value is a dictionary whose keys are
81
   chromosome references from the fasta file and whose values are their
82
   correspondences for TemplateFilter.
83

  
84 65
From a pragmatic point of view, we discard some parts of the genome
85 66
(repeated sequences, etc.). The list of blacklisted areas is
86 67
explicitly detailed in *AREA_BLACK_LIST*.
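The sketch below shows one way such a black list could be used to discard a region. The layout (strain -> list of `(chromosome, start, end)` tuples with inclusive bounds), the coordinates and the helper name `is_black_listed` are all assumptions made for illustration; the actual format may differ.

```python
# Assumed layout: strain -> list of (chromosome, start, end), bounds inclusive.
# The coordinates below are made up for the example.
AREA_BLACK_LIST = {
    "BY": [(1, 1000, 2000)],
}


def is_black_listed(strain, chrom, start, end, black_list=AREA_BLACK_LIST):
    """Return True if [start, end] overlaps a blacklisted area of the strain."""
    for b_chrom, b_start, b_end in black_list.get(strain, []):
        # Two closed intervals overlap when each one starts before the other ends.
        if chrom == b_chrom and start <= b_end and end >= b_start:
            return True
    return False


print(is_black_listed("BY", 1, 1500, 1600))
```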
87 68

  
88
configurator.AREA_BLACK_LIST = None
89

  
90
   Dictionary where keys are strains and values are lists of blacklisted
91
   genome regions.
92

  
93 69
For BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we use
94 70
the previously computed .c2c file
95 71
*data/2012-03_primarydata/BY_RM_gxcomp.c2c* (resp.
......
98 74
*NucleoMiner*, the old version of *NucleoMiner2* (http://www.ens-
99 75
lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).
100 76

  
101
configurator.C2C_FILES = None
102

  
103
   Dictionary where each strain combination indexes a genome alignment.
104

  
105 77
*nucleominer* uses specific directories to work in; these are described
106 78
in *INDEX_DIR*, *ALIGN_DIR* and *LOG_DIR*.
107 79

  
......
112 84
All paths, prefixes and indexes can be changed in the
113 85
*src/current/nucleominer_config.json* file.
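Reading such a file takes only the standard `json` module. The sketch below writes a small demo file first so that it runs on its own; the key names are those mentioned in this tutorial, the values are placeholders, and `load_config` is a hypothetical helper rather than the actual *wf.py* code.

```python
import json
import os
import tempfile


def load_config(path):
    """Load the JSON configuration file into a plain dictionary."""
    with open(path) as handle:
        return json.load(handle)


# Demo with a temporary file; the values are placeholders, not the real settings.
demo = {"INDEX_DIR": "index", "ALIGN_DIR": "align", "LOG_DIR": "log"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(demo, tmp)

config = load_config(tmp.name)
os.remove(tmp.name)
print(config["INDEX_DIR"])
```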
114 86

  
115
wf.json_conf_file = 'src/nucleo_miner/nucleo_miner_config.json'
116

  
117
   Path to the json configuration file.
118

  
119 87

  
120 88
Preprocessing Illumina Fastq Reads for Each Sample
121 89
==================================================
......
125 93
*samples*, *samples_mnase* and *strains* that will be used throughout the 4
126 94
steps.
127 95

  
128
wf.samples = []
129

  
130
   List of samples where a sample is identified by an id (key: *id*) and
131
   a strain name (key: *strain*).
132

  
133
wf.samples_mnase = []
134

  
135
   List of MNase samples.
136

  
137
wf.strains = []
138

  
139
   List of reference strains.
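To make these structures concrete, here is a small sketch. The sample ids and strain assignments are invented, and deriving the strain list from the samples is an assumption; only the `id` and `strain` keys and the `"sample_%s"` key pattern come from this tutorial.

```python
# Hypothetical samples: only the "id" and "strain" keys come from the text.
samples = [
    {"id": 41, "strain": "BY"},
    {"id": 42, "strain": "RM"},
    {"id": 43, "strain": "BY"},
]


def stats_key(sample):
    """Build the per-sample key in the same pattern as the wf.py loops."""
    return "sample_%s" % sample["id"]


# Assumption: the list of reference strains can be derived from the samples.
strains = sorted(set(sample["strain"] for sample in samples))

print(stats_key(samples[0]), strains)
```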
140

  
141 96

  
142 97
Creating Bowtie Index from each Reference Genome
143 98
------------------------------------------------
......
147 102
will be used by bowtie to align reads. This step is performed by the
148 103
following part of the *wf.py* script:
149 104

  
150
     for strain in strains:
151
       per_strain_stats[strain] = create_bowtie_index(strain, 
152
         config["FASTA_REFERENCE_GENOME_FILES"][strain], config["INDEX_DIR"], 
153
         config["BOWTIE_BUILD_BIN"])
154

  
155 105
The following table sums up the file sizes and process durations
156 106
concerning this step.
157 107

  
......
175 125
*subprocess* class. This step is performed by the following part of
176 126
the *wf.py* script:
177 127

  
178
     for sample in samples:
179
       per_sample_align_stats["sample_%s" % sample["id"]] = align_reads(sample, 
180
         config["ALIGN_DIR"], config["LOG_DIR"], config["INDEX_DIR"], 
181
         config["ILLUMINA_OUTPUTFILE_PREFIX"], config["BOWTIE2_BIN"], 
182
         config["SAMTOOLS_BIN"], config["BEDTOOLS_BIN"])
183

  
184 128

  
185 129
Convert Aligned Reads for TemplateFilter
186 130
----------------------------------------
......
206 150

  
207 151
This step is performed by the following part of the *wf.py* script:
208 152

  
209
     for sample in samples:
210
       per_sample_convert_stats["sample_%s" % sample["id"]] = split_fr_4_TF(sample, 
211
         config["ALIGN_DIR"], config["FASTA_INDEXES"], config["AREA_BLACK_LIST"], 
212
         config["READ_LENGTH"], config["MAPQ_THRES"])
213

  
214 153
The following table sums up the number of reads, file sizes and
215 154
process durations for the last two steps. In our case, the alignment
216 155
process was multithreaded over 3 cores.
......
346 285

  
347 286
This step is performed by the following part of the *wf.py* script:
348 287

  
349
     for sample in samples_mnase:
350
       per_mnase_sample_stats["sample_%s" % sample["id"]] = template_filter(sample, 
351
         config["ALIGN_DIR"], config["LOG_DIR"], config["TF_BIN"], 
352
         config["TF_TEMPLATES_FILE"], config["TF_CORR"], config["TF_MINW"], 
353
         config["TF_MAXW"], config["TF_OL"])  
354

  
355 288
+----+--------+------------+---------------+------------------+
356 289
| id | strain | found nucs | nuc file size | process duration |
357 290
+====+========+============+===============+==================+
