/doc/sphinx_doc/build/text/tuto.txt - Diff - NucleoMiner - Forge du Centre Blaise Pascal

Revision 21b8928f doc/sphinx_doc/build/text/tuto.txt

     the 53 samples is identified by a unique identifier. The file
     *CSV_SAMPLE_FILE* summarizes this information.
     configurator.CSV_SAMPLE_FILE = None
        Path to the CSV file that contains sample information.
     We use a convention to link each sample to its Illumina fastq
     outputs. Illumina output files of the sample *ID* are stored in the
     directory *ILLUMINA_OUTPUTFILE_PREFIX* + *ID*. For example, sample 41
     outputs are stored in the directory
     *data/2012-09-05/FASTQ/Sample_Yvert_Bq41/*.
     configurator.ILLUMINA_OUTPUTFILE_PREFIX = None
        Prefix for Illumina fastq output files.
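     The naming convention above amounts to a string concatenation. A
     minimal sketch, assuming the prefix value implied by the sample 41
     example; the helper name `illumina_output_dir` is hypothetical and
     is not taken from *wf.py*:

```python
# Prefix inferred from the sample 41 example in the text (an assumption).
ILLUMINA_OUTPUTFILE_PREFIX = "data/2012-09-05/FASTQ/Sample_Yvert_Bq"


def illumina_output_dir(prefix, sample_id):
    """Return the directory holding the Illumina outputs of one sample."""
    # Hypothetical helper: directory = prefix + sample id.
    return "%s%s/" % (prefix, sample_id)


print(illumina_output_dir(ILLUMINA_OUTPUTFILE_PREFIX, 41))
```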
     For BY (resp. RM and YJM) we use the following reference genome:
     *saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta* (resp.
     *saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta* and
     *saccharomyces_cerevisiae_YJM_789_screencontig.fasta*). The index
     *FASTA_REFERENCE_GENOME_FILES* stores this information.
     configurator.FASTA_REFERENCE_GENOME_FILES = None
        Dictionary that maps each reference strain to its fasta reference
        genome file.
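     As an illustration, such a dictionary could look like the sketch
     below. The strain keys and file names come from the text above; the
     directory that holds the files is not given there, so bare file
     names are used here as an assumption:

```python
# Strain name -> fasta reference genome file.
# The containing directory is unknown, so only file names are shown.
FASTA_REFERENCE_GENOME_FILES = {
    "BY": "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
    "RM": "saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta",
    "YJM": "saccharomyces_cerevisiae_YJM_789_screencontig.fasta",
}

# Look up the reference genome of each strain.
for strain in sorted(FASTA_REFERENCE_GENOME_FILES):
    print(strain, FASTA_REFERENCE_GENOME_FILES[strain])
```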
     Each chromosome/contig is identified in the fasta file by an obscure
     identifier. For example, BY chromosome I is identified by
     *gi|144228165|ref|NC_001133.7|*, whereas TemplateFilter expects an
     integer. So, we translate it. The index *FASTA_INDEXES* stores this
     translation.
     configurator.FASTA_INDEXES = None
        Dictionary indexed by strain. Each value is a dictionary whose
        keys are chromosome references from the fasta file and whose
        values are the corresponding identifiers for TemplateFilter.
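     A minimal sketch of this two-level lookup follows. The fasta
     identifier is the one quoted above; mapping chromosome I to the
     integer 1 is an assumption, and `translate_chromosome` is a
     hypothetical helper name:

```python
# strain -> {fasta identifier -> integer expected by TemplateFilter}
FASTA_INDEXES = {
    "BY": {
        # BY chromosome I; the value 1 is assumed for illustration.
        "gi|144228165|ref|NC_001133.7|": 1,
    },
}


def translate_chromosome(fasta_indexes, strain, fasta_id):
    """Translate a fasta chromosome identifier into its integer index."""
    return fasta_indexes[strain][fasta_id]


print(translate_chromosome(FASTA_INDEXES, "BY", "gi|144228165|ref|NC_001133.7|"))
```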
     From a pragmatic point of view, we discard some parts of the genome
     (repeated sequences, etc.). The black-listed areas are explicitly
     listed in *AREA_BLACK_LIST*.
     configurator.AREA_BLACK_LIST = None
        Dictionary where keys are strains and values are the black-listed
        genome regions.
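     The layout of each black-listed region is not specified here. The
     sketch below assumes, purely for illustration, that a region is a
     `(chromosome, start, end)` tuple with inclusive bounds; the
     coordinates are invented and `is_black_listed` is a hypothetical
     helper:

```python
# Assumed layout: strain -> list of (chromosome, start, end) tuples.
# Coordinates are illustrative only, not real black-listed regions.
AREA_BLACK_LIST = {
    "BY": [(1, 1, 5000)],
}


def is_black_listed(black_list, strain, chrom, position):
    """Return True if the position falls inside a black-listed region."""
    for bl_chrom, start, end in black_list.get(strain, []):
        if bl_chrom == chrom and start <= position <= end:
            return True
    return False


print(is_black_listed(AREA_BLACK_LIST, "BY", 1, 1200))
print(is_black_listed(AREA_BLACK_LIST, "BY", 2, 1200))
```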
     For BY-RM (resp. BY-YJM and RM-YJM) genome sequence alignment we use a
     previously computed .c2c file
     *data/2012-03_primarydata/BY_RM_gxcomp.c2c* (resp.
-...
     *NucleoMiner*, the old version of *NucleoMiner2*
     (http://www.ens-lyon.fr/LBMC/gisv/NucleoMiner_Manual/manual.pdf).
     configurator.C2C_FILES = None
        Dictionary that maps each strain combination to its genome
        alignment file.
     *nucleominer* works in specific directories; these are set in
     *INDEX_DIR*, *ALIGN_DIR* and *LOG_DIR*.
-...
     All paths, prefixes and indexes can be changed in the
     *src/current/nucleominer_config.json* file.
     wf.json_conf_file = 'src/nucleo_miner/nucleo_miner_config.json'
        Path to the json configuration file.
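     Reading such a configuration file only needs the standard *json*
     module. The sketch below is an assumption about how the file could
     be loaded, not the actual *wf.py* code; `load_config` is a
     hypothetical helper and the directory values are placeholders:

```python
import json
import os
import tempfile


def load_config(json_conf_file):
    """Load the json configuration file into a dictionary."""
    with open(json_conf_file) as handle:
        return json.load(handle)


# Demonstration with a throwaway file; the values are placeholders.
demo = {"INDEX_DIR": "index", "ALIGN_DIR": "align", "LOG_DIR": "log"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(demo, tmp)

config = load_config(tmp.name)
os.remove(tmp.name)
print(config["INDEX_DIR"])
```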
     Preprocessing Illumina Fastq Reads for Each Sample
     ==================================================
-...
     *samples*, *samples_mnase* and *strains* that will be used throughout
     the 4 steps.
     wf.samples = []
        List of samples, where a sample is identified by an id (key: *id*)
        and a strain name (key: *strain*).
     wf.samples_mnase = []
        List of MNase samples.
     wf.strains = []
        List of reference strains.
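     The structure of these lists can be pictured as follows. The keys
     *id* and *strain* come from the description above; the
     sample-to-strain assignments are invented for illustration, and
     deriving *strains* from *samples* is an assumption about how the
     list could be built:

```python
# Each sample is a dictionary with an "id" and a "strain" key.
# The id/strain pairs below are illustrative only.
samples = [
    {"id": 41, "strain": "BY"},
    {"id": 42, "strain": "RM"},
    {"id": 43, "strain": "BY"},
]

# One possible way to obtain the list of reference strains.
strains = sorted(set(sample["strain"] for sample in samples))
print(strains)
```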
     Creating Bowtie Index from each Reference Genome
     ------------------------------------------------
-...
     will be used by bowtie to align reads. This step is performed by the
     following part of the *wf.py* script:
          for strain in strains:
            per_strain_stats[strain] = create_bowtie_index(strain,
              config["FASTA_REFERENCE_GENOME_FILES"][strain], config["INDEX_DIR"],
              config["BOWTIE_BUILD_BIN"])
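     The body of *create_bowtie_index* is not shown here. As a rough
     sketch of what such a step involves, the code below assembles a
     `bowtie2-build <reference_in> <index_base>` command line; the
     helper name `build_index_command` and the choice of the strain name
     as index base name are assumptions. The command is only printed,
     and could be run with `subprocess.check_call(cmd)`:

```python
import os


def build_index_command(bowtie_build_bin, fasta_file, index_dir, strain):
    """Assemble the index-building command for one reference genome."""
    # Assumption: the index base name is the strain name inside INDEX_DIR.
    index_base = os.path.join(index_dir, strain)
    return [bowtie_build_bin, fasta_file, index_base]


cmd = build_index_command(
    "bowtie2-build",
    "saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta",
    "index",  # placeholder for config["INDEX_DIR"]
    "BY",
)
print(" ".join(cmd))
```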
     The following table summarizes the file sizes and process durations
     involved in this step.
-...
     *subprocess* class. This step is performed by the following part of
     the *wf.py* script:
          for sample in samples:
            per_sample_align_stats["sample_%s" % sample["id"]] = align_reads(sample,
              config["ALIGN_DIR"], config["LOG_DIR"], config["INDEX_DIR"],
              config["ILLUMINA_OUTPUTFILE_PREFIX"], config["BOWTIE2_BIN"],
              config["SAMTOOLS_BIN"], config["BEDTOOLS_BIN"])
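     The internals of *align_reads* are not shown here. The sketch below
     only illustrates how a bowtie2 call could be assembled for one
     sample, assuming single-end reads (`-U`) and SAM output (`-S`); the
     file names and the helper `build_align_command` are hypothetical.
     The command is printed rather than executed:

```python
def build_align_command(bowtie2_bin, index_base, fastq_file, sam_file, threads=3):
    """Assemble a bowtie2 command line for single-end reads."""
    return [
        bowtie2_bin,
        "-p", str(threads),  # number of alignment threads
        "-x", index_base,    # base name of the bowtie2 index
        "-U", fastq_file,    # unpaired reads (an assumption)
        "-S", sam_file,      # SAM output file
    ]


# File names below are placeholders, not paths from the project.
cmd = build_align_command("bowtie2", "index/BY", "sample_41.fastq", "sample_41.sam")
print(" ".join(cmd))
```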
     Convert Aligned Reads for TemplateFilter
     ----------------------------------------
-...
     This step is performed by the following part of the *wf.py* script:
          for sample in samples:
            per_sample_convert_stats["sample_%s" % sample["id"]] = split_fr_4_TF(sample,
              config["ALIGN_DIR"], config["FASTA_INDEXES"], config["AREA_BLACK_LIST"],
              config["READ_LENGTH"], config["MAPQ_THRES"])
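     The code of *split_fr_4_TF* is not reproduced here. As a guess at
     the kind of processing its arguments suggest, the sketch below
     drops reads under a mapping-quality threshold, translates
     chromosome identifiers to integers, and separates forward from
     reverse reads. The read layout `(chromosome, position, strand,
     mapq)`, the helper name `convert_reads` and the sample values are
     all assumptions:

```python
def convert_reads(reads, chrom_index, mapq_thres):
    """Filter reads by MAPQ, translate chromosomes, split by strand."""
    forward, reverse = [], []
    for chrom, position, strand, mapq in reads:
        if mapq < mapq_thres:
            continue  # assumed rule: discard reads below the threshold
        record = (chrom_index[chrom], position)
        if strand == "+":
            forward.append(record)
        else:
            reverse.append(record)
    return forward, reverse


by_chr1 = "gi|144228165|ref|NC_001133.7|"
reads = [(by_chr1, 100, "+", 42), (by_chr1, 250, "-", 42), (by_chr1, 300, "+", 3)]
forward, reverse = convert_reads(reads, {by_chr1: 1}, 30)
print(forward, reverse)
```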
     The following table summarizes the number of reads, file sizes and
     process durations involved in the last two steps. In our case, the
     alignment process was multithreaded over 3 cores.
-...
     This step is performed by the following part of the *wf.py* script:
          for sample in samples_mnase:
            per_mnase_sample_stats["sample_%s" % sample["id"]] = template_filter(sample,
              config["ALIGN_DIR"], config["LOG_DIR"], config["TF_BIN"],
              config["TF_TEMPLATES_FILE"], config["TF_CORR"], config["TF_MINW"],
              config["TF_MAXW"], config["TF_OL"])
     +----+--------+------------+---------------+------------------+
     | id | strain | found nucs | nuc file size | process duration |
     +====+========+============+===============+==================+
