/ - Diff - NucleoMiner - Forge du Centre Blaise Pascal

Revision 3c88abd0

     Welcome to *NucleoMiner2*
     *************************
     * Readme / Documentation for *NucleoMiner2*
       * License
       * Installation Instructions
     * Tutorial
       * Experimental Dataset, Working Directory and Configuration File
       * Preprocessing Illumina Fastq Reads for Each Sample
       * Inferring Nucleosome Position and Extracting Read Counts
       * Results: Number of SNEPs
       * APPENDIX: Generate .c2c Files
     * References
       * Python Reference
       * R Reference
     Readme / Documentation for *NucleoMiner2*
     *****************************************
-...
        R CMD INSTALL doc/Chuffart_NM2_workdir/deps/bot_0.14.tar.gz\
            doc/Chuffart_NM2_workdir/deps/cachecache_0.1.tar.gz\
            build/nucleominer_2.3.46.tar.gz
     Tutorial
     ********
     This tutorial describes the steps needed to perform a quantitative
     analysis of epigenetic marks on individual nucleosomes. We assume that
     files are organised according to a given hierarchy and that all
     command lines are launched from the project’s root directory.
     This tutorial is divided into two main parts. The first part covers
     the Python script *wf.py* that aligns and converts short sequence
     reads. The second part covers the R scripts that extract information
     (nucleosome positions and indicators) from the dataset.
     Experimental Dataset, Working Directory and Configuration File
     ==============================================================
     Working Directory Organisation
     ------------------------------
     After installing the NucleoMiner2 environment (see the previous
     section), go to the root working directory of the tutorial by typing
     the following command in a terminal:
        cd doc/Chuffart_NM2_workdir/
     Retrieving Experimental Dataset
     -------------------------------
     The MNase-seq and MN-ChIP-seq raw data are available at ArrayExpress
     (http://www.ebi.ac.uk/arrayexpress/) under accession number
     E-MTAB-2671.
     $$$ TODO explain how organise Experimental Dataset into the *data*
     directory of the working directory.
     We want to compare the nucleosomes of 2 yeast strains: BY and RM. For
     each strain we performed MNase-Seq and ChIP-Seq using an antibody
     recognizing the H3K14ac epigenetic mark.
     The dataset is composed of 55 files organised as follows:
        * 3 replicates for BY MNase Seq
          * sample 1 (5 fastq.gz files)
          * sample 2 (5 fastq.gz files)
          * sample 3 (4 fastq.gz files)
        * 3 replicates for RM MNase Seq
          * sample 4 (4 fastq.gz files)
          * sample 5 (4 fastq.gz files)
          * sample 6 (5 fastq.gz files)
        * 3 replicates for BY ChIP Seq H3K14ac
          * sample 36 (5 fastq.gz files)
          * sample 37 (5 fastq.gz files)
          * sample 53 (9 fastq.gz files)
        * 2 replicates for RM ChIP Seq H3K14ac
          * sample 38 (5 fastq.gz files)
          * sample 39 (4 fastq.gz files)
     Python and R Common Configuration File
     --------------------------------------
     First of all, we define in one place some configuration variables that
     will be used by the Python and R scripts. These variables are
     defined in the file *configurator.py*. Executing this Python
     script dumps the variables into the *nucleominer_config.json* file,
     which is then read by both the R and Python scripts.
     To do this, go to the root directory of your project and run the
     following command:
        python src/current/configurator.py
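     The dump-then-share mechanism can be pictured with the minimal sketch
     below. It is not the content of *configurator.py*: the key names are
     taken from the *wf.py* excerpts of this tutorial, but every value and
     the helper names *dump_config* and *load_config* are assumptions made
     for illustration.

```python
import json
import os
import tempfile

# Hypothetical values; only the key names appear in the tutorial.
config = {
    "INDEX_DIR": "data/index",
    "ALIGN_DIR": "data/align",
    "LOG_DIR": "log",
    "READ_LENGTH": 50,
}


def dump_config(config, path):
    """Write the configuration to a JSON file readable from Python and R."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)


def load_config(path):
    """Read the configuration back as a plain dictionary."""
    with open(path) as handle:
        return json.load(handle)


# Written to a temporary directory so that the sketch has no side effects.
config_path = os.path.join(tempfile.mkdtemp(), "nucleominer_config.json")
dump_config(config, config_path)
reloaded = load_config(config_path)
```

     JSON is a convenient exchange format here because both languages can
     parse it without sharing any code.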
     Preprocessing Illumina Fastq Reads for Each Sample
     ==================================================
     This preprocessing consists of 4 main steps embedded in the
     *wf.py* script. They are described below. As a preamble, this script
     computes *samples*, *samples_mnase* and *strains*, which are used
     throughout the 4 steps.
     wf.samples = []
        List of samples where a sample is identified by an id (key: *id*)
        and a strain name (key *strain*).
     wf.samples_mnase = []
        List of Mnase samples.
     wf.strains = []
        List of reference strains.
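     As an illustration, the three lists could be derived from the
     experimental design as sketched below. The sample ids and strains
     come from the dataset description above; the *marker* key, its values
     and the *design* variable are assumptions of this sketch, not the
     actual *wf.py* code.

```python
# (strain, marker, sample ids) for each experimental condition.
# The "marker" field is a hypothetical addition used to tell MNase
# samples apart from ChIP samples.
design = [
    ("BY", "Mnase_Seq", [1, 2, 3]),
    ("RM", "Mnase_Seq", [4, 5, 6]),
    ("BY", "H3K14ac", [36, 37, 53]),
    ("RM", "H3K14ac", [38, 39]),
]

# One dictionary per sample, identified by an id and a strain name.
samples = [
    {"id": sample_id, "strain": strain, "marker": marker}
    for strain, marker, ids in design
    for sample_id in ids
]

# MNase samples only, and the sorted list of distinct strains.
samples_mnase = [s for s in samples if s["marker"] == "Mnase_Seq"]
strains = sorted({s["strain"] for s in samples})
```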
     Creating Bowtie Index from each Reference Genome
     ------------------------------------------------
     For each strain, we need to create a bowtie index. The bowtie index of
     a strain is a compressed, searchable representation of the genome of
     this strain (based on the Burrows-Wheeler transform). It is used by
     bowtie to align reads. This step is performed by the following part of
     the *wf.py* script:
          for strain in strains:
            per_strain_stats[strain] = create_bowtie_index(strain,
              config["FASTA_REFERENCE_GENOME_FILES"][strain], config["INDEX_DIR"],
              config["BOWTIE_BUILD_BIN"])
     The following table summarizes the file sizes and process durations
     concerning this step.
     +--------+------------------------+------------------------+------------------+
     | strain | fasta genome file size | bowtie index file size | process duration |
     +========+========================+========================+==================+
     | BY     | 12 MB                  | 25 MB                  | 11 s.            |
     +--------+------------------------+------------------------+------------------+
     | RM     | 12 MB                  | 24 MB                  | 9 s.             |
     +--------+------------------------+------------------------+------------------+
     Aligning Reads to Reference Genome
     ----------------------------------
     Next, we launch bowtie to align reads to the reference genome. It
     produces a *.sam* file that we convert into a *.bed* file. Binaries
     for *bowtie*, *samtools* and *bedtools* are wrapped using the Python
     *subprocess* module. This step is performed by the following part of
     the *wf.py* script:
          for sample in samples:
            per_sample_align_stats["sample_%s" % sample["id"]] = align_reads(sample,
              config["ALIGN_DIR"], config["LOG_DIR"], config["INDEX_DIR"],
              config["ILLUMINA_OUTPUTFILE_PREFIX"], config["BOWTIE2_BIN"],
              config["SAMTOOLS_BIN"], config["BEDTOOLS_BIN"])
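     Wrapping an external binary with *subprocess* can look like the
     sketch below. The helper *run_tool* is hypothetical and is not the
     *align_reads* function of *wf.py*; to stay runnable without bowtie
     installed, the example launches the Python interpreter itself as a
     stand-in for a real binary.

```python
import subprocess
import sys
import time


def run_tool(command, log_path=None):
    """Run an external command; return its status, output and duration.

    `command` is a list such as [binary, arg1, arg2]. When `log_path`
    is given, the standard error of the command is appended to it.
    """
    start = time.time()
    completed = subprocess.run(command, capture_output=True, text=True)
    if log_path is not None:
        with open(log_path, "a") as log:
            log.write(completed.stderr)
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "duration": time.time() - start,
    }


# Stand-in command: a real call would pass the path of a binary
# (for instance config["BOWTIE2_BIN"]) followed by its arguments.
stats = run_tool([sys.executable, "-c", "print('aligned')"])
```

     Returning the duration alongside the exit status is one way to
     collect the per-sample statistics reported in the tables of this
     section.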
     Convert Aligned Reads into TemplateFilter Format
     ------------------------------------------------
     TemplateFilter uses a particular input format for reads, so it is
     necessary to convert the *.bed* files. TemplateFilter expects reads as
     follows: *chr*, *coord*, *strand* and *#reads* where:
     * *chr* is the number of the chromosome;
     * *coord* is the coordinate of the read;
     * *strand* is *F* for forward and *R* for reverse;
     * *#reads* is the number of reads covering this position.
     Each entry is *tab*-separated.
     **WARNING** For reverse strands, bowtie returns the position of the
     first nucleotide on the left hand side, whereas TemplateFilter expects
     the first one on the right hand side. This step takes this into
     account by adding the read length (in our case 50) to the reverse
     read coordinates.
     This step is performed by the following part of the *wf.py* script:
          for sample in samples:
            per_sample_convert_stats["sample_%s" % sample["id"]] = split_fr_4_TF(sample,
              config["ALIGN_DIR"], config["FASTA_INDEXES"], config["AREA_BLACK_LIST"],
              config["READ_LENGTH"],config["MAPQ_THRES"])
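     The conversion described above, including the shift applied to
     reverse reads, can be sketched as follows. This is not the
     *split_fr_4_TF* function: the helper *bed_to_tf* is hypothetical, it
     assumes six-column BED records (chrom, start, end, name, score,
     strand), and it ignores the black list, the MAPQ threshold and any
     0-based versus 1-based coordinate offset.

```python
from collections import Counter


def bed_to_tf(bed_lines, read_length=50):
    """Convert BED records into tab-separated chr, coord, strand, #reads rows."""
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, strand = fields[0], int(fields[1]), fields[5]
        if strand == "+":
            counts[(chrom, start, "F")] += 1
        else:
            # Reverse read: the aligner reports the left-hand position,
            # so shift by the read length to reach the right-hand end.
            counts[(chrom, start + read_length, "R")] += 1
    # One output row per distinct (chr, coord, strand), with its read count.
    return ["\t".join(map(str, key + (n,))) for key, n in sorted(counts.items())]


bed = [
    "1\t100\t150\tr1\t42\t+",
    "1\t100\t150\tr2\t42\t+",
    "1\t200\t250\tr3\t42\t-",
]
tf_rows = bed_to_tf(bed)
```

     In this example the two forward reads starting at position 100 are
     merged into a single row with a count of 2, and the reverse read is
     reported at 250 rather than 200.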
     The following table summarises the number of reads, the file
     sizes and the process durations for the last two steps. In our
     case, the alignment process was multithreaded over 3 cores.
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | id | Illumina reads | aligned & filtered reads  | ratio  | *.bed* file size | TF input file size | process duration |
     +====+================+===========================+========+==================+====================+==================+
     | 1  | 16436138       | 10199695                  | 62.06% | 1064 MB          | 60  MB             | 383   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 2  | 16911132       | 12512727                  | 73.99% | 1298 MB          | 64  MB             | 437   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 3  | 15946902       | 12340426                  | 77.38% | 1280 MB          | 65  MB             | 423   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 4  | 13765584       | 10381903                  | 75.42% | 931  MB          | 59  MB             | 352   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 5  | 15168268       | 11502855                  | 75.83% | 1031 MB          | 64  MB             | 386   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 6  | 18850820       | 14024905                  | 74.40% | 1254 MB          | 69  MB             | 482   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 36 | 17715118       | 14092985                  | 79.55% | 1404 MB          | 68  MB             | 483   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 37 | 17288466       | 7402082                   | 42.82% | 741  MB          | 48  MB             | 339   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 38 | 16116394       | 13178457                  | 81.77% | 1101 MB          | 63  MB             | 420   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 39 | 14241106       | 10537228                  | 73.99% | 880  MB          | 57  MB             | 348   s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     | 53 | 40876476       | 33780065                  | 82.64% | 3316 MB          | 103 MB             | 1165  s.         |
     +----+----------------+---------------------------+--------+------------------+--------------------+------------------+
     Run TemplateFilter on Mnase Samples
     -----------------------------------
     Finally, for each MNase sample we perform the TemplateFilter analysis.
     **WARNING** TemplateFilter returns a list of nucleosomes. Each
     nucleosome is defined by its center and its width. An odd width leads
     us to consider non-integer lower and upper bounds.
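     The effect of an odd width can be checked with the small sketch
     below. It assumes that the bounds are taken as the center plus or
     minus half the width, which the warning above implies but does not
     spell out; the function name *nucleosome_bounds* is hypothetical.

```python
def nucleosome_bounds(center, width):
    """Return (lower_bound, upper_bound) for a nucleosome centred on `center`."""
    half = width / 2.0
    return center - half, center + half


# An even width gives integer-valued bounds, an odd width does not.
even = nucleosome_bounds(1000, 146)
odd = nucleosome_bounds(1000, 147)
```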
     **WARNING** TemplateFilter is not designed to deal with replicates. We
     therefore recommend keeping as many nucleosomes as possible and
     filtering out the aberrant ones afterwards, taking advantage of the
     replicates. To do this, we set a low correlation threshold parameter
     (0.5) and a particularly high value of overlap (300%).
     This step is performed by the following part of the *wf.py* script:
          for sample in samples_mnase:
            per_mnase_sample_stats["sample_%s" % sample["id"]] = template_filter(sample,
              config["ALIGN_DIR"], config["LOG_DIR"], config["TF_BIN"],
              config["TF_TEMPLATES_FILE"], config["TF_CORR"], config["TF_MINW"],
              config["TF_MAXW"], config["TF_OL"])
     +----+--------+------------+---------------+------------------+
     | id | strain | found nucs | nuc file size | process duration |
     +====+========+============+===============+==================+
     | 1  | BY     | 96214      | 68 MB         | 1022 s.          |
     +----+--------+------------+---------------+------------------+
     | 2  | BY     | 91694      | 65 MB         | 1038 s.          |
     +----+--------+------------+---------------+------------------+
     | 3  | BY     | 91205      | 65 MB         | 1036 s.          |
     +----+--------+------------+---------------+------------------+
     | 4  | RM     | 88076      | 62 MB         | 984 s.           |
     +----+--------+------------+---------------+------------------+
     | 5  | RM     | 90141      | 64 MB         | 967 s.           |
     +----+--------+------------+---------------+------------------+
     | 6  | RM     | 87517      | 62 MB         | 980 s.           |
     +----+--------+------------+---------------+------------------+
     Inferring Nucleosome Position and Extracting Read Counts
     ========================================================
     The second part of the tutorial uses R
     (http://www.r-project.org). It consists of a set of R scripts
     that are sourced from an R console launched at the root of
     your project. These scripts are:
        * headers.R
        * extract_maps.R
        * translate_common_wp.R
        * split_samples.R
        * count_reads.R
        * get_size_factors.R
        * launch_deseq.R
     The Script headers.R
     --------------------
     The script headers.R is sourced by each of the other scripts. It is
     in charge of:
        * loading the libraries used in the scripts
        * loading the configuration (design, strain, marker...)
        * computing and caching CURs (caching means storing the
          information in the computer's memory)
     Note that you can customize the function “translate”. This function
     allows you to use the alignments between genomes when performing
     various tasks. You may be using NucleoMiner2 to analyse data from a
     single strain, or from several strains.
        * All the data correspond to the same strain (e.g.
          treatment/control, or only a few mutations): then in step 1), the
          regions to use are entire chromosomes. In step 2), simply use the
          default translate function, which is neutral.
        * The data come from two or more strains: in this case, edit a
          list of regions and customize the translate function, which
          performs the correspondence between the different genomes. How we
          did it: a .c2c file is obtained with NucleoMiner 1.0 (refer to
          the Appendix "Generate .c2c Files"), then used to produce the
          list of regions and customise “translate”.
     In your R console, run the following command line:
        source("src/current/headers.R")
     The Script extract_maps.R
     -------------------------
     This script is in charge of extracting maps for well-positioned and
     fuzzy nucleosomes. First of all, this script computes intra- and
     inter-strain nucleosome maps for each CUR. This step is executed in
     parallel on many cores using the BoT library. Next, it collects the
     results and produces well-positioned, fuzzy and UNR maps.
     The well-positioned map for BY is collected in the result directory
     and is called *BY_wp.tab*. It is composed of the following columns:
        * chr, the number of the chromosome
        * lower_bound, the lower bound of the nucleosome
        * upper_bound, the upper bound of the nucleosome
        * cur_index, index of the CUR
        * index_nuc, the index of the nucleosome in the CUR
        * wp, 1 if it is a well positioned nucleosome, 0 otherwise
        * nb_reads, the number of reads that support this nucleosome
        * nb_nucs, the number of TemplateFilter nucleosomes across
          replicates (= the number of replicates in which it is a
          well-positioned nucleosome)
        * llr_1, for a well-positioned nucleosome, the LLR1
          (log-likelihood ratio) between the first and the second
          TemplateFilter nucleosome on the chain
        * llr_2, for a well-positioned nucleosome, the LLR1 between
          the second and the third TemplateFilter nucleosome on the chain
        * wp_llr, for a well-positioned nucleosome, the LLR2 that
          compares the consistency of the positioning over all
          TemplateFilter nucleosomes
        * wp_pval, for a well-positioned nucleosome, the p-value of the
          chi-square test obtained with the LLR2 (*1-pchisq(2*LLR2, df=4)*)
        * dyad_shift, for a well-positioned nucleosome, the shift
          between the two extreme TemplateFilter nucleosome dyad positions
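     The p-value formula quoted for *wp_pval* can be reproduced outside R
     with the sketch below. It relies on the closed form of the
     chi-square upper tail for an even number of degrees of freedom,
     which covers the df=4 case. The function names are hypothetical and
     this is not code from the NucleoMiner2 R package.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability 1 - pchisq(x, df), for even `df` only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = max(x, 0.0) / 2.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def wp_pval(llr2):
    """p-value matching 1 - pchisq(2 * LLR2, df = 4)."""
    return chi2_sf_even_df(2.0 * llr2, 4)
```

     With df=4 the expression reduces to exp(-LLR2) * (1 + LLR2), so a
     larger LLR2 always gives a smaller p-value. The same helper called
     with df=2 reduces to exp(-LLR).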
     The fuzzy map for BY is collected in the result directory and is
     called *BY_fuzzy.tab*. It is composed of the following columns:
        * chr, the number of the chromosome
        * lower_bound, the lower bound of the nucleosome
        * upper_bound, the upper bound of the nucleosome
        * cur_index, index of the CUR
     The map of common well-positioned nucleosomes aligned between the BY
     and RM strains is collected in the result directory and is called
     *BY_RM_common_wp.tab*. It is composed of the following columns:
        * cur_index, the index of the CUR
        * index_nuc_BY, the index of the BY nucleosome in the CUR
        * index_nuc_RM, the index of the RM nucleosome in the CUR
        * llr_score, the LLR3 score that estimates conservation between
          the positions in BY and RM
        * common_wp_pval, the p-value of the chi-square test obtained from
          LLR3 (*1-pchisq(2*LLR3, df=2)*)
        * diff, the dyad shift between the positions in the two strains
     The common UNR map for BY and RM strains is collected in the result
     directory and is called *BY_RM_common_unr.tab*. It is composed of the
     following columns:
        * cur_index, the index of the CUR
        * index_nuc_BY, the index of the BY nucleosome in the CUR
        * index_nuc_RM, the index of the RM nucleosome in the CUR
     To execute this script, run the following command in your R console:
        source("src/current/extract_maps.R")
     The Script translate_common_wp.R
     --------------------------------
     This script translates the common well-positioned nucleosome
     maps from one strain to another and stores the result in a table.
     For example, the file *results/2014-04/RM_wp_tr_2_BY.tab* contains the
     RM well-positioned nucleosomes translated into the BY genome
     coordinates. It is composed of the following columns:
        * strain_ref, the reference genome (in which positions are
          defined)
        * begin, the translated lower bound of the nucleosome
        * end, the translated upper bound of the nucleosome
        * chr, the chromosome number in the reference genome (in
          which positions are defined)
        * length, the length of the nucleosome (could be negative)
        * cur_index, the index of the CUR
        * index_nuc, the index of the nucleosome in the CUR
     To execute this script, run the following command in your R console:
        source("src/current/translate_common_wp.R")
     The Script split_samples.R
     --------------------------
     To limit memory usage, we split and compress the TemplateFilter
     input files by chromosome. For example,
     *sample_1_TF.tab* will be split into:
        * sample_1_chr_1_splited_sample.tab.gz
        * sample_1_chr_2_splited_sample.tab.gz
        * ...
        * sample_1_chr_17_splited_sample.tab.gz
     To execute this script, run the following command in your R console:
        source("src/current/split_samples.R")
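     The splitting itself is done in R by *split_samples.R*. The Python
     sketch below only illustrates the idea on a toy file. It assumes a
     tab-separated input without header whose first column is the
     chromosome number; the function name *split_by_chromosome* is
     hypothetical, while the output file names follow the pattern listed
     above.

```python
import gzip
import os
import tempfile


def split_by_chromosome(tf_path, out_dir, sample_id):
    """Write one gzip-compressed file per chromosome found in column 1."""
    handles = {}
    try:
        with open(tf_path) as source:
            for line in source:
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    name = "sample_%s_chr_%s_splited_sample.tab.gz" % (sample_id, chrom)
                    handles[chrom] = gzip.open(os.path.join(out_dir, name), "wt")
                handles[chrom].write(line)
    finally:
        # Close every output file, even if reading failed part-way.
        for handle in handles.values():
            handle.close()
    return sorted(handles)


# Toy input written to a temporary directory.
work_dir = tempfile.mkdtemp()
tf_file = os.path.join(work_dir, "sample_1_TF.tab")
with open(tf_file, "w") as handle:
    handle.write("1\t100\tF\t2\n2\t250\tR\t1\n1\t300\tF\t1\n")
chromosomes = split_by_chromosome(tf_file, work_dir, 1)
```

     Downstream steps can then load one chromosome at a time instead of
     the whole sample.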
     The Script count_reads.R
     ------------------------
     To associate a number of observations (reads) with each nucleosome we
     run the script *count_reads.R*. It produces the files
     *BY_RM_H3K14ac_wp_and_nbreads.tab*,
     *BY_RM_H3K14ac_unr_and_nbreads.tab*,
     *BY_RM_Mnase_Seq_wp_and_nbreads.tab* and
     *BY_RM_Mnase_Seq_unr_and_nbreads.tab* for H3K14ac common
     well-positioned nucleosomes, H3K14ac UNRs, Mnase common
     well-positioned nucleosomes and Mnase UNRs respectively.
     For example, the file *BY_RM_H3K14ac_unr_and_nbreads.tab* contains
     the read counts for UNRs under the experimental
     condition ChIP H3K14ac. It is composed of the following columns:
        * chr_BY, the number of the chromosome for BY
        * lower_bound_BY, the lower bound of the nucleosome for BY
        * upper_bound_BY, the upper bound of the nucleosome for BY
        * index_nuc_BY, the index of the BY nucleosome in the CUR for BY
        * chr_RM, the number of the chromosome for RM
        * lower_bound_RM, the lower bound of the nucleosome for RM
        * upper_bound_RM, the upper bound of the nucleosome for RM
        * index_nuc_RM, the index of the RM nucleosome in the CUR for RM
        * cur_index, index of the CUR
        * BY_H3K14ac_36, the number of reads for the current nucleosome
          for the sample 36
        * BY_H3K14ac_37, #reads for sample 37
        * BY_H3K14ac_53, #reads for sample 53
        * RM_H3K14ac_38, #reads for sample 38
        * RM_H3K14ac_39, #reads for sample 39
     To execute this script, run the following command in your R console:
        source("src/current/count_reads.R")
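     One simple way to attach a read count to a nucleosome is sketched
     below. This is an assumption about the counting rule, not a
     description of *count_reads.R*: it sums the reads whose coordinate
     falls between the lower and upper bounds and ignores strand handling
     and read extension. The function and variable names are
     hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_reads(read_coords, read_counts, lower_bound, upper_bound):
    """Sum the reads whose coordinate lies in [lower_bound, upper_bound].

    `read_coords` must be sorted; `read_counts[i]` is the number of reads
    at `read_coords[i]` (the #reads column of the TemplateFilter input).
    """
    first = bisect_left(read_coords, lower_bound)
    last = bisect_right(read_coords, upper_bound)
    return sum(read_counts[first:last])


coords = [100, 150, 220, 300]
counts = [2, 1, 4, 3]
# Non-integer bounds are accepted, as produced by odd nucleosome widths.
n_reads = count_reads(coords, counts, 140.5, 250.5)
```

     The binary search keeps each lookup logarithmic in the number of read
     positions, which matters when tens of thousands of nucleosomes are
     counted per sample.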
     The Script get_size_factors.R
     -----------------------------
     This script uses the DESeq function *estimateSizeFactors* to compute
     the size factor of each sample. It corresponds to normalisation of
     read counts from sample to sample, as determined by DESeq. When a
     sample has n reads for a nucleosome or a UNR, the normalised count is
     n/f where f is the factor contained in this file. The script dumps
     computed size factors into the file *size_factors.tab*. This file has
     the form:
     +-----------+---------+---------+---------+
     | sample_id | wp      | unr     | wpunr   |
     +===========+=========+=========+=========+
     | 1         | 0.87396 | 0.88097 | 0.87584 |
     +-----------+---------+---------+---------+
     | 2         | 1.07890 | 1.07440 | 1.07760 |
     +-----------+---------+---------+---------+
     | 3         | 1.06400 | 1.05890 | 1.06250 |
     +-----------+---------+---------+---------+
     | 4         | 0.85782 | 0.87948 | 0.86305 |
     +-----------+---------+---------+---------+
     | 5         | 0.97577 | 0.96590 | 0.97307 |
     +-----------+---------+---------+---------+
     | 6         | 1.19630 | 1.18120 | 1.19190 |
     +-----------+---------+---------+---------+
     | 36        | 0.93318 | 0.92762 | 0.93166 |
     +-----------+---------+---------+---------+
     | 37        | 0.48315 | 0.48453 | 0.48350 |
     +-----------+---------+---------+---------+
     | 38        | 1.11240 | 1.11210 | 1.11230 |
     +-----------+---------+---------+---------+
     | 39        | 0.89897 | 0.89917 | 0.89903 |
     +-----------+---------+---------+---------+
     | 53        | 2.22650 | 2.22700 | 2.22660 |
     +-----------+---------+---------+---------+
     The sample_id values are given in the file samples.csv.
     If you don't know which column to use, we recommend using wpunr.
     In more detail, the factors produced by DESeq are:
        * unr: factor computed from the data of UNR regions. These regions
          are defined for every pair of aligned genomes (e.g. BY_RM)
        * wp: same, but for well-positioned nucleosomes.
        * wpunr: both types of regions.
     To execute this script, run the following command in your R console:
        source("src/current/get_size_factors.R")
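     For readers who want to see what a size factor is, the sketch below
     re-implements the median-of-ratios estimator documented for DESeq's
     *estimateSizeFactors*, then applies the n/f normalisation described
     above. It is an illustration in Python on made-up counts, not the
     code run by *get_size_factors.R*; the function and sample names are
     hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts[sample]` is the list of read counts of that sample, one entry
    per region (nucleosome or UNR), in the same order for every sample.
    """
    samples = sorted(counts)
    n_regions = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for i in range(n_regions):
        values = [counts[s][i] for s in samples]
        if min(values) <= 0:
            continue  # a zero count gives no finite geometric mean
        log_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][i]) - log_mean)
    return {s: math.exp(median(log_ratios[s])) for s in samples}


# Made-up counts: sample "b" was sequenced twice as deeply as sample "a".
toy_counts = {"a": [10, 20, 30], "b": [20, 40, 60]}
factors = size_factors(toy_counts)
normalised = {s: [n / factors[s] for n in toy_counts[s]] for s in toy_counts}
```

     After dividing by the factors, the two toy samples carry identical
     normalised counts, which is the behaviour expected when the only
     difference between samples is sequencing depth.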
     The Script launch_deseq.R
     -------------------------
     Finally, the script *launch_deseq.R* performs a statistical analysis
     on each nucleosome using *DESeq*. It produces the files:
        * results/current/BY_RM_H3K14ac_wp_snep.tab
        * results/current/BY_RM_H3K14ac_unr_snep.tab
        * results/current/BY_RM_H3K14ac_wpunr_snep.tab
        * results/current/BY_RM_H3K14ac_wp_mnase.tab
        * results/current/BY_RM_H3K14ac_unr_mnase.tab
        * results/current/BY_RM_H3K14ac_wpunr_mnase.tab
     These files are organised with the following columns (see file
     *BY_RM_H3K14ac_wp_snep.tab* for an example):
        * chr_BY, the number of the chromosome for BY
        * lower_bound_BY, the lower bound of the nucleosome for BY
        * upper_bound_BY, the upper bound of the nucleosome for BY
        * index_nuc_BY, the index of the BY nucleosome in the CUR for BY
        * chr_RM, the number of the chromosome for RM
        * lower_bound_RM, the lower bound of the nucleosome for RM
        * upper_bound_RM, the upper bound of the nucleosome for RM
        * index_nuc_RM, the index of the RM nucleosome in the CUR for RM
        * cur_index, index of the CUR
        * form
        * BY_Mnase_Seq_1, the number of reads for the current nucleosome
          for the sample 1
     The next columns concern indicators for each sample:
        * BY_Mnase_Seq_2, #reads for sample 2
        * BY_Mnase_Seq_3, #reads for sample 3
        * RM_Mnase_Seq_4, #reads for sample 4
        * RM_Mnase_Seq_5, #reads for sample 5
        * RM_Mnase_Seq_6, #reads for sample 6
        * BY_H3K14ac_36, #reads for sample 36
        * BY_H3K14ac_37, #reads for sample 37
        * BY_H3K14ac_53, #reads for sample 53
        * RM_H3K14ac_38, #reads for sample 38
        * RM_H3K14ac_39, #reads for sample 39
     The last 5 columns concern the DESeq analysis:
        * manip[a_manip] strain[a_strain]
          manip[a_strain]:strain[a_strain], the manip (marker) effect, the
          strain effect and the snep effect. These are the coefficients of
          the fitted generalized linear model.
        * pvalsGLM, the p-value resulting from the comparison of the GLM
          models with and without the interaction term marker:strain. This
          is the statistical significance of the interaction term and
          therefore the statistical significance of the SNEP.
        * snep_index, a boolean set to TRUE if the pvalsGLM value is
          under the threshold computed with the FDR function with a rate
          set to 0.0001.
     To execute this script, run the following command in your R console:
        source("src/current/launch_deseq.R")
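     The way a p-value threshold can be derived from an FDR rate is
     sketched below with the Benjamini-Hochberg procedure. This is an
     assumption made for illustration: the tutorial does not state which
     procedure the FDR function implements. The p-values are made up, the
     example uses a rate of 0.05 rather than 0.0001 to stay readable, and
     the function name *bh_threshold* is hypothetical.

```python
def bh_threshold(pvalues, rate):
    """Largest sorted p-value p_(k) with p_(k) <= k / m * rate, else 0.0."""
    ordered = sorted(pvalues)
    m = len(ordered)
    threshold = 0.0
    for k, p in enumerate(ordered, start=1):
        if p <= k / m * rate:
            threshold = p
    return threshold


# Made-up p-values standing in for the pvalsGLM column.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
threshold = bh_threshold(pvals, 0.05)
# A nucleosome is flagged when its p-value does not exceed the threshold.
snep_index = [p <= threshold for p in pvals]
```

     Here only the two smallest p-values pass, although four of the five
     are below 0.05: the correction accounts for the number of nucleosomes
     tested.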
     Results: Number of SNEPs
     ========================
     Here is the number of SNEPs computed for each form.
     +-------+---------+-------+---------+
     | form  | strains | #nucs | H3K14ac |
     +=======+=========+=======+=========+
     | wp    | BY-RM   | 30464 | 3549    |
     +-------+---------+-------+---------+
     | unr   | BY-RM   | 9497  | 1559    |
     +-------+---------+-------+---------+
     | wpunr | BY-RM   | 39961 | 5240    |
     +-------+---------+-------+---------+
     APPENDIX: Generate .c2c Files
     =============================
     $$$ TODO make it works properly. working directory.
     A *.c2c* file is a simple table that describes how the genome
     sequences can be aligned. We generate it using NucleoMiner 1.0.
     To install NucleoMiner 1.0 on your UNIX/LINUX computer, you first need
     to install the Genetic Data analysis Library (GDL), which is a dynamic
     library of useful C functions derived from the GNU Scientific Library.
     Installing the GDL library
     --------------------------
     Get the gdl-1.0.tar.gz archive on your computer (in the directory deps
     of your working directory). Copy it into a dedicated directory, go
     into this directory using the cd command, and unpack the archive with
     *tar -xvzf gdl-1.0.tar.gz*. This creates a directory called gdl-1.0.
     You then need to go into this directory and compile the library. All
     of these steps are performed by typing:
        mkdir tmp_c2c_workdir
        cd tmp_c2c_workdir
        cp ../deps/gdl-1.0.tar.gz .
        tar -xvzf gdl-1.0.tar.gz
        cd gdl-1.0
        ./configure
        make
        cd ..
     Now you need to install the library on your system. This needs root
     privileges and must be run from the gdl-1.0 directory:
        (cd gdl-1.0 && sudo make install)
     Installing NucleoMiner 1.0
     --------------------------
     Get the nucleominer-1.0.tar.gz archive on your computer (in the
     directory deps of your working directory). Copy it into the dedicated
     directory, unpack the archive (this creates a directory called
     nucleominer-1.0), then go into this directory and compile the
     program, by typing:
        cp ../deps/nucleominer-1.0.tar.gz .
        tar -xvzf nucleominer-1.0.tar.gz
        cd nucleominer-1.0
        ln -s ../gdl-1.0/gdl
        ./configure
        make
     You can then use the binaries directly from this folder (it is then
     best to add the path to this folder to your PATH environment
     variable). If you want to install nucleominer at the system level
     (useful if multiple users need it) then type, with root privileges:
        sudo make install
     Generate .c2c Files
     -------------------
     To generate .c2c files you need to type the following command in a
     terminal:
        mkdir dir_4_c2c
        NMgxcomp ../data/saccharomyces_cerevisiae_BY_S288c_chromosomes.fasta\
                 ../data/saccharomyces_cerevisiae_rm11-1a_1_supercontigs.fasta\
                 dir_4_c2c/BY_RM 2>dir_4_c2c/BY_RM.log
     After execution, the directory *dir_4_c2c* will hold the .c2c files.
