root / doc / R_tutorial / NucleoMiner2.Rnw @ 780f632a
Historique | Voir | Annoter | Télécharger (12,5 ko)
1 | 780f632a | Florent Chuffart | % Generated by Sphinx. |
---|---|---|---|
2 | 780f632a | Florent Chuffart | % \def\sphinxdocclass{report} |
3 | 780f632a | Florent Chuffart | \documentclass[letterpaper,10pt,english]{article} |
4 | 780f632a | Florent Chuffart | \usepackage[utf8]{inputenc} |
5 | 780f632a | Florent Chuffart | \DeclareUnicodeCharacter{00A0}{\nobreakspace} |
6 | 780f632a | Florent Chuffart | \usepackage{cmap} |
7 | 780f632a | Florent Chuffart | \usepackage[T1]{fontenc} |
8 | 780f632a | Florent Chuffart | \usepackage{babel} |
9 | 780f632a | Florent Chuffart | \usepackage{times} |
10 | 780f632a | Florent Chuffart | \usepackage[Bjarne]{fncychap} |
11 | 780f632a | Florent Chuffart | \usepackage{longtable} |
12 | 780f632a | Florent Chuffart | \usepackage{multirow} |
13 | 780f632a | Florent Chuffart | \usepackage{url} |
14 | 780f632a | Florent Chuffart | |
15 | 780f632a | Florent Chuffart | \addto\captionsenglish{\renewcommand{\figurename}{Fig. }} |
16 | 780f632a | Florent Chuffart | \addto\captionsenglish{\renewcommand{\tablename}{Table }} |
17 | 780f632a | Florent Chuffart | |
18 | 780f632a | Florent Chuffart | |
19 | 780f632a | Florent Chuffart | |
20 | 780f632a | Florent Chuffart | \title{NucleoMiner2 Documentation} |
21 | 780f632a | Florent Chuffart | \date{\today} |
22 | 780f632a | Florent Chuffart | \author{Florent CHUFFART, Jean-Baptiste VEYRIERAS, Gael YVERT} |
23 | 780f632a | Florent Chuffart | \newcommand{\sphinxlogo}{} |
24 | 780f632a | Florent Chuffart | \makeindex |
25 | 780f632a | Florent Chuffart | |
26 | 780f632a | Florent Chuffart | \def\PYGZus{\char`\_} |
27 | 780f632a | Florent Chuffart | |
28 | 780f632a | Florent Chuffart | |
29 | 780f632a | Florent Chuffart | \begin{document} |
30 | 780f632a | Florent Chuffart | |
31 | 780f632a | Florent Chuffart | \maketitle |
32 | 780f632a | Florent Chuffart | \tableofcontents |
33 | 780f632a | Florent Chuffart | \textbf{}\label{index::doc} |
34 | 780f632a | Florent Chuffart | |
35 | 780f632a | Florent Chuffart | |
36 | 780f632a | Florent Chuffart | \section{Inferring Nucleosome Position and Extracting Read Counts} |
37 | 780f632a | Florent Chuffart | \label{tuto:inferring-nucleosome-position-and-extracting-read-counts} |
38 | 780f632a | Florent Chuffart | The second part of the tutorial uses R (\url{http://http://www.r-project.org}{http://http://www.r-project.org}). |
39 | 780f632a | Florent Chuffart | It consists in 5 main steps: |
40 | 780f632a | Florent Chuffart | \begin{itemize} |
41 | 780f632a | Florent Chuffart | |
42 | 780f632a | Florent Chuffart | \item {} |
43 | 780f632a | Florent Chuffart | Seting Up Configuration |
44 | 780f632a | Florent Chuffart | |
45 | 780f632a | Florent Chuffart | \item {} |
46 | 780f632a | Florent Chuffart | Computing CURs |
47 | 780f632a | Florent Chuffart | |
48 | 780f632a | Florent Chuffart | \item {} |
49 | 780f632a | Florent Chuffart | Extracting Nucleosome Maps from TemplateFilter Outputs |
50 | 780f632a | Florent Chuffart | |
51 | 780f632a | Florent Chuffart | \item {} |
52 | 780f632a | Florent Chuffart | Building Count Tables from TemplateFilter Inputs |
53 | 780f632a | Florent Chuffart | |
54 | 780f632a | Florent Chuffart | \item {} |
55 | 780f632a | Florent Chuffart | Analysing Count Tables using DESeq |
56 | 780f632a | Florent Chuffart | |
57 | 780f632a | Florent Chuffart | \end{itemize} |
58 | 780f632a | Florent Chuffart | |
59 | 780f632a | Florent Chuffart | |
60 | 780f632a | Florent Chuffart | |
61 | 780f632a | Florent Chuffart | |
62 | 780f632a | Florent Chuffart | \subsection{Seting Up Configuration} |
63 | 780f632a | Florent Chuffart | |
64 | 780f632a | Florent Chuffart | This first step is in charge of: |
65 | 780f632a | Florent Chuffart | \begin{itemize} |
66 | 780f632a | Florent Chuffart | \item {} |
67 | 780f632a | Florent Chuffart | launching libraries used in the scripts |
68 | 780f632a | Florent Chuffart | |
69 | 780f632a | Florent Chuffart | \item {} |
70 | 780f632a | Florent Chuffart | launching configuration (design, strain, marker...) |
71 | 780f632a | Florent Chuffart | |
72 | 780f632a | Florent Chuffart | {\footnotesize |
73 | 780f632a | Florent Chuffart | <<echo=TRUE, results=hide>>= |
74 | 780f632a | Florent Chuffart | library(nucleominer) |
75 | 780f632a | Florent Chuffart | @ |
76 | 780f632a | Florent Chuffart | <<echo=TRUE, results=verbatim>>= |
77 | 780f632a | Florent Chuffart | package.version("nucleominer") |
78 | 780f632a | Florent Chuffart | |
79 | 780f632a | Florent Chuffart | library(rjson) |
80 | 780f632a | Florent Chuffart | json_conf_file = "nucleominer_config.json" |
81 | 780f632a | Florent Chuffart | config = fromJSON(paste(readLines(json_conf_file), collapse="")) |
82 | 780f632a | Florent Chuffart | |
83 | 780f632a | Florent Chuffart | all_samples = read.csv(config$CSV_SAMPLE_FILE, sep=";", head=TRUE, stringsAsFactors=FALSE) |
84 | 780f632a | Florent Chuffart | head(all_samples) |
85 | 780f632a | Florent Chuffart | |
86 | 780f632a | Florent Chuffart | markers = unique(all_samples$marker) |
87 | 780f632a | Florent Chuffart | print(markers) |
88 | 780f632a | Florent Chuffart | |
89 | 780f632a | Florent Chuffart | strains = unique(all_samples$strain) |
90 | 780f632a | Florent Chuffart | print(strains) |
91 | 780f632a | Florent Chuffart | |
92 | 780f632a | Florent Chuffart | combis = combn(strains, 2, simplify=FALSE) |
93 | 780f632a | Florent Chuffart | print(combis) |
94 | 780f632a | Florent Chuffart | @ |
95 | 780f632a | Florent Chuffart | } |
96 | 780f632a | Florent Chuffart | |
97 | 780f632a | Florent Chuffart | |
98 | 780f632a | Florent Chuffart | \subsection{Computing CURs} |
99 | 780f632a | Florent Chuffart | |
100 | 780f632a | Florent Chuffart | This second step is in charge of computing and caching Common Uinterrupted Regions (CURs). Caching means storing the information in the computer's memory. |
101 | 780f632a | Florent Chuffart | |
102 | 780f632a | Florent Chuffart | \end{itemize} |
103 | 780f632a | Florent Chuffart | |
104 | 780f632a | Florent Chuffart | Note that you can customize the function “translate”. This function allows you to use the alignments between genomes when performing various tasks. |
105 | 780f632a | Florent Chuffart | \begin{itemize} |
106 | 780f632a | Florent Chuffart | \item {} |
107 | 780f632a | Florent Chuffart | You may want to analyze data of a single strain (e.g. treatment/control, or only few mutations). In this case, the genome is identical across all samples and you do not need to define particular CURs (CURs are chromosomes). Simply use the default translate function which is neutral. |
108 | 780f632a | Florent Chuffart | |
109 | 780f632a | Florent Chuffart | \item {} |
110 | 780f632a | Florent Chuffart | If you are analyzing data from two or more strains (as NucleoMiner2 was designed for), then you need to translate coordinates of one genome into the coordinates of another one. You must do this by aligning the two genomes, which will produce a .c2c file (see Appendice ``Generate .c2c Files''). thenuse it to produce the list of regions and customise “translate”. |
111 | 780f632a | Florent Chuffart | |
112 | 780f632a | Florent Chuffart | \end{itemize} |
113 | 780f632a | Florent Chuffart | |
114 | 780f632a | Florent Chuffart | In our tutorial, we are in the second case and to perform all these steps run the following command line in your R console: |
115 | 780f632a | Florent Chuffart | |
116 | 780f632a | Florent Chuffart | |
117 | 780f632a | Florent Chuffart | {\footnotesize |
118 | 780f632a | Florent Chuffart | <<echo=TRUE, results=hide>>= |
119 | 780f632a | Florent Chuffart | curs = compute_curs(diff_allowed=30, min_cur_width=4000, combis, config) |
120 | 780f632a | Florent Chuffart | @ |
121 | 780f632a | Florent Chuffart | <<echo=TRUE, results=verbatim>>= |
122 | 780f632a | Florent Chuffart | head(curs) |
123 | 780f632a | Florent Chuffart | # Reduce first CUR length to accelerate demo, TODO: remove it before submission. |
124 | 780f632a | Florent Chuffart | curs [1,3] = curs[1,3] - 110000 |
125 | 780f632a | Florent Chuffart | curs [1,5] = curs[1,5] - 110000 |
126 | 780f632a | Florent Chuffart | @ |
127 | 780f632a | Florent Chuffart | } |
128 | 780f632a | Florent Chuffart | |
129 | 780f632a | Florent Chuffart | |
130 | 780f632a | Florent Chuffart | \subsection{Extracting Nucleosome Maps from TemplateFilter Outputs} |
131 | 780f632a | Florent Chuffart | |
132 | 780f632a | Florent Chuffart | This third step is in charge of extracting Maps for well-positioned and sensitive nucleosomes. First of all, this script computes intra and inter-strain matches of nucleosome maps for each CUR. This step can be executed in parallel on many cores using the BoT library. Next, it collects results and produces maps of well-positioned nucleosomes, sensitive nucleosomes and Unaligned Nucleosomal Regions . |
133 | 780f632a | Florent Chuffart | |
134 | 780f632a | Florent Chuffart | The map of well-positioned nucleosomes for BY is collected in the result directory and is called \emph{BY\_wp.tab}. It is composed of following columns: |
135 | 780f632a | Florent Chuffart | \begin{itemize} |
136 | 780f632a | Florent Chuffart | \item {} |
137 | 780f632a | Florent Chuffart | chr, the number of the chromosome |
138 | 780f632a | Florent Chuffart | |
139 | 780f632a | Florent Chuffart | \item {} |
140 | 780f632a | Florent Chuffart | lower\_bound, the lower bound of the nucleosome |
141 | 780f632a | Florent Chuffart | |
142 | 780f632a | Florent Chuffart | \item {} |
143 | 780f632a | Florent Chuffart | upper\_bound, the upper bound of the nucleosome |
144 | 780f632a | Florent Chuffart | |
145 | 780f632a | Florent Chuffart | \item {} |
146 | 780f632a | Florent Chuffart | cur\_index, index of the CUR |
147 | 780f632a | Florent Chuffart | |
148 | 780f632a | Florent Chuffart | \item {} |
149 | 780f632a | Florent Chuffart | index\_nuc, the index of the nucleosome in the CUR |
150 | 780f632a | Florent Chuffart | |
151 | 780f632a | Florent Chuffart | \item {} |
152 | 780f632a | Florent Chuffart | wp, 1 if it is a well positioned nucleosome, 0 otherwise |
153 | 780f632a | Florent Chuffart | |
154 | 780f632a | Florent Chuffart | \item {} |
155 | 780f632a | Florent Chuffart | nb\_reads, the number of reads that support this nucleosome |
156 | 780f632a | Florent Chuffart | |
157 | 780f632a | Florent Chuffart | \item {} |
158 | 780f632a | Florent Chuffart | nb\_nucs, the number of TemplateFilter nucleosome across replicates (= the number of replicates in which it is a well-positioned nucleosome) |
159 | 780f632a | Florent Chuffart | |
160 | 780f632a | Florent Chuffart | \item {} |
161 | 780f632a | Florent Chuffart | llr\_1, for a well-positioned nucleosome, it is the LLR1 (log-likelihood ratio) between the first and the second TemplateFilter nucleosome on the chain. |
162 | 780f632a | Florent Chuffart | |
163 | 780f632a | Florent Chuffart | \item {} |
164 | 780f632a | Florent Chuffart | llr\_2, for a well-positioned nucleosome, it is the LLR1 between the second and the third TemplateFilter nucleosome on the chain. |
165 | 780f632a | Florent Chuffart | |
166 | 780f632a | Florent Chuffart | \item {} |
167 | 780f632a | Florent Chuffart | wp\_llr, for a well-positioned nucleosome, it is the LLR2 that compares consistency of the positioning over all TemplateFilter nucleosomes. |
168 | 780f632a | Florent Chuffart | |
169 | 780f632a | Florent Chuffart | \item {} |
170 | 780f632a | Florent Chuffart | wp\_pval, for a well-positioned nucleosome, it is the p-value chi square test obtained from LLR2 (\emph{1-pchisq(2.LLR2, df=4)}) |
171 | 780f632a | Florent Chuffart | |
172 | 780f632a | Florent Chuffart | \item {} |
173 | 780f632a | Florent Chuffart | dyad\_shift, for a well-positioned nucleosome, it is the shift between the two extreme TemplateFilter nucleosome dyad positions. |
174 | 780f632a | Florent Chuffart | |
175 | 780f632a | Florent Chuffart | \end{itemize} |
176 | 780f632a | Florent Chuffart | |
177 | 780f632a | Florent Chuffart | The sensitive map for BY is collected in the result directory and is called \emph{BY\_fuzzy.tab}. It is composed of following columns: |
178 | 780f632a | Florent Chuffart | \begin{itemize} |
179 | 780f632a | Florent Chuffart | \item {} |
180 | 780f632a | Florent Chuffart | chr, the number of the chromosome |
181 | 780f632a | Florent Chuffart | |
182 | 780f632a | Florent Chuffart | \item {} |
183 | 780f632a | Florent Chuffart | lower\_bound, the lower bound of the nucleosome |
184 | 780f632a | Florent Chuffart | |
185 | 780f632a | Florent Chuffart | \item {} |
186 | 780f632a | Florent Chuffart | upper\_bound, the upper bound of the nucleosome |
187 | 780f632a | Florent Chuffart | |
188 | 780f632a | Florent Chuffart | \item {} |
189 | 780f632a | Florent Chuffart | cur\_index, index of the CUR |
190 | 780f632a | Florent Chuffart | |
191 | 780f632a | Florent Chuffart | \end{itemize} |
192 | 780f632a | Florent Chuffart | |
193 | 780f632a | Florent Chuffart | The map of common well-positioned nucleosomes aligned between the BY and RM strains is collected in the result directory and is called \emph{BY\_RM\_common\_wp.tab}. It is composed of following columns: |
194 | 780f632a | Florent Chuffart | \begin{itemize} |
195 | 780f632a | Florent Chuffart | \item {} |
196 | 780f632a | Florent Chuffart | cur\_index, the index of the CUR |
197 | 780f632a | Florent Chuffart | |
198 | 780f632a | Florent Chuffart | \item {} |
199 | 780f632a | Florent Chuffart | index\_nuc\_BY, the index of the BY nucleosome in the CUR |
200 | 780f632a | Florent Chuffart | |
201 | 780f632a | Florent Chuffart | \item {} |
202 | 780f632a | Florent Chuffart | index\_nuc\_RM, the index of the RM nucleosome in the CUR |
203 | 780f632a | Florent Chuffart | |
204 | 780f632a | Florent Chuffart | \item {} |
205 | 780f632a | Florent Chuffart | llr\_score, , the LLR3 score that estimates conservation between the positions in BY and RM |
206 | 780f632a | Florent Chuffart | |
207 | 780f632a | Florent Chuffart | \item {} |
208 | 780f632a | Florent Chuffart | common\_wp\_pval, the p-value chi square test obtained from LLR3 (\emph{1-pchisq(2.LLR3, df=2)}) |
209 | 780f632a | Florent Chuffart | |
210 | 780f632a | Florent Chuffart | \item {} |
211 | 780f632a | Florent Chuffart | diff, the dyads shift between the positions in the two strains (in bp) |
212 | 780f632a | Florent Chuffart | |
213 | 780f632a | Florent Chuffart | \end{itemize} |
214 | 780f632a | Florent Chuffart | |
215 | 780f632a | Florent Chuffart | The common UNR map for BY and RM strains is collected in the result directory and is called \emph{BY\_RM\_common\_unr.tab}. It is composed of the following columns: |
216 | 780f632a | Florent Chuffart | \begin{itemize} |
217 | 780f632a | Florent Chuffart | \item {} |
218 | 780f632a | Florent Chuffart | cur\_index, the index of the CUR |
219 | 780f632a | Florent Chuffart | |
220 | 780f632a | Florent Chuffart | \item {} |
221 | 780f632a | Florent Chuffart | index\_nuc\_BY, the index of the BY nucleosome in the CUR |
222 | 780f632a | Florent Chuffart | |
223 | 780f632a | Florent Chuffart | \item {} |
224 | 780f632a | Florent Chuffart | index\_nuc\_RM,the index of the RM nucleosome in the CUR |
225 | 780f632a | Florent Chuffart | |
226 | 780f632a | Florent Chuffart | \end{itemize} |
227 | 780f632a | Florent Chuffart | |
228 | 780f632a | Florent Chuffart | To execute this script, run the following command in your R console: |
229 | 780f632a | Florent Chuffart | |
230 | 780f632a | Florent Chuffart | |
231 | 780f632a | Florent Chuffart | {\footnotesize |
232 | 780f632a | Florent Chuffart | <<echo=TRUE, results=hide>>= |
233 | 780f632a | Florent Chuffart | build_maps(strains, combis, all_samples, curs, config) |
234 | 780f632a | Florent Chuffart | @ |
235 | 780f632a | Florent Chuffart | <<echo=TRUE, results=verbatim>>= |
236 | 780f632a | Florent Chuffart | BY_wp = read.table("outputs/BY_wp.tab", header=TRUE) |
237 | 780f632a | Florent Chuffart | head(BY_wp) |
238 | 780f632a | Florent Chuffart | RM_wp = read.table("outputs/RM_wp.tab", header=TRUE) |
239 | 780f632a | Florent Chuffart | head(RM_wp) |
240 | 780f632a | Florent Chuffart | BY_RM_common_wp = read.table("outputs/BY_RM_common_wp.tab", header=TRUE) |
241 | 780f632a | Florent Chuffart | head(BY_RM_common_wp) |
242 | 780f632a | Florent Chuffart | @ |
243 | 780f632a | Florent Chuffart | } |
244 | 780f632a | Florent Chuffart | |
245 | 780f632a | Florent Chuffart | \subsection{Building Count Tables from TemplateFilter Inputs |
246 | 780f632a | Florent Chuffart | } |
247 | 780f632a | Florent Chuffart | \label{tuto:the-script-count-reads-r} |
248 | 780f632a | Florent Chuffart | To associate a number of observations (read) to each nucleosome we run the script \emph{count\_reads.R}. It produces the files \emph{BY\_RM\_H3K14ac\_wp\_and\_nbreads.tab}, \emph{BY\_RM\_H3K14ac\_unr\_and\_nbreads.tab} \emph{BY\_RM\_Mnase\_Seq\_wp\_and\_nbreads.tab} and \emph{BY\_RM\_Mnase\_Seq\_unr\_and\_nbreads.tab} |
249 | 780f632a | Florent Chuffart | for H3K14ac common well-positioned nucleosomes, H3K14ac UNRs, Mnase common well-positioned nucleosomes and Mnase UNRs respectively. |
250 | 780f632a | Florent Chuffart | |
251 | 780f632a | Florent Chuffart | For example, the file \emph{BY\_RM\_H3K14ac\_unr\_and\_nbreads.tab} contains counted reads for well-positioned nucleosomes with the experimental condition ChIP H3K14ac. It is composed of the following columns: |
252 | 780f632a | Florent Chuffart | \begin{itemize} |
253 | 780f632a | Florent Chuffart | \item {} |
254 | 780f632a | Florent Chuffart | chr\_BY, the number of the chromosome for BY |
255 | 780f632a | Florent Chuffart | |
256 | 780f632a | Florent Chuffart | \item {} |
257 | 780f632a | Florent Chuffart | lower\_bound\_BY, the lower bound of the nucleosome for BY |
258 | 780f632a | Florent Chuffart | |
259 | 780f632a | Florent Chuffart | \item {} |
260 | 780f632a | Florent Chuffart | upper\_bound\_BY, the upper bound of the nucleosome for BY |
261 | 780f632a | Florent Chuffart | |
262 | 780f632a | Florent Chuffart | \item {} |
263 | 780f632a | Florent Chuffart | index\_nuc\_BY, the index of the BY nucleosome in the CUR for BY |
264 | 780f632a | Florent Chuffart | |
265 | 780f632a | Florent Chuffart | \item {} |
266 | 780f632a | Florent Chuffart | chr\_RM, the number of the chromosome for RM |
267 | 780f632a | Florent Chuffart | |
268 | 780f632a | Florent Chuffart | \item {} |
269 | 780f632a | Florent Chuffart | lower\_bound\_RM, the lower bound of the nucleosome for RM |
270 | 780f632a | Florent Chuffart | |
271 | 780f632a | Florent Chuffart | \item {} |
272 | 780f632a | Florent Chuffart | upper\_bound\_RM, the upper bound of the nucleosome for RM |
273 | 780f632a | Florent Chuffart | |
274 | 780f632a | Florent Chuffart | \item {} |
275 | 780f632a | Florent Chuffart | index\_nuc\_RM,the index of the RM nucleosome in the CUR for RM |
276 | 780f632a | Florent Chuffart | |
277 | 780f632a | Florent Chuffart | \item {} |
278 | 780f632a | Florent Chuffart | cur\_index, index of the CUR |
279 | 780f632a | Florent Chuffart | |
280 | 780f632a | Florent Chuffart | \item {} |
281 | 780f632a | Florent Chuffart | BY\_H3K14ac\_36, the number of reads for the current nucleosome for the sample 36 |
282 | 780f632a | Florent Chuffart | |
283 | 780f632a | Florent Chuffart | \item {} |
284 | 780f632a | Florent Chuffart | BY\_H3K14ac\_37, \#reads for sample 37 |
285 | 780f632a | Florent Chuffart | |
286 | 780f632a | Florent Chuffart | \item {} |
287 | 780f632a | Florent Chuffart | BY\_H3K14ac\_53, \#reads for sample 53 |
288 | 780f632a | Florent Chuffart | |
289 | 780f632a | Florent Chuffart | \item {} |
290 | 780f632a | Florent Chuffart | RM\_H3K14ac\_38, \#reads for sample 38 |
291 | 780f632a | Florent Chuffart | |
292 | 780f632a | Florent Chuffart | \item {} |
293 | 780f632a | Florent Chuffart | RM\_H3K14ac\_39, \#reads for sample 39 |
294 | 780f632a | Florent Chuffart | |
295 | 780f632a | Florent Chuffart | \end{itemize} |
296 | 780f632a | Florent Chuffart | |
297 | 780f632a | Florent Chuffart | To execute this script, run the following command in your R console: |
298 | 780f632a | Florent Chuffart | |
299 | 780f632a | Florent Chuffart | |
300 | 780f632a | Florent Chuffart | {\footnotesize |
301 | 780f632a | Florent Chuffart | <<echo=TRUE, results=hide>>= |
302 | 780f632a | Florent Chuffart | for (form in c("wp", "unr")) { |
303 | 780f632a | Florent Chuffart | for (marker in markers) { |
304 | 780f632a | Florent Chuffart | build_count_table(marker, combis[[1]], form, curs, all_samples, config) |
305 | 780f632a | Florent Chuffart | } |
306 | 780f632a | Florent Chuffart | } |
307 | 780f632a | Florent Chuffart | @ |
308 | 780f632a | Florent Chuffart | <<echo=TRUE, results=verbatim>>= |
309 | 780f632a | Florent Chuffart | a_count_table = read.table("outputs/BY_RM_Mnase_Seq_wp_and_nbreads.tab", header=TRUE) |
310 | 780f632a | Florent Chuffart | head(a_count_table) |
311 | 780f632a | Florent Chuffart | @ |
312 | 780f632a | Florent Chuffart | } |
313 | 780f632a | Florent Chuffart | |
314 | 780f632a | Florent Chuffart | \subsection{Analysing Count Tables using DESeq} |
315 | 780f632a | Florent Chuffart | |
316 | 780f632a | Florent Chuffart | Finally, the script \emph{launch\_deseq.R} perform statistical analysis on each nucleosome using \emph{DESeq}. It produces files: |
317 | 780f632a | Florent Chuffart | \begin{itemize} |
318 | 780f632a | Florent Chuffart | \item {} |
319 | 780f632a | Florent Chuffart | results/current/BY\_RM\_H3K14ac\_wp\_snep.tab |
320 | 780f632a | Florent Chuffart | |
321 | 780f632a | Florent Chuffart | \item {} |
322 | 780f632a | Florent Chuffart | results/current/BY\_RM\_H3K14ac\_unr\_snep.tab |
323 | 780f632a | Florent Chuffart | |
324 | 780f632a | Florent Chuffart | \item {} |
325 | 780f632a | Florent Chuffart | results/current/BY\_RM\_H3K14ac\_wpunr\_snep.tab |
326 | 780f632a | Florent Chuffart | |
327 | 780f632a | Florent Chuffart | \end{itemize} |
328 | 780f632a | Florent Chuffart | |
329 | 780f632a | Florent Chuffart | These files are organised with the following columns (see file \emph{BY\_RM\_H3K14ac\_wp\_snep.tab} for an example): |
330 | 780f632a | Florent Chuffart | \begin{itemize} |
331 | 780f632a | Florent Chuffart | \item {} |
332 | 780f632a | Florent Chuffart | chr\_BY, the number of the chromosome for BY |
333 | 780f632a | Florent Chuffart | |
334 | 780f632a | Florent Chuffart | \item {} |
335 | 780f632a | Florent Chuffart | lower\_bound\_BY, the lower bound of the nucleosome for BY |
336 | 780f632a | Florent Chuffart | |
337 | 780f632a | Florent Chuffart | \item {} |
338 | 780f632a | Florent Chuffart | upper\_bound\_BY, the upper bound of the nucleosome for BY |
339 | 780f632a | Florent Chuffart | |
340 | 780f632a | Florent Chuffart | \item {} |
341 | 780f632a | Florent Chuffart | index\_nuc\_BY, the index of the BY nucleosome in the CUR for BY |
342 | 780f632a | Florent Chuffart | |
343 | 780f632a | Florent Chuffart | \item {} |
344 | 780f632a | Florent Chuffart | chr\_RM, the number of the chromosome for RM |
345 | 780f632a | Florent Chuffart | |
346 | 780f632a | Florent Chuffart | \item {} |
347 | 780f632a | Florent Chuffart | lower\_bound\_RM, the lower bound of the nucleosome for RM |
348 | 780f632a | Florent Chuffart | |
349 | 780f632a | Florent Chuffart | \item {} |
350 | 780f632a | Florent Chuffart | upper\_bound\_RM, the upper bound of the nucleosome for RM |
351 | 780f632a | Florent Chuffart | |
352 | 780f632a | Florent Chuffart | \item {} |
353 | 780f632a | Florent Chuffart | index\_nuc\_RM,the index of the RM nucleosome in the CUR for RM |
354 | 780f632a | Florent Chuffart | |
355 | 780f632a | Florent Chuffart | \item {} |
356 | 780f632a | Florent Chuffart | cur\_index, index of the CUR |
357 | 780f632a | Florent Chuffart | |
358 | 780f632a | Florent Chuffart | \item {} |
359 | 780f632a | Florent Chuffart | form |
360 | 780f632a | Florent Chuffart | |
361 | 780f632a | Florent Chuffart | \item {} |
362 | 780f632a | Florent Chuffart | BY\_Mnase\_Seq\_1, the number of reads for the current nucleosome for the sample 1 |
363 | 780f632a | Florent Chuffart | |
364 | 780f632a | Florent Chuffart | \end{itemize} |
365 | 780f632a | Florent Chuffart | |
366 | 780f632a | Florent Chuffart | Next columns concern indicators for each sample: |
367 | 780f632a | Florent Chuffart | \begin{itemize} |
368 | 780f632a | Florent Chuffart | \item {} |
369 | 780f632a | Florent Chuffart | BY\_Mnase\_Seq\_2, \#reads for sample 2 |
370 | 780f632a | Florent Chuffart | |
371 | 780f632a | Florent Chuffart | \item {} |
372 | 780f632a | Florent Chuffart | BY\_Mnase\_Seq\_3, \#reads for sample 3 |
373 | 780f632a | Florent Chuffart | |
374 | 780f632a | Florent Chuffart | \item {} |
375 | 780f632a | Florent Chuffart | RM\_Mnase\_Seq\_4, \#reads for sample 4 |
376 | 780f632a | Florent Chuffart | |
377 | 780f632a | Florent Chuffart | \item {} |
378 | 780f632a | Florent Chuffart | RM\_Mnase\_Seq\_5, \#reads for sample 5 |
379 | 780f632a | Florent Chuffart | |
380 | 780f632a | Florent Chuffart | \item {} |
381 | 780f632a | Florent Chuffart | RM\_Mnase\_Seq\_6, \#reads for sample 6 |
382 | 780f632a | Florent Chuffart | |
383 | 780f632a | Florent Chuffart | \item {} |
384 | 780f632a | Florent Chuffart | BY\_H3K14ac\_36, \#reads for sample 36 |
385 | 780f632a | Florent Chuffart | |
386 | 780f632a | Florent Chuffart | \item {} |
387 | 780f632a | Florent Chuffart | BY\_H3K14ac\_37, \#reads for sample 37 |
388 | 780f632a | Florent Chuffart | |
389 | 780f632a | Florent Chuffart | \item {} |
390 | 780f632a | Florent Chuffart | BY\_H3K14ac\_53, \#reads for sample 53 |
391 | 780f632a | Florent Chuffart | |
392 | 780f632a | Florent Chuffart | \item {} |
393 | 780f632a | Florent Chuffart | RM\_H3K14ac\_38, \#reads for sample 38 |
394 | 780f632a | Florent Chuffart | |
395 | 780f632a | Florent Chuffart | \item {} |
396 | 780f632a | Florent Chuffart | RM\_H3K14ac\_39, \#reads for sample 39 |
397 | 780f632a | Florent Chuffart | |
398 | 780f632a | Florent Chuffart | \end{itemize} |
399 | 780f632a | Florent Chuffart | |
400 | 780f632a | Florent Chuffart | The 7 last columns concern DESeq analysis: |
401 | 780f632a | Florent Chuffart | \begin{itemize} |
402 | 780f632a | Florent Chuffart | \item {} |
403 | 780f632a | Florent Chuffart | manip{[}a\_manip{]} strain{[}a\_strain{]} manip{[}a\_strain{]}:strain{[}a\_strain{]}, the manip (marker) effect, the strain effect and the snep effect. These are the coefficients of the fitted generalized linear model. |
404 | 780f632a | Florent Chuffart | |
405 | 780f632a | Florent Chuffart | \item {} |
406 | 780f632a | Florent Chuffart | pvalsGLM, the pvalue resulting from the comparison of the GLM model considering the interaction term \emph{marker:strain} to the GLM model that does not consider it. This is the statsitcial significance of the interaction term and therefore the statistical significance of the SNEP. |
407 | 780f632a | Florent Chuffart | |
408 | 780f632a | Florent Chuffart | \item {} |
409 | 780f632a | Florent Chuffart | snep\_index, a boolean set to TRUE if the pvalueGLM value is under the threshold computed with FDR function with a rate set to 0.0001. |
410 | 780f632a | Florent Chuffart | |
411 | 780f632a | Florent Chuffart | \item {} |
412 | 780f632a | Florent Chuffart | mnase\_l2fc, this is the coefficients of the fitted generalized linear model based on MNASE samples only. |
413 | 780f632a | Florent Chuffart | |
414 | 780f632a | Florent Chuffart | \item {} |
415 | 780f632a | Florent Chuffart | mnase\_l2fc\_pval, the pvalue resulting from the analysis of the simple model based on MNASE samples only. This is the statsitcial significance of the \emph{strain} term and therefore the statistical significance of the strain effect. |
416 | 780f632a | Florent Chuffart | |
417 | 780f632a | Florent Chuffart | \end{itemize} |
418 | 780f632a | Florent Chuffart | |
419 | 780f632a | Florent Chuffart | To execute this script, run the following command in your R console: |
420 | 780f632a | Florent Chuffart | |
421 | 780f632a | Florent Chuffart | |
422 | 780f632a | Florent Chuffart | {\footnotesize |
423 | 780f632a | Florent Chuffart | <<echo=TRUE, results=hide>>= |
424 | 780f632a | Florent Chuffart | for (form in c("wp", "unr", "wpunr")) { |
425 | 780f632a | Florent Chuffart | sneps = analyse_count_table(markers[2], combis[[1]], form, all_samples, config=config) |
426 | 780f632a | Florent Chuffart | } |
427 | 780f632a | Florent Chuffart | @ |
428 | 780f632a | Florent Chuffart | <<echo=TRUE, results=verbatim>>= |
429 | 780f632a | Florent Chuffart | a_sneps_table = read.table("outputs/BY_RM_H3K14ac_wp_snep.tab", header=TRUE) |
430 | 780f632a | Florent Chuffart | head(a_sneps_table) |
431 | 780f632a | Florent Chuffart | |
432 | 780f632a | Florent Chuffart | @ |
433 | 780f632a | Florent Chuffart | } |
434 | 780f632a | Florent Chuffart | |
435 | 780f632a | Florent Chuffart | Here are the number of computed SNEPs for each forms. |
436 | 780f632a | Florent Chuffart | |
437 | 780f632a | Florent Chuffart | |
438 | 780f632a | Florent Chuffart | |
439 | 780f632a | Florent Chuffart | |
440 | 780f632a | Florent Chuffart | \end{document} |