%% readme.txt %% for DOMINO_script_v1.R

Author: Mathieu Quinodoz
Name:   DOMINO
Date:   July 18, 2017

---------------------------------------------------------------------------------
- Thank you for using DOMINO. For any questions, please contact:                -
- mathieu.quinodoz@iob.ch                                                       -
---------------------------------------------------------------------------------

(1) INTRODUCTION
-----------------
DOMINO is a machine learning tool, based on Linear Discriminant Analysis (LDA),
that can predict dominance of candidate genes. DOMINO_script_v1.R contains the
workflow of the tool; see the online Supplementary Methods for more details.

DOMINO was developed using RStudio version 0.99.893 with R version 3.3.2 and was
also tested on RStudio version 0.99.902 with R version 3.3.3.

(2) REQUIRED R PACKAGES
------------------------
"biomaRt"
"STRINGdb"
"MASS"
"plotrix"
"pROC"

You can install "biomaRt" and "STRINGdb" through Bioconductor:
> source("https://bioconductor.org/biocLite.R")
> biocLite("biomaRt")
> biocLite("STRINGdb")

All the other packages can be installed from CRAN:
> install.packages("MASS")
> install.packages("plotrix")
> install.packages("pROC")

(3) PACKAGE FILES
------------------
All the following files are included in the DOMINO package:

9606.protein.links.detailed.v10.RData  % Protein network data from the STRING database, pre-processed in R
AD-AR.tsv                              % List of genes with both recessive and dominant inheritance
exac_pred.txt                          % Functional gene constraints from ExAC (version Jan 13, 2016) [1]
ExAC.r0.3.processed.RData              % All the variants present in ExAC version r0.3, pre-processed in R
half_life.txt                          % mRNA half-life from Sharova et al., 2009 [2]
MGI15_short.txt                        % Data from the Gene Expression Database (GXD) regarding mouse embryonic expression
MGIno_short.txt                        % Data from the Gene Expression Database (GXD) regarding mouse embryonic expression
NMD.tsv                                % NMD targets from the meta-analysis of Colombo et al., 2017 [3]
refGene.promotor.PhyloP.txt            % PhyloP scores from -500 to +500 around the TSS (PhyloP46way.vertebrates)
training_set_v1.tsv                    % Training set
validation.tsv                         % Validation set

(4) USAGE
----------
After you have installed all the required R packages:

1. Download DOMINO_v1.tar.gz from https://wwwfbm.unil.ch/domino/download.html

2. Create a directory for DOMINO.
   $ mkdir DOMINO_v1

3. Extract the files and navigate to the DOMINO_v1 directory.
   $ tar -xf DOMINO_v1.tar.gz --directory ./DOMINO_v1
   $ cd DOMINO_v1

4. Open DOMINO_script_v1.R in RStudio.

5. Set the working directory to your DOMINO directory on line 10.
   > work_directory="/home/DOMINO_v1" # change to your DOMINO working directory

6. Run the script.

7. The script will create four files:
   a) results.tsv           % Table of all the autosomal genes with P(AD) scores and selected feature values
   b) final_resutls.RData   % Saved workspace of the workflow
   d) results_graphics.pdf  % All the figures

Note: the algorithm is set to the 15th iteration; you can change this setting on line 1085.
   > final_formula=form_it[15] # change if needed

-----------------------------------------------------------------------------------
References:

[1] Samocha, K.E., Robinson, E.B., Sanders, S.J., Stevens, C., Sabo, A., McGrath, L.M.,
    Kosmicki, J.A., Rehnstrom, K., Mallick, S., Kirby, A., et al. (2014). A framework for
    the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944-950.

[2] Sharova, L.V., Sharov, A.A., Nedorezov, T., Piao, Y., Shaik, N., and Ko, M.S. (2009).
    Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of
    pluripotent and differentiating mouse embryonic stem cells. DNA Res. 16, 45-58.

[3] Colombo, M., Karousis, E.D., Bourquin, J., Bruggmann, R., and Muhlemann, O. (2017).
    Transcriptome-wide identification of NMD-targeted human mRNAs reveals extensive
    redundancy between SMG6- and SMG7-mediated degradation pathways. RNA 23, 189-201.
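
-----------------------------------------------------------------------------------
Optional check (not part of the DOMINO package):

Since the script reads every data file listed in section (3), it may help to confirm
that all of them are present after step 3 of USAGE and before running the script.
The following is a minimal shell sketch, not the author's tooling. It assumes that
the archive extracts its files directly into the DOMINO_v1 directory (no
sub-folders), and the function name check_domino_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which of the data files listed in section (3)
# are missing from a directory. Assumes the files sit directly in that
# directory (no sub-folders); adjust if the archive is laid out differently.
check_domino_files() {
    dir="${1:-.}"          # directory to check, defaults to the current one
    missing=0
    for f in \
        9606.protein.links.detailed.v10.RData \
        AD-AR.tsv \
        exac_pred.txt \
        ExAC.r0.3.processed.RData \
        half_life.txt \
        MGI15_short.txt \
        MGIno_short.txt \
        NMD.tsv \
        refGene.promotor.PhyloP.txt \
        training_set_v1.tsv \
        validation.tsv
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 only when every listed file was found.
    [ "$missing" -eq 0 ]
}
```

After sourcing the function into your shell, it could be used like:
   $ check_domino_files ./DOMINO_v1 && echo "all files present"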