%% readme.txt
%% for DOMINO_script_v1.R

Author: Mathieu Quinodoz
Name: DOMINO
Date: July 18, 2017

		---------------------------------------------------------------------------------
		-        Thank you for using DOMINO. For any questions, please contact:         -
		-                            mathieu.quinodoz@iob.ch                            -
		---------------------------------------------------------------------------------


(1) INTRODUCTION
-----------------

DOMINO is a machine learning tool, based on Linear Discriminant Analysis (LDA), that can predict dominance of candidate genes.
DOMINO_script_v1.R contains the complete workflow of the tool; see the online Supplementary Methods for more details.

DOMINO was developed using RStudio version 0.99.893 with R version 3.3.2 and was also tested on RStudio version 0.99.902 with R version 3.3.3.

(2) REQUIRED R PACKAGES
------------------------

"biomaRt" 
"STRINGdb"
"MASS"
"plotrix"
"pROC"

You can install "biomaRt" and "STRINGdb" from Bioconductor as follows:

> source("https://bioconductor.org/biocLite.R")
> biocLite("biomaRt")
> biocLite("STRINGdb")

All the other packages can be installed from CRAN as follows:

> install.packages("MASS")
> install.packages("plotrix")
> install.packages("pROC")
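The commands above match the R 3.3.x versions DOMINO was developed with. Starting with R 3.5 (Bioconductor 3.8), biocLite was replaced by the BiocManager package. The following is a minimal sketch of an installer that handles both cases; the helper names missing_pkgs and install_domino_deps are hypothetical and not part of DOMINO, and DOMINO itself has only been tested on the R versions listed in section (1).

```r
# Packages listed in section (2), grouped by repository.
bioc_pkgs <- c("biomaRt", "STRINGdb")
cran_pkgs <- c("MASS", "plotrix", "pROC")

# Return the packages from `required` that are not yet installed.
# `installed` defaults to the packages visible in the current library paths.
missing_pkgs <- function(required, installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Install whatever is missing (hypothetical helper, not part of DOMINO).
install_domino_deps <- function() {
  cran_missing <- missing_pkgs(cran_pkgs)
  if (length(cran_missing) > 0) install.packages(cran_missing)

  bioc_missing <- missing_pkgs(bioc_pkgs)
  if (length(bioc_missing) > 0) {
    if (getRversion() >= "3.5.0") {
      # Newer R: Bioconductor packages are installed through BiocManager.
      if (!requireNamespace("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
      }
      BiocManager::install(bioc_missing)
    } else {
      # R 3.3.x / 3.4.x, as used to develop DOMINO: legacy biocLite installer.
      source("https://bioconductor.org/biocLite.R")
      biocLite(bioc_missing)
    }
  }
}
```

Calling install_domino_deps() once in a fresh R session should be enough; packages that are already installed are skipped.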

(3) PACKAGE FILES
------------------

All of the following files are included in the DOMINO package:

9606.protein.links.detailed.v10.RData   % Protein network data from the STRING database, pre-processed in R
AD-AR.tsv                               % List of genes with both recessive and dominant inheritance
exac_pred.txt                           % Functional gene constraints from ExAC (version Jan 13, 2016) [1]
ExAC.r0.3.processed.RData               % All the variants present in ExAC version r0.3, pre-processed in R
half_life.txt                           % mRNA half-life from Sharova et al., 2009 [2]
MGI15_short.txt                         % Data from the Gene Expression Database (GXD) regarding mouse embryonic expression
MGIno_short.txt                         % Data from the Gene Expression Database (GXD) regarding mouse embryonic expression
NMD.tsv                                 % NMD targets from the meta-analysis of Colombo et al., 2017 [3]
refGene.promotor.PhyloP.txt             % PhyloP scores from -500 to +500 bp around the TSS (PhyloP46way.vertebrates)
training_set_v1.tsv                     % Training set
validation.tsv                          % Validation set
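
Since the script reads these files from the working directory, it can help to confirm that the archive was unpacked completely before running it. The following is a minimal sketch of such a check; the helper name check_domino_files is hypothetical and not part of DOMINO, and it assumes all files sit directly in the directory passed as dir.

```r
# Data files listed in section (3).
domino_files <- c(
  "9606.protein.links.detailed.v10.RData",
  "AD-AR.tsv",
  "exac_pred.txt",
  "ExAC.r0.3.processed.RData",
  "half_life.txt",
  "MGI15_short.txt",
  "MGIno_short.txt",
  "NMD.tsv",
  "refGene.promotor.PhyloP.txt",
  "training_set_v1.tsv",
  "validation.tsv"
)

# Return the names of the expected files that are NOT found in `dir`.
# An empty character vector means that nothing is missing.
check_domino_files <- function(dir, files = domino_files) {
  files[!file.exists(file.path(dir, files))]
}
```

For example, check_domino_files("/home/DOMINO_v1") returns the names of any missing files for that directory.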

(4) USAGE
----------

After you have installed all the required R packages:

1. Download DOMINO_v1.tar.gz from https://wwwfbm.unil.ch/domino/download.html
2. Create a directory for DOMINO
	$ mkdir DOMINO_v1
3. Extract the archive into the DOMINO_v1 directory and navigate to it.
	$ tar -xf DOMINO_v1.tar.gz --directory ./DOMINO_v1
	$ cd DOMINO_v1
4. Open DOMINO_script_v1.R in RStudio
5. Set the working directory to your DOMINO directory on line 10 of the script
	> work_directory="/home/DOMINO_v1" # change to your domino working directory
6. Run the script
7. The script will create the following files:
	a) results.tsv				% Table of all the autosomal genes with P(AD) scores and selected feature values
	b) final_resutls.RData			% Saved workspace of the workflow
	c) results_graphics.pdf			% All the figures
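
Once the script has finished, results.tsv can be loaded back into R for inspection. The following is a minimal sketch, assuming the file is tab-separated with a header row; the helper name read_domino_results is hypothetical, and the exact column names should be taken from the file itself.

```r
# Read the results table written by DOMINO_script_v1.R.
# Assumes a tab-separated file with a header row.
read_domino_results <- function(dir, file = "results.tsv") {
  read.delim(file.path(dir, file),
             stringsAsFactors = FALSE,  # keep gene identifiers as character
             check.names = FALSE)       # keep column names exactly as written
}

# Example (adjust the path to your DOMINO working directory):
# results <- read_domino_results("/home/DOMINO_v1")
# str(results)
```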

Note: by default the algorithm uses the formula from the 15th iteration; you can change this setting on line 1085 of the script.
	> final_formula=form_it[15] # change if needed

-----------------------------------------------------------------------------------

References:
[1] Samocha, K.E., Robinson, E.B., Sanders, S.J., Stevens, C., Sabo, A., McGrath, L.M., Kosmicki, J.A., Rehnstrom, K., Mallick, S., Kirby, A., et al. (2014). A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944-950.
[2] Sharova, L.V., Sharov, A.A., Nedorezov, T., Piao, Y., Shaik, N., and Ko, M.S. (2009). Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res. 16, 45-58.
[3] Colombo, M., Karousis, E.D., Bourquin, J., Bruggmann, R., and Muhlemann, O. (2017). Transcriptome-wide identification of NMD-targeted human mRNAs reveals extensive redundancy between SMG6- and SMG7-mediated degradation pathways. RNA 23, 189-201.