What is KeyGenes?
KeyGenes is an algorithm that predicts the identity of queried samples (test set) and determines their identity scores relative to a provided group of samples (training set). It matches the transcriptional profiles of the queried samples to sets of transcriptional profiles of organs or cell types in the training set. To do so, KeyGenes uses LASSO (Least Absolute Shrinkage and Selection Operator) regression with 10-fold cross-validation, as implemented in the R package “glmnet” (Friedman et al., 2010).
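The idea of a cross-validated LASSO classifier can be illustrated with a small sketch. This is not KeyGenes' own R code and does not reproduce glmnet: it assumes a multinomial model (one score per training-set group), fits an L1-penalised logistic regression by plain proximal gradient descent in NumPy instead of glmnet's coordinate descent, and picks the penalty by k-fold cross-validation. The function names (`fit_l1_multinomial`, `cv_select_lambda`), the tissue labels and the synthetic data are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_l1_multinomial(X, y, n_classes, lam, lr=0.1, n_iter=500):
    """L1-penalised multinomial logistic regression via proximal gradient (ISTA)."""
    n, p = X.shape
    W = np.zeros((p, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoding of the training labels
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        G = X.T @ (P - Y) / n  # gradient of the mean cross-entropy w.r.t. W
        W = W - lr * G
        # Soft-thresholding: the proximal step of the L1 (LASSO) penalty.
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)
        b = b - lr * (P - Y).mean(axis=0)  # intercepts are not penalised
    return W, b

def cv_select_lambda(X, y, n_classes, lambdas, k=10, seed=0):
    """Return the penalty with the lowest k-fold cross-validated log-loss."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    losses = []
    for lam in lambdas:
        loss = 0.0
        for f in folds:
            train = np.setdiff1d(np.arange(len(y)), f)
            W, b = fit_l1_multinomial(X[train], y[train], n_classes, lam)
            P = softmax(X[f] @ W + b)
            loss -= np.log(P[np.arange(len(f)), y[f]] + 1e-12).sum()
        losses.append(loss / len(y))
    return lambdas[int(np.argmin(losses))]

# --- Toy usage on synthetic data (hypothetical tissues and marker genes) ---
rng = np.random.default_rng(1)
tissues = ["tissue_A", "tissue_B", "tissue_C"]
n_per, n_genes = 20, 50
y = np.repeat(np.arange(3), n_per)
X = rng.normal(size=(len(y), n_genes))
for c in range(3):
    X[y == c, 3 * c:3 * c + 3] += 3.0  # three marker genes per tissue

lam = cv_select_lambda(X, y, 3, [0.001, 0.01, 0.05, 0.1], k=10)
W, b = fit_l1_multinomial(X, y, 3, lam)

# Queried samples: idealised profiles of each tissue (markers up, all else 0).
X_query = np.zeros((3, n_genes))
for c in range(3):
    X_query[c, 3 * c:3 * c + 3] = 3.0
scores = softmax(X_query @ W + b)  # scores in [0, 1]; each row sums to 1
predicted = [tissues[i] for i in scores.argmax(axis=1)]
print(np.round(scores, 2), predicted)
```

In this sketch the soft-thresholding step is what makes the LASSO select genes: weights whose gradient stays below the penalty are driven to exactly zero, so only a small set of informative genes ends up in the classifier, and the softmax outputs play the role of identity scores between 0 and 1.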
Information about the “fixed” training sets provided as a head start, as well as instructions on how to use either the Web App with “fixed” training sets or the R scripts with “fixed” or “flexible” training sets, can be found at http://www.keygenes.nl/ (“How to use KeyGenes”). The R scripts and the available “fixed” training sets (with associated files) can also be downloaded from http://www.keygenes.nl/.
What do I get from KeyGenes?
The output you will get from KeyGenes consists of four files:
1. A PDF file (KeyGenes_Heatmap.pdf) with a heatmap containing the identity scores (between 0 and 1) of your samples matched to the samples included in the training set.
2. A text file (KeyGenes_Matrix.txt) containing a matrix with the identity scores (between 0 and 1) of the queried samples matched to the samples included in the training set.
3. A text file (KeyGenes_Prediction.txt) listing each queried sample together with the training-set sample that has the highest identity score.
4. A text file (KeyGenes_Classifier.txt) listing, per training-set sample, the classifier genes derived from the training set that are used to determine the identity scores (between 0 and 1) of the queried samples.
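The relationship between the matrix file and the prediction file (the highest identity score per queried sample) can be sketched in a few lines of Python. The exact layout of KeyGenes_Matrix.txt is an assumption here: the sketch expects a tab-delimited table with training-set labels in the header row and one row per queried sample, its name in the first column. It also accepts a header with no corner cell, which is how R's `write.table` writes row names by default. The function name `top_predictions` and the example scores are hypothetical.

```python
import csv
import io

def top_predictions(matrix_text):
    """Map each queried sample to (best training-set label, identity score).

    Assumed (hypothetical) layout: tab-delimited, header row of training-set
    labels, one row per queried sample with its name in the first column.
    """
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    header = rows[0]
    # R's write.table omits the corner cell, so the header is one field short.
    labels = header if len(header) == len(rows[1]) - 1 else header[1:]
    result = {}
    for row in rows[1:]:
        scores = [float(v) for v in row[1:]]
        best = max(range(len(scores)), key=scores.__getitem__)
        result[row[0]] = (labels[best], scores[best])
    return result

# Made-up scores, for illustration only.
example = "sample\tliver\theart\tkidney\nq1\t0.91\t0.05\t0.04\nq2\t0.10\t0.15\t0.75\n"
print(top_predictions(example))  # {'q1': ('liver', 0.91), 'q2': ('kidney', 0.75)}
```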
Friedman et al. (2010). Journal of Statistical Software 33(1): 1-22.