Charting disease gene associations through propagation of disease phenotypic similarities

Network medicine approaches have been largely successful at increasing our knowledge of molecularly characterized diseases. Given a set of genes known to be associated with a disease, neighbourhood-based methods and random walkers exploit the interactome to predict further genes for that disease. Diseases with no known molecular basis, however, remain a challenge, because these methods have no seed genes to start from. Here we present a novel network approach to prioritise gene-disease associations that can also predict genes for diseases with no known molecular basis. Our method, which we have called Cardigan (ChARting DIsease Gene AssociatioNs), uses semi-supervised learning and exploits a measure of similarity between disease phenotypes. We evaluated its performance at predicting genes for both molecularly characterized and uncharacterized diseases, using both weighted and binary interactomes, and compared it with state-of-the-art methods. Our tests, which use datasets collected at different points in time to replicate the dynamics of the disease gene discovery process, show that Cardigan accurately predicts disease genes for molecularly uncharacterized diseases. Additionally, standard leave-one-out cross-validation tests show that our approach outperforms state-of-the-art methods at predicting genes for molecularly characterized diseases by 55%-65%. Cardigan can also be used for disease module prediction, where it outperforms state-of-the-art methods by 17%-72%.
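The idea of propagating phenotypic similarity over an interactome can be illustrated with a small, generic label-propagation sketch. This is not the Cardigan implementation: the toy network, the similarity values, the `alpha` parameter and the function names `similarity_prior` and `propagate` are all assumptions made for illustration. The query disease has no known genes, so each gene's prior score is taken from the phenotypically similar diseases it is already linked to, and those scores are then spread over the network.

```python
# Illustrative sketch only: generic label propagation on a toy network.
# This is NOT the Cardigan code; every name and value here is hypothetical.
import math


def similarity_prior(known_genes, similarity_to_query):
    """Give each gene the highest similarity (to the query disease)
    among the diseases it is already known to be associated with."""
    prior = {}
    for disease, genes in known_genes.items():
        sim = similarity_to_query.get(disease, 0.0)
        for gene in genes:
            prior[gene] = max(prior.get(gene, 0.0), sim)
    return prior


def propagate(adjacency, prior, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * prior, where S is the
    symmetrically normalised adjacency matrix of an undirected network."""
    nodes = sorted(adjacency)
    degree = {u: sum(adjacency[u].values()) for u in nodes}
    scores = {u: prior.get(u, 0.0) for u in nodes}
    for _ in range(max_iter):
        updated = {}
        for u in nodes:
            spread = sum(
                w * scores[v] / math.sqrt(degree[u] * degree[v])
                for v, w in adjacency[u].items()
            )
            updated[u] = alpha * spread + (1 - alpha) * prior.get(u, 0.0)
        delta = max(abs(updated[u] - scores[u]) for u in nodes)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy undirected interactome: a path g1 - g2 - g3 - g4 - g5, unit weights.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, {})[b] = 1.0
    adjacency.setdefault(b, {})[a] = 1.0

# The query disease has no known genes; two other diseases do.
known_genes = {"disease_A": ["g1"], "disease_B": ["g5"]}
similarity_to_query = {"disease_A": 0.9, "disease_B": 0.2}

prior = similarity_prior(known_genes, similarity_to_query)
scores = propagate(adjacency, prior)

# Rank genes for the query disease by propagated score.
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

In this toy setting, genes close to those of the more similar disease receive higher scores than genes close to those of the less similar one, which is the intuition behind using phenotypic similarity when no seed genes are available.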




Cardigan suite and usage examples. The software works directly with OMIM databases and with the PPI networks presented in the Cardigan paper.

Code + Data Bundle
We include the Cardigan code with the 2017 Caniza similarity (as used in our publication).



Caniza Similarity (2017)
Similarity data computed with the 2017 OMIM database.



Supplementary Material

Materials and Methods
Detailed definitions, extended experimental results, and software usage examples

Data file 1
OMIM identifiers for the DIAMOnD disease modules

Data file 2
Cardigan predictions for the 2017 OMIM on HPRD (TSV format)

Graph Sources

All predictions
Compressed file containing all the predictions shown in the paper and the supplementary material.


This application was developed by Juan Caceres.

Contact Us

Department of Computer Science
Royal Holloway, University of London
Egham, Surrey, United Kingdom TW20 0EX

Call us: +44 (1784) 414239