Abstract
Background: Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected by the growing amount of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants on a few specific genes. Although statistical machine-learning methods have been developed to identify relevant genetic variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models. This makes their predictions hard for medical experts to interpret and validate. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results.
Results: We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogeneous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction.
Conclusion: Our method, built with interpretability in mind, leverages heterogeneous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.
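To make the idea of path information between gene pairs concrete, the following is a minimal sketch, not the BOCK implementation. It assumes a toy graph given as typed triples and summarises the simple paths between two genes by their sequence of node and relation types. All node names, relation labels, and function names here are hypothetical, and the abstract does not specify how path features are actually built.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge graph as (source, relation, target) triples.
# Names and relation labels are illustrative only, not taken from BOCK.
EDGES = [
    ("GENE_A", "interacts_with", "GENE_C"),
    ("GENE_C", "interacts_with", "GENE_B"),
    ("GENE_A", "participates_in", "PATHWAY_1"),
    ("GENE_B", "participates_in", "PATHWAY_1"),
    ("GENE_A", "associated_with", "PHENOTYPE_1"),
    ("GENE_B", "associated_with", "PHENOTYPE_1"),
]
NODE_TYPE = {
    "GENE_A": "Gene", "GENE_B": "Gene", "GENE_C": "Gene",
    "PATHWAY_1": "Pathway", "PHENOTYPE_1": "Phenotype",
}


def build_adjacency(edges):
    """Index the triples so every edge can be traversed in both directions."""
    adj = defaultdict(list)
    for source, relation, target in edges:
        adj[source].append((relation, target))
        adj[target].append((relation, source))
    return adj


def typed_paths(adj, source, target, max_hops):
    """Yield simple paths as lists alternating node, relation, node, ..."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target and len(path) > 1:
            yield path
            continue
        if (len(path) - 1) // 2 >= max_hops:
            continue  # hop budget exhausted on this branch
        for relation, neighbour in adj[node]:
            if neighbour not in path[::2]:  # nodes sit at even positions
                stack.append((neighbour, path + [relation, neighbour]))


def metapath_counts(adj, node_type, source, target, max_hops=2):
    """Count paths between two nodes, grouped by their type signature."""
    counts = Counter()
    for path in typed_paths(adj, source, target, max_hops):
        signature = tuple(
            node_type[item] if i % 2 == 0 else item
            for i, item in enumerate(path)
        )
        counts[signature] += 1
    return counts


adjacency = build_adjacency(EDGES)
for signature, count in metapath_counts(adjacency, NODE_TYPE, "GENE_A", "GENE_B").items():
    print(count, " -> ".join(signature))
```

In this toy graph the gene pair is linked through a shared interactor, a shared pathway, and a shared phenotype, so three type signatures are counted once each. Counts of this kind are one plausible input for a rule-based model, and the concrete paths behind a matching rule could serve as the explanatory subgraph.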
Original language | English |
---|---|
Article number | 324 |
Pages (from-to) | 1-25 |
Number of pages | 25 |
Journal | BMC Bioinformatics |
Volume | 24 |
Early online date | 29 Aug 2023 |
Publication status | Published - 2023 |
Bibliographical note
Funding Information: The authors thank Barbara Gravel, Charlotte Nachtegael, Sofia Papadimitriou, Emma Verkinderen and Nassim Versbraegen, who provided insightful discussions about oligogenic information extracted from OLIDA and helpful feedback. The authors also thank Foundation 101 Genomes (f101g.org) for fruitful collaboration, creative exchange, and scientific support.
Funding Information:
This work was supported by the European Regional Development Fund (ERDF) and the Brussels-Capital Region-Innoviris within the framework of the Operational Programme 2014–2020 through the ERDF-2020 project ICITY-RDI.BRU [27.002.53.01.4524 to A.R., A.N., T.L.], an F.N.R.S-F.R.S CDR [35276964 to T.L.], an Innoviris Joint R&D project Genome4Brussels [2020 RDIR 55b to A.R., T.L.], a Research Foundation-Flanders (F.W.O.) Infrastructure project associated with ELIXIR Belgium [I002819N to T.L.], the Imagica2 IRP project by the Vrije Universiteit Brussel [IRP8b to A.N.], the Graph-Massivizer project, funded by the Horizon Europe programme of the European Union [101093202 to M.C.], and TAILOR, a project funded by the EU Horizon 2020 research and innovation program [952215 to T.L. and A.N.].
Publisher Copyright:
© 2023, BioMed Central Ltd., part of Springer Nature.
Keywords
- Disease genetics
- Genetic interactions
- Interpretable machine-learning
- Knowledge graphs