Description
BioKG is a biomedical knowledge graph containing relationships between proteins, molecules, diseases, and other entities. It was originally proposed by Walsh et al. (2020) in "BioKG: A Knowledge Graph for Relational Learning On Biological Data". We enrich this dataset with multimodal data associated with biomedical entities:

- Proteins: protein embeddings computed with ProtTrans from amino acid sequences
- Molecules: molecule embeddings computed with MolTrans from SMILES representations
- Diseases: textual descriptions retrieved from MeSH

Furthermore, we decouple the benchmarks provided by Walsh et al. from the edges in the knowledge graph, which ensures that there is no direct data leakage between the benchmarks and the triples used to train link prediction models.
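To illustrate what decoupling benchmarks from graph edges can look like, here is a minimal sketch. It is not the procedure used to build this dataset: the function name, the toy entity identifiers, and the choice to match on unordered (head, tail) pairs regardless of relation are all assumptions made for illustration.

```python
def remove_benchmark_edges(triples, benchmark_pairs):
    """Drop every triple whose head and tail form a pair found in a benchmark.

    Assumption: pairs are matched in either direction and independently of
    the relation type. The dataset's actual matching rule may differ.
    """
    blocked = {frozenset(pair) for pair in benchmark_pairs}
    return [(h, r, t) for (h, r, t) in triples if frozenset((h, t)) not in blocked]


# Hypothetical toy data; these identifiers do not come from BioKG.
kg_triples = [
    ("drug_a", "targets", "protein_x"),
    ("protein_x", "associated_with", "disease_m"),
    ("drug_b", "targets", "protein_y"),
]
benchmark_pairs = [("drug_a", "protein_x")]

# Triples left for training a link prediction model.
train_triples = remove_benchmark_edges(kg_triples, benchmark_pairs)
```

With this filter, a model trained on `train_triples` never sees an edge that directly connects a pair it will later be evaluated on.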
Date made available | 2023 |
---|---|
Publisher | Zenodo |