TY - JOUR
T1 - Publishing DisGeNET as nanopublications
AU - Queralt-Rosinach, Núria
AU - Kuhn, Tobias
AU - Chichester, Christine
AU - Dumontier, Michel
AU - Sanz, Ferran
AU - Furlong, Laura I.
PY - 2016/6/23
Y1 - 2016/6/23
N2 - The increasing and unprecedented publication rate in the biomedical field is a major bottleneck for knowledge discovery in the Life Sciences. The manual curation of facts from published scientific papers is slow and inefficient, and therefore new approaches are needed that can enable the automatic, scalable and reliable extraction of assertions. While the publication of scientific assertions and datasets on the Semantic Web is gaining traction, it also creates new challenges such as the proper representation of provenance and versioning. Here, we address these issues and describe our efforts to represent the DisGeNET database of human gene-disease associations as permanent, immutable, and provenance-rich digital objects called nanopublications. Our nanopublications are the first instance of a Linked Data model that ensures stable interlinking of the assertion and its metadata by Trusty URIs. As DisGeNET integrates manually curated as well as text-mined data of different origins, the semantic description of the evidence for each assertion is important to provide trust and allow evidence-based hypothesis generation. Here, we describe our steps to ensure high quality and demonstrate the utility of linking our data to other datasets on the emerging Semantic Web.
AB - The increasing and unprecedented publication rate in the biomedical field is a major bottleneck for knowledge discovery in the Life Sciences. The manual curation of facts from published scientific papers is slow and inefficient, and therefore new approaches are needed that can enable the automatic, scalable and reliable extraction of assertions. While the publication of scientific assertions and datasets on the Semantic Web is gaining traction, it also creates new challenges such as the proper representation of provenance and versioning. Here, we address these issues and describe our efforts to represent the DisGeNET database of human gene-disease associations as permanent, immutable, and provenance-rich digital objects called nanopublications. Our nanopublications are the first instance of a Linked Data model that ensures stable interlinking of the assertion and its metadata by Trusty URIs. As DisGeNET integrates manually curated as well as text-mined data of different origins, the semantic description of the evidence for each assertion is important to provide trust and allow evidence-based hypothesis generation. Here, we describe our steps to ensure high quality and demonstrate the utility of linking our data to other datasets on the emerging Semantic Web.
KW - Gene-disease associations
KW - Linked data
KW - Nanopublication
KW - Provenance
KW - Trusty URIs
UR - http://www.scopus.com/inward/record.url?scp=84976465112&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84976465112&partnerID=8YFLogxK
U2 - 10.3233/SW-150189
DO - 10.3233/SW-150189
M3 - Article
AN - SCOPUS:84976465112
SN - 1570-0844
VL - 7
SP - 519
EP - 528
JO - Semantic Web
JF - Semantic Web
IS - 5
ER -