Publishing DisGeNET as nanopublications

Núria Queralt-Rosinach, Tobias Kuhn, Christine Chichester, Michel Dumontier, Ferran Sanz, Laura I. Furlong

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

The increasing and unprecedented publication rate in the biomedical field is a major bottleneck for knowledge discovery in the Life Sciences. The manual curation of facts from published scientific papers is slow and inefficient, and therefore new approaches are needed that can enable the automatic, scalable and reliable extraction of assertions. While the publication of scientific assertions and datasets on the Semantic Web is gaining traction, it also creates new challenges such as the proper representation of provenance and versioning. Here, we address these issues and describe our efforts to represent the DisGeNET database of human gene-disease associations as permanent, immutable, and provenance-rich digital objects called nanopublications. Our nanopublications are the first instance of a Linked Data model that ensures stable interlinking of the assertion and its metadata by Trusty URIs. As DisGeNET integrates manually curated as well as text-mined data of different origins, the semantic description of the evidence for each assertion is important to provide trust and allow evidence-based hypothesis generation. We also describe our steps to ensure high quality and demonstrate the utility of linking our data to other datasets on the emerging Semantic Web.

Original language: English
Pages (from-to): 519-528
Number of pages: 10
Journal: Semantic Web
Volume: 7
Issue number: 5
DOI: https://doi.org/10.3233/SW-150189
Publication status: Published - 23 Jun 2016

Keywords

  • Gene-disease associations
  • Linked data
  • Nanopublication
  • Provenance
  • Trusty URIs

Cite this

Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., & Furlong, L. I. (2016). Publishing DisGeNET as nanopublications. Semantic Web, 7(5), 519-528. https://doi.org/10.3233/SW-150189
@article{c7b4014b1ceb4e2d948033cee28c20c1,
title = "Publishing DisGeNET as nanopublications",
keywords = "Gene-disease associations, Linked data, Nanopublication, Provenance, Trusty URIs",
author = "N{\'u}ria Queralt-Rosinach and Tobias Kuhn and Christine Chichester and Michel Dumontier and Ferran Sanz and Furlong, {Laura I.}",
year = "2016",
month = "6",
day = "23",
doi = "10.3233/SW-150189",
language = "English",
volume = "7",
pages = "519--528",
journal = "Semantic Web",
issn = "1570-0844",
publisher = "IOS Press",
number = "5",

}
