Detecting erroneous identity links on the web using network metrics

Joe Raad, Wouter Beek, Frank van Harmelen, Nathalie Pernelle, Fatiha Saïs

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

In the absence of a central naming authority on the Semantic Web, it is common for different datasets to refer to the same thing by different IRIs. Whenever multiple names denote the same thing, owl:sameAs statements are needed to link the data and foster reuse. Studies dating back as far as 2009 have observed that the owl:sameAs property is sometimes used incorrectly. In this paper, we show how network metrics, such as the community structure of the owl:sameAs graph, can be used to detect such possibly erroneous statements. One benefit of the approach presented here is that it can be applied to the network of owl:sameAs links itself and does not rely on any additional knowledge. To illustrate its ability to scale, the approach is evaluated on the largest collection of identity links to date, containing over 558M owl:sameAs links scraped from the LOD Cloud.
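
The following is a minimal sketch of the general idea of judging a link using only the structure of the owl:sameAs network. It is not the method of the paper, which the abstract describes only as using network metrics such as community structure. As a simple stand-in, the sketch flags a link when both of its endpoints have other identity neighbours but share none of them, which is typical of a lone bridge between two otherwise separate, densely linked groups. The `ex:` IRIs, the function names, and the zero-shared-neighbour rule are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected adjacency map from (subject, object) owl:sameAs pairs."""
    adjacency = defaultdict(set)
    for subject, obj in links:
        if subject != obj:  # reflexive statements carry no structural signal
            adjacency[subject].add(obj)
            adjacency[obj].add(subject)
    return adjacency


def suspicious_links(links):
    """Return links whose endpoints have other neighbours but none in common.

    Illustrative heuristic only (an assumption, not the paper's metric):
    a link with no shared neighbours is weakly supported by the rest of
    the identity network and is reported as possibly erroneous.
    """
    adjacency = build_graph(links)
    seen = set()
    flagged = []
    for subject, obj in links:
        edge = frozenset((subject, obj))
        if subject == obj or edge in seen:
            continue  # skip reflexive and duplicate (or reversed) statements
        seen.add(edge)
        shared = adjacency[subject] & adjacency[obj]
        has_other_neighbours = len(adjacency[subject]) > 1 and len(adjacency[obj]) > 1
        if not shared and has_other_neighbours:
            flagged.append((subject, obj))
    return flagged


# Hypothetical data: two tightly linked groups joined by a single link.
EXAMPLE_LINKS = [
    ("ex:a1", "ex:a2"), ("ex:a2", "ex:a3"), ("ex:a1", "ex:a3"),
    ("ex:b1", "ex:b2"), ("ex:b2", "ex:b3"), ("ex:b1", "ex:b3"),
    ("ex:a1", "ex:b1"),  # the only link between the two groups
]

if __name__ == "__main__":
    for subject, obj in suspicious_links(EXAMPLE_LINKS):
        print(f"possibly erroneous: {subject} owl:sameAs {obj}")
```

Like the approach described in the abstract, this sketch needs no knowledge beyond the links themselves. A community detection algorithm would replace the shared-neighbour rule in a more faithful implementation.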

Language: English
Title of host publication: The Semantic Web – ISWC 2018 - 17th International Semantic Web Conference, 2018, Proceedings
Editors: Mari Carmen Suárez-Figueroa, Valentina Presutti, Lucie-Aimee Kaffee, Elena Simperl, Marta Sabou, Denny Vrandecic, Irene Celino, Kalina Bontcheva
Publisher: Springer/Verlag
Pages: 391-407
Number of pages: 17
ISBN (Print): 9783030006709
DOI: https://doi.org/10.1007/978-3-030-00671-6_23
Publication status: Published - 2018
Event: 17th International Semantic Web Conference, ISWC 2018 - Monterey, United States
Duration: 8 Oct 2018 to 12 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11136 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Semantic Web Conference, ISWC 2018
Country: United States
City: Monterey
Period: 8/10/18 to 12/10/18

Keywords

  • Communities
  • Identity
  • Linked Open Data
  • Owl:sameAs

Cite this

Raad, J., Beek, W., van Harmelen, F., Pernelle, N., & Saïs, F. (2018). Detecting erroneous identity links on the web using network metrics. In M. C. Suárez-Figueroa, V. Presutti, L-A. Kaffee, E. Simperl, M. Sabou, D. Vrandecic, I. Celino, ... K. Bontcheva (Eds.), The Semantic Web – ISWC 2018 - 17th International Semantic Web Conference, 2018, Proceedings (pp. 391-407). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11136 LNCS). Springer/Verlag. https://doi.org/10.1007/978-3-030-00671-6_23
@inproceedings{ea6da796abab4d0ba0ed331bf883241d,
title = "Detecting erroneous identity links on the web using network metrics",
abstract = "In the absence of a central naming authority on the Semantic Web, it is common for different datasets to refer to the same thing by different IRIs. Whenever multiple names denote the same thing, owl:sameAs statements are needed to link the data and foster reuse. Studies dating back as far as 2009 have observed that the owl:sameAs property is sometimes used incorrectly. In this paper, we show how network metrics, such as the community structure of the owl:sameAs graph, can be used to detect such possibly erroneous statements. One benefit of the approach presented here is that it can be applied to the network of owl:sameAs links itself and does not rely on any additional knowledge. To illustrate its ability to scale, the approach is evaluated on the largest collection of identity links to date, containing over 558M owl:sameAs links scraped from the LOD Cloud.",
keywords = "Communities, Identity, Linked Open Data, Owl:sameAs",
author = "Joe Raad and Wouter Beek and {van Harmelen}, Frank and Nathalie Pernelle and Fatiha Sa{\"i}s",
year = "2018",
doi = "10.1007/978-3-030-00671-6_23",
language = "English",
isbn = "9783030006709",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer/Verlag",
pages = "391--407",
editor = "Su{\'a}rez-Figueroa, {Mari Carmen} and Valentina Presutti and Lucie-Aimee Kaffee and Elena Simperl and Marta Sabou and Denny Vrandecic and Irene Celino and Kalina Bontcheva",
booktitle = "The Semantic Web – ISWC 2018 - 17th International Semantic Web Conference, 2018, Proceedings",

}
