CEDAL: Time-Efficient detection of erroneous links in large-scale link repositories

André Valdestilhas, Tommaso Soru, Axel-Cyrille Ngonga Ngomo

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

More than 500 million facts on the Linked Data Web are statements across knowledge bases. These links are of crucial importance for the Linked Data Web as they make a large number of tasks possible, including cross-ontology question answering and federated queries. However, a large number of these links are erroneous and can thus lead to these applications producing absurd results. We present a time-efficient and complete approach for the detection of erroneous links for properties that are transitive. To this end, we make use of the semantics of URIs on the Data Web and combine it with an efficient graph partitioning algorithm. We then apply our algorithm to the LinkLion repository and show that we can analyze 19,200,114 links in 4.6 minutes. Our results show that at least 13% of the owl:sameAs links we considered are erroneous. In addition, our analysis of the provenance of links allows discovering agents and knowledge bases that commonly display poor linking. Our algorithm can be easily executed in parallel and on a GPU. We show that these implementations are up to two orders of magnitude faster than classical reasoners and a non-parallel implementation.
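
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "URI semantics plus graph partitioning", not the authors' implementation. It assumes that links of a transitive property such as owl:sameAs are partitioned into connected components with a union-find structure, and that a component is suspicious when it contains two distinct URIs from the same dataset. The helper names (`partition`, `dataset_of`, `suspicious_clusters`) and the rule that the host name identifies the dataset are hypothetical choices made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find(parent, x):
    """Return the representative of x, shortening paths along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def partition(links):
    """Group URIs into connected components of the link graph (union-find)."""
    parent = {}
    for source, target in links:
        parent.setdefault(source, source)
        parent.setdefault(target, target)
        root_s, root_t = find(parent, source), find(parent, target)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two components
    clusters = defaultdict(set)
    for uri in parent:
        clusters[find(parent, uri)].add(uri)
    return list(clusters.values())


def dataset_of(uri):
    """Assumption: the host name of a URI identifies its dataset."""
    return urlparse(uri).netloc


def suspicious_clusters(links):
    """Return components holding two or more distinct URIs of one dataset."""
    flagged = []
    for cluster in partition(links):
        by_dataset = defaultdict(set)
        for uri in cluster:
            by_dataset[dataset_of(uri)].add(uri)
        if any(len(uris) > 1 for uris in by_dataset.values()):
            flagged.append(cluster)
    return flagged


# Made-up example links, not data from the paper.
EXAMPLE_LINKS = [
    ("http://a.org/Paris", "http://b.org/Paris"),
    ("http://b.org/Paris", "http://a.org/Paris_Texas"),
    ("http://a.org/Berlin", "http://c.org/Berlin"),
]
```

Calling `suspicious_clusters(EXAMPLE_LINKS)` flags the component that transitively equates `http://a.org/Paris` with `http://a.org/Paris_Texas`, since both URIs come from the same (hypothetical) dataset. Because components are independent of one another, a check of this kind lends itself to parallel execution, which is in line with the abstract's remark about parallel and GPU implementations.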
Original language: English
Title of host publication: Proceedings - 2017 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017
Publisher: Association for Computing Machinery, Inc
Pages: 106-113
ISBN (Electronic): 9781450349512
DOIs
Publication status: Published - 23 Aug 2017
Externally published: Yes
Event: 16th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017 - Leipzig, Germany
Duration: 23 Aug 2017 - 26 Aug 2017

Conference

Conference: 16th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017
Country/Territory: Germany
City: Leipzig
Period: 23/08/17 - 26/08/17

Funding

This research has been partially supported by CNPq Brazil under grant no. 201536/2014-5, the H2020 projects SLIPO (GA no. 731581) and HOBBIT (GA no. 688227), the DFG project LinkingLOD (project no. NG 105/3-2), and the BMWi projects SAKE (project no. 01MD15006E) and GEISER (project no. 01MD16014). Thanks to Matthias Wauer for the proofreading.

Funders and funder numbers:
GEISER: 01MD16014
Horizon 2020 Framework Programme: 688227, 731581
Deutsche Forschungsgemeinschaft: NG 105/3-2
Bundesministerium für Wirtschaft und Technologie: 01MD15006E
Conselho Nacional de Desenvolvimento Científico e Tecnológico: 201536/2014-5
