Abstract
More than 500 million facts on the Linked Data Web are statements across knowledge bases. These links are of crucial importance for the Linked Data Web as they make a large number of tasks possible, including cross-ontology question answering and federated queries. However, a large number of these links are erroneous and can thus lead these applications to produce absurd results. We present a time-efficient and complete approach for the detection of erroneous links for properties that are transitive. To this end, we make use of the semantics of URIs on the Data Web and combine it with an efficient graph partitioning algorithm. We then apply our algorithm to the LinkLion repository and show that we can analyze 19,200,114 links in 4.6 minutes. Our results show that at least 13% of the owl:sameAs links we considered are erroneous. In addition, our analysis of the provenance of links allows discovering agents and knowledge bases that commonly display poor linking. Our algorithm can be easily executed in parallel and on a GPU. We show that these implementations are up to two orders of magnitude faster than classical reasoners and a non-parallel implementation.
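The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "URI semantics plus graph partitioning". It assumes that the host part of a URI identifies its knowledge base and that distinct URIs within one knowledge base denote distinct entities; both are illustrative assumptions, not statements from the paper. Links of a transitive property such as owl:sameAs are partitioned into connected components with a union-find structure, and a component is flagged when it contains two different URIs from the same host. The function names `find` and `suspicious_clusters` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find(parent, x):
    """Return the representative of x's component, compressing the path."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def suspicious_clusters(links):
    """Partition (source, target) links into components and flag suspicious ones.

    Assumption (not from the paper): the URI host stands in for the
    knowledge base, and one knowledge base never uses two URIs for the
    same entity. A component with two URIs from one host is therefore
    taken to contain at least one erroneous link.
    """
    parent = {}
    for source, target in links:
        parent.setdefault(source, source)
        parent.setdefault(target, target)
        root_s, root_t = find(parent, source), find(parent, target)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two components

    # Group every URI under the representative of its component.
    clusters = defaultdict(set)
    for uri in parent:
        clusters[find(parent, uri)].add(uri)

    flagged = []
    for members in clusters.values():
        by_host = defaultdict(set)
        for uri in members:
            by_host[urlparse(uri).netloc].add(uri)
        if any(len(uris) > 1 for uris in by_host.values()):
            flagged.append(members)
    return flagged


if __name__ == "__main__":
    example_links = [
        ("http://a.org/1", "http://b.org/1"),
        ("http://b.org/1", "http://a.org/2"),  # pulls a.org/1 and a.org/2 together
        ("http://a.org/3", "http://c.org/3"),
    ]
    for cluster in suspicious_clusters(example_links):
        print(sorted(cluster))
```

In this sketch a flagged component only signals that some link inside it is wrong; it does not say which one. Because components are independent of each other, the per-component check is also the part that would lend itself to parallel execution.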
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017 |
Publisher | Association for Computing Machinery, Inc |
Pages | 106-113 |
ISBN (Electronic) | 9781450349512 |
DOIs | |
Publication status | Published - 23 Aug 2017 |
Externally published | Yes |
Event | 16th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017, Leipzig, Germany. Duration: 23 Aug 2017 → 26 Aug 2017 |
Conference
Conference | 16th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017 |
---|---|
Country/Territory | Germany |
City | Leipzig |
Period | 23/08/17 → 26/08/17 |
Funding
This research has been partially supported by CNPq Brazil under grant no. 201536/2014-5, the H2020 projects SLIPO (GA no. 731581) and HOBBIT (GA no. 688227), the DFG project LinkingLOD (project no. NG 105/3-2), and the BMWi projects SAKE (project no. 01MD15006E) and GEISER (project no. 01MD16014). We thank Matthias Wauer for proofreading.
Funders | Funder number |
---|---|
GEISER | 01MD16014 |
Horizon 2020 Framework Programme | 688227, 731581 |
Deutsche Forschungsgemeinschaft | NG 105/3-2 |
Bundesministerium für Wirtschaft und Technologie | 01MD15006E |
Conselho Nacional de Desenvolvimento Científico e Tecnológico | 201536/2014-5 |