Literally better: Analyzing and improving the quality of literals

Wouter Beek*, Filip Ilievski, Jeremy Debattista, Stefan Schlobach, Jan Wielemaker

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Quality is a complicated and multifarious topic in contemporary Linked Data research. The aspect of literal quality in particular has not yet been rigorously studied. Nevertheless, analyzing and improving the quality of literals is important since literals form a substantial (one in seven statements) and crucial part of the Semantic Web. Specifically, literals allow infinite value spaces to be expressed and they provide the linguistic entry point to the LOD Cloud. We present a toolchain that builds on the LOD Laundromat data cleaning and republishing infrastructure and that allows us to analyze the quality of literals on a very large scale, using a collection of quality criteria we specify in a systematic way. We illustrate the viability of our approach by lifting out two particular aspects in which the current LOD Cloud can be immediately improved by automated means: value canonization and language tagging. Since not all quality aspects can be addressed algorithmically, we also give an overview of other problems that can be used to guide future endeavors in tooling, training, and best practice formulation.

Original language: English
Pages (from-to): 131-150
Number of pages: 20
Journal: Semantic Web
Volume: 9
Issue number: 1
Early online date: 30 Nov 2017
Publication status: Published - 2018

Keywords

  • data observatory
  • data quality
  • linked data
  • quality assessment
  • quality improvement
