Closer Reading of RDF Generated by NLP on Wikipedia Biography: Comparative Analysis

Go Sugimoto*, Angel Daza, Victor de Boer

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Although Wikidata and DBpedia are closely related to Wikipedia, they often hold only a small subset of its semantic information, due to the specific scopes and methodologies chosen for their Linked Data (LD) construction. In biographies, a large number of RDF statements focus on the person’s core facts, and many semantic narratives are not included. To fill this knowledge gap, this paper seeks a solution with Natural Language Processing (NLP). We aim to assess to what extent out-of-the-box NLP tools can generate new LD from the biographical articles in Wikipedia. Unlike other NLP research, we put more emphasis on the qualitative analysis of NLP outputs (“close reading”) than on the statistical performance of NLP algorithms (“distant reading”). We evaluate the overlaps and gaps between Wikipedia, Wikidata and DBpedia, as well as other biographical ontologies. We also analyze the triple patterns from the NLP results in comparison with the RDF entity (instance) and ontologies. Our research revealed that we are able to capture new information about the entity that Wikidata and DBpedia do not hold. At the same time, some noise could not be easily eliminated. Our method presents a bottom-up approach to biographical ontology design. We also briefly propose a possible solution for future work.

Original language: English
Title of host publication: Metadata and Semantic Research
Subtitle of host publication: 17th Research Conference, MTSR 2023, Milan, Italy, October 25–27, 2023, Revised Selected Papers
Editors: Emmanouel Garoufallou, Fabio Sartori
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 41-54
Number of pages: 14
ISBN (Electronic): 9783031659904
ISBN (Print): 9783031659898
Publication status: Published - 2024
Event: 17th Research Conference on Metadata and Semantic Research, MTSR 2023 - Milan, Italy
Duration: 25 Oct 2023 → 27 Oct 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 2048 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Name: MTSR: Research Conference on Metadata and Semantics Research
Publisher: Springer
Volume: 2023

Conference

Conference: 17th Research Conference on Metadata and Semantic Research, MTSR 2023
Country/Territory: Italy
City: Milan
Period: 25/10/23 → 27/10/23

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Keywords

  • Biographies
  • DBpedia
  • Henry VIII
  • Knowledge Graph
  • Linked Data
  • Natural Language Processing
  • Wikidata
  • Wikipedia
