How Contentious Terms About People and Cultures are Used in Linked Open Data

Andrei Nesterov, Laura Hollink, Jacco Van Ossenbruggen

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Web resources in linked open data (LOD) are comprehensible to humans through literal textual values attached to them, such as labels, notes, or comments. Word choices in literals may not always be neutral. When culturally stereotyping terminology is used in literals, these literals may appear offensive to users in interfaces and propagate stereotypes to algorithms trained on them. We study how frequently and in which literals contentious terms about people and cultures occur in LOD and whether there are attempts to mark the usage of such terms. For our analysis, we reuse English and Dutch terms from a knowledge graph that provides opinions of experts from the cultural heritage domain about terms' contentiousness. We inspect occurrences of these terms in four widely used datasets: Wikidata, The Getty Art & Architecture Thesaurus, Princeton WordNet, and Open Dutch WordNet. Some terms are ambiguous and contentious only in particular senses. Applying word sense disambiguation, we generate a set of literals relevant to our analysis. We found that contentious terms frequently appear in descriptive and labelling literals, such as preferred labels that are usually displayed in interfaces and used for indexing. In some cases, LOD contributors mark contentious terms with words and phrases in literals (implicit markers) or properties linked to resources (explicit markers). However, such marking is rare and inconsistent across all datasets. Our quantitative and qualitative insights could be helpful in developing more systematic approaches to address the propagation of stereotypes via LOD.
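The counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the term list, the marker words, the `ex:` resources, and the helper names `word_pattern` and `find_occurrences` are all hypothetical placeholders chosen for illustration. The sketch matches terms as whole words in literals, records which property carries each literal, and flags literals that also contain an implicit marker word. It performs no word sense disambiguation, so a non-contentious sense (the third literal) is still matched, which is the problem the disambiguation step in the abstract is meant to address.

```python
import re
from collections import Counter

# Hypothetical inputs: placeholder terms and marker words,
# not the lists used in the paper.
CONTENTIOUS_TERMS = ["exotic", "primitive"]
IMPLICIT_MARKERS = ["derogatory", "offensive", "pejorative"]

# (resource, property, literal) triples standing in for LOD literals.
LITERALS = [
    ("ex:r1", "skos:prefLabel", "Exotic dancers"),
    ("ex:r2", "skos:scopeNote", "Primitive art; term now considered derogatory."),
    ("ex:r3", "rdfs:comment", "Primitive data type in a programming language"),
    ("ex:r4", "skos:altLabel", "Landscape painting"),
]


def word_pattern(words):
    """Compile a case-insensitive regex matching any of the words as whole words."""
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


TERM_RE = word_pattern(CONTENTIOUS_TERMS)
MARKER_RE = word_pattern(IMPLICIT_MARKERS)


def find_occurrences(literals):
    """Return one record per literal that contains at least one listed term."""
    hits = []
    for resource, prop, text in literals:
        terms = sorted({m.group(0).lower() for m in TERM_RE.finditer(text)})
        if terms:
            hits.append({
                "resource": resource,
                "property": prop,
                "terms": terms,
                # True when the same literal also contains a marker word.
                "marked": bool(MARKER_RE.search(text)),
            })
    return hits


hits = find_occurrences(LITERALS)
per_property = Counter(h["property"] for h in hits)

for h in hits:
    print(h["resource"], h["property"], h["terms"], "marked" if h["marked"] else "unmarked")
print(dict(per_property))
```

Grouping hits by property, as `per_property` does, mirrors the question of which kinds of literals (labels, notes, comments) carry the terms. Explicit markers, which the abstract describes as properties linked to resources, would need a lookup against the resource's other triples and are not covered here.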

Original language: English
Title of host publication: WWW 2024
Subtitle of host publication: Proceedings of the ACM Web Conference 2024
Publisher: Association for Computing Machinery, Inc
Pages: 4523-4533
Number of pages: 11
ISBN (Electronic): 9798400701719
Publication status: Published - 2024
Event: 33rd ACM Web Conference, WWW 2024 - Singapore, Singapore
Duration: 13 May 2024 to 17 May 2024

Conference

Conference: 33rd ACM Web Conference, WWW 2024
Country/Territory: Singapore
City: Singapore
Period: 13/05/24 to 17/05/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • contentious terms
  • linked open data
  • literals
  • stereotypes
