Contextualized Word Embeddings Expose Ethnic Biases in News

Guusje Thijs, Damian Trilling, Anne C. Kroon

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The web is a major source of news and information. Yet news can perpetuate and amplify biases and stereotypes. Prior work has shown that training static word embeddings on a corpus can expose such biases. In this short paper, we apply both a conventional Word2Vec approach and a more modern BERT-based approach to a large corpus of Dutch news. We demonstrate that both methods expose ethnic biases in the news corpus. We also show that the biases in the news corpus are considerably stronger than the biases in the transformer model itself.
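Bias in word embeddings is typically quantified with association tests of the WEAT family: a target word is scored by the difference in its mean cosine similarity to two contrasting attribute sets. The sketch below illustrates that metric with small hand-made vectors; it is a generic illustration of the technique, not the paper's exact procedure, and the toy vectors stand in for embeddings that would normally come from a trained Word2Vec or BERT model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, attr_a, attr_b):
    """WEAT-style association score for a target vector w:
    mean similarity to attribute set A minus mean similarity to attribute set B.
    Positive values mean w leans toward A; negative, toward B."""
    sim_a = np.mean([cosine(w, a) for a in attr_a])
    sim_b = np.mean([cosine(w, b) for b in attr_b])
    return sim_a - sim_b

# Toy 2-d vectors for illustration only (real studies use embeddings
# learned from a corpus, e.g. Word2Vec vectors or BERT hidden states).
target = np.array([1.0, 0.0])          # e.g. an ethnic group term
pleasant = [np.array([0.9, 0.1])]      # attribute set A
unpleasant = [np.array([0.0, 1.0])]    # attribute set B

score = association(target, pleasant, unpleasant)
```

A strongly positive (or negative) score indicates that the target term sits closer to one attribute set in the embedding space, which is how corpus-level stereotypes surface as measurable geometry.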

Original language: English
Title of host publication: WEBSCI '24
Subtitle of host publication: Proceedings of the 16th ACM Web Science Conference
Editors: Luca Maria Aiello, Yelena Mejova, Oshani Seneviratne, Jun Sun, Sierra Kaiser, Steffen Staab
Publisher: Association for Computing Machinery, Inc
Pages: 290-295
Number of pages: 6
ISBN (Electronic): 9798400703348
DOIs
Publication status: Published - 2024
Event: 16th ACM Web Science Conference, WebSci 2024 - Stuttgart, Germany
Duration: 21 May 2024 - 24 May 2024

Conference

Conference: 16th ACM Web Science Conference, WebSci 2024
Country/Territory: Germany
City: Stuttgart
Period: 21/05/24 - 24/05/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s)

Keywords

  • bias
  • contextualized word embeddings
  • news
  • static word embeddings
  • stereotypes
  • transformer
