Abstract
The web is a major source of news and information. Yet news can perpetuate and amplify biases and stereotypes. Prior work has shown that training static word embeddings can expose such biases. In this short paper, we apply both a conventional Word2Vec approach and a more modern BERT-based approach to a large corpus of Dutch news. We demonstrate that both methods expose ethnic biases in the news corpus. We also show that the biases in the news corpus are considerably stronger than the biases in the transformer model itself.
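The abstract does not spell out how embeddings expose bias, but prior work in this line typically uses WEAT-style association scores: the cosine similarity of a target word's vector to one attribute set minus its similarity to another. The sketch below illustrates that idea on toy 2-d vectors; all names (`group_term`, the attribute words) are hypothetical stand-ins, not terms from the paper.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def association(word, attr_a, attr_b, vecs):
    """WEAT-style association: mean cosine with attribute set A
    minus mean cosine with attribute set B."""
    sim_a = sum(cosine(vecs[word], vecs[a]) for a in attr_a) / len(attr_a)
    sim_b = sum(cosine(vecs[word], vecs[b]) for b in attr_b) / len(attr_b)
    return sim_a - sim_b

# Toy 2-d vectors standing in for trained Word2Vec embeddings (illustrative only).
vecs = {
    "group_term":  (1.0, 0.2),
    "pleasant1":   (0.9, 0.1), "pleasant2":   (1.0, 0.0),
    "unpleasant1": (0.0, 1.0), "unpleasant2": (0.1, 0.9),
}

score = association("group_term",
                    ["pleasant1", "pleasant2"],
                    ["unpleasant1", "unpleasant2"], vecs)
# A positive score means "group_term" sits closer to the pleasant attribute set.
```

In practice the vectors would come from a Word2Vec model trained on the news corpus (or from a BERT model's contextualized representations), and the comparison across the two would show whether the corpus-trained embeddings carry stronger associations than the pretrained transformer.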
Original language | English |
---|---|
Title of host publication | WEBSCI '24 |
Subtitle of host publication | Proceedings of the 16th ACM Web Science Conference |
Editors | Luca Maria Aiello, Yelena Mejova, Oshani Seneviratne, Jun Sun, Sierra Kaiser, Steffen Staab |
Publisher | Association for Computing Machinery, Inc |
Pages | 290-295 |
Number of pages | 6 |
ISBN (Electronic) | 9798400703348 |
DOIs | |
Publication status | Published - 2024 |
Event | 16th ACM Web Science Conference, WebSci 2024 - Stuttgart, Germany Duration: 21 May 2024 → 24 May 2024 |
Conference
Conference | 16th ACM Web Science Conference, WebSci 2024 |
---|---|
Country/Territory | Germany |
City | Stuttgart |
Period | 21/05/24 → 24/05/24 |
Bibliographical note
Publisher Copyright: © 2024 Copyright held by the owner/author(s)
Keywords
- bias
- contextualized word embeddings
- news
- static word embeddings
- stereotypes
- transformer