Abstract
The subjectivity of recognizing hate speech makes it a complex task. This is also reflected in the differing and incomplete definitions used in NLP. We present hate speech criteria, developed with perspectives from law and social science, with the aim of helping researchers create more precise definitions and annotation guidelines on five aspects: (1) target groups, (2) dominance, (3) perpetrator characteristics, (4) type of negative group reference, and (5) type of potential consequences/effects. Definitions can be structured so that they cover a broader or narrower phenomenon. As such, conscious choices can be made about specifying criteria or leaving them open. We argue that the goal and exact task developers have in mind should determine how the scope of hate speech is defined. We provide an overview of the properties of English datasets from hatespeechdata.com that may help select the most suitable dataset for a specific scenario.
Original language | English |
---|---|
Title of host publication | Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH) |
Editors | Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 176-191 |
Number of pages | 16 |
ISBN (Electronic) | 9781955917841 |
Publication status | Published - Jul 2022 |
Event | 6th Workshop on Online Abuse and Harms, WOAH 2022 - Seattle, United States. Duration: 14 Jul 2022 → … |
Conference
Conference | 6th Workshop on Online Abuse and Harms, WOAH 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 14/07/22 → … |
Bibliographical note
Funding Information: This research was (partially) funded by the Hybrid Intelligence Center, a 10-year programme funded by the Dutch Ministry of Education, Culture and Science through the Netherlands Organisation for Scientific Research. We would additionally like to thank the reviewers for providing us with valuable feedback that has helped improve this paper.
Publisher Copyright:
© 2022 Association for Computational Linguistics.