Who are the haters? A corpus-based demographic analysis of authors of hate speech

Lisa Hilte*, Ilia Markov, Nikola Ljubešić, Darja Fišer, Walter Daelemans

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-reviewed

Abstract

Introduction: We examine the profiles of hate speech authors in a multilingual dataset of Facebook reactions to news posts discussing topics related to migrants and the LGBT+ community. The included languages are English, Dutch, Slovenian, and Croatian. Methods: First, all utterances were manually annotated as hateful or acceptable speech. Next, we used binary logistic regression to inspect how the production of hateful comments is impacted by authors' profiles (i.e., their age, gender, and language). Results: Our results corroborate previous findings: in all four languages, men produce more hateful comments than women, and people produce more hate speech as they grow older. However, our findings also add important nuance to previously attested tendencies: specific age and gender dynamics vary slightly across languages or cultures, suggesting that distinct (e.g., socio-political) realities are at play. Discussion: Finally, we discuss why author demographics are important in the study of hate speech: the profiles of prototypical “haters” can be used for hate speech detection, for raising awareness, and for counter-initiatives against the spread of (online) hatred.
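
As a rough illustration of the binary logistic regression step mentioned in the abstract, the sketch below fits a model of "comment is hateful" on two author attributes. It is a minimal sketch under stated assumptions, not the authors' analysis: the counts in TOY_COUNTS are invented for illustration, the names fit_logistic and expand are hypothetical helpers, age is coded as an ordinal band, and the language predictor is omitted for brevity.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=3000):
    """Fit binary logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_k] on the log-odds scale.
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Invented toy counts, NOT data from the study:
# (is_male, age_band, n_hateful, n_total)
TOY_COUNTS = [
    (0, 0, 2, 20), (0, 1, 3, 20), (0, 2, 5, 20),
    (1, 0, 4, 20), (1, 1, 6, 20), (1, 2, 9, 20),
]


def expand(counts):
    """Turn grouped counts into one row per comment."""
    X, y = [], []
    for is_male, age_band, n_hateful, n_total in counts:
        for k in range(n_total):
            X.append([float(is_male), float(age_band)])
            y.append(1 if k < n_hateful else 0)
    return X, y


X, y = expand(TOY_COUNTS)
intercept, coef_male, coef_age = fit_logistic(X, y)

# exp(coefficient) is the odds ratio associated with a one-unit change.
odds_ratio_male = math.exp(coef_male)
odds_ratio_age = math.exp(coef_age)
print(f"odds ratio (male vs. female): {odds_ratio_male:.2f}")
print(f"odds ratio (per age band):    {odds_ratio_age:.2f}")
```

An odds ratio above 1 for a predictor means that, in the toy data, the odds of a comment being hateful rise with that predictor. In practice an established statistics package would be used instead of hand-written gradient descent, and a language term (and its interactions with age and gender) would be needed to examine the cross-language differences the abstract describes.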

Original language: English
Article number: 986890
Pages (from-to): 1-12
Number of pages: 12
Journal: Frontiers in Artificial Intelligence
Volume: 6
Publication status: Published - 19 May 2023

Bibliographical note

Funding Information:
This work has been supported by the Slovenian Research Agency (ARRS) and the Flemish Research Foundation (FWO) through the bilateral research project ARRS N06-0099 and FWO G070619N LiLaH: Linguistic landscape of hate speech on social media; by the ARRS research core funding No. P6-0411 Language resources and technologies for Slovene language; the European Union's Rights, Equality and Citizenship Programme (2014-2020) project IMSyPP (Grant No. 875263); and the ARRS projects P6-0436 Digital Humanities: Resources, tools and methods, P6-0215 Slovene Language: Basic, Cognitive and Applied Studies, J5-3102 Hate speech in contemporary conceptualizations of nationalism, racism, gender and migration, and J7-4642 Basic research for the development of spoken language resources and speech technologies for the Slovenian language.

Publisher Copyright:
Copyright © 2023 Hilte, Markov, Ljubešić, Fišer and Daelemans.

Funding

Funders (funder numbers)
Fonds Wetenschappelijk Onderzoek: ARRS N06-0099, P6-0411, P6-0215, 875263, P6-0436, J7-4642
Javna Agencija za Raziskovalno Dejavnost RS

    Keywords

    • age
    • demographics
    • gender
    • hate speech
    • language area
