An Ensemble Approach for Dutch Cross-Domain Hate Speech Detection

Ilia Markov*, Ine Gevers, Walter Daelemans

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Over the past years, the amount of online hate speech has been growing steadily. Among the many approaches to automatically detecting hateful content online, ensemble learning is considered one of the best strategies, as shown by several studies on English and other languages. In this paper, we evaluate state-of-the-art approaches for Dutch hate speech detection under both in-domain and cross-domain conditions, and introduce a new ensemble approach with additional features for detecting hateful content in Dutch social media. The ensemble consists of a gradient boosting classifier that incorporates state-of-the-art transformer-based pre-trained language models for Dutch (i.e., BERTje and RobBERT), a robust SVM approach, and additional input information such as the number of emotion-conveying and hateful words, the number of personal pronouns, and the length of the message. The ensemble significantly outperforms all the individual models in both the in-domain and cross-domain settings. We perform an in-depth error analysis focusing on explicit and implicit hate speech instances, providing insights into open challenges in Dutch hate speech detection and directions for future research.
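
The additional input information mentioned in the abstract (counts of emotion-conveying words, hateful words, and personal pronouns, plus message length) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `extract_features`, the tokenizer, and the three word lists are assumptions made for illustration. In particular, the emotion and hateful word lists are tiny hypothetical placeholders standing in for real lexicons, and because the abstract does not say whether length is measured in words or characters, the sketch reports both.

```python
import re

# Common Dutch personal pronouns (illustrative, not exhaustive).
PERSONAL_PRONOUNS = {
    "ik", "jij", "je", "u", "hij", "zij", "ze", "het", "wij", "we",
    "jullie", "mij", "me", "jou", "hem", "haar", "ons", "hen", "hun",
}

# Hypothetical placeholder lexicons: a real system would load
# curated emotion and hate-speech lexicons instead.
EMOTION_WORDS = {"boos", "bang", "blij"}
HATEFUL_WORDS = {"idioot", "tuig"}

# Assumed tokenizer: lowercase, then split on runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def extract_features(message: str) -> dict:
    """Return simple count-based features for one message."""
    tokens = TOKEN_RE.findall(message.lower())
    return {
        "n_emotion_words": sum(t in EMOTION_WORDS for t in tokens),
        "n_hateful_words": sum(t in HATEFUL_WORDS for t in tokens),
        "n_personal_pronouns": sum(t in PERSONAL_PRONOUNS for t in tokens),
        "n_tokens": len(tokens),    # message length in words
        "n_chars": len(message),    # message length in characters
    }


if __name__ == "__main__":
    print(extract_features("Ik ben boos op jullie, stelletje tuig!"))
```

In an ensemble of the kind the abstract describes, feature values like these would be passed to the meta-classifier alongside the outputs of the individual models; how exactly they are combined is not specified on this page.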

Original language: English
Title of host publication: Natural Language Processing and Information Systems
Subtitle of host publication: 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022, Valencia, Spain, June 15–17, 2022, Proceedings
Editors: Paolo Rosso, Valerio Basile, Raquel Martínez, Elisabeth Métais, Farid Meziane
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-15
Number of pages: 13
ISBN (Electronic): 9783031084737
ISBN (Print): 9783031084720
Publication status: Published - 2022
Event: 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022 - Valencia, Spain
Duration: 15 Jun 2022 to 17 Jun 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13286 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022
Country/Territory: Spain
City: Valencia
Period: 15/06/22 to 17/06/22

Bibliographical note

Funding Information:
This research has been supported by the Flemish Research Foundation through the bilateral research project FWO G070619N “The linguistic landscape of hate speech on social media”. The research also received funding from the Flemish Government (AI Research Program).

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords

  • Hate speech
  • Dutch
  • Cross-domain
  • Ensemble
