Annotating Targets of Toxic Language at the Span Level

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer review

Abstract

In this paper, we discuss an interpretable framework to integrate toxic language annotations. Most data sets address only one aspect of the complex relationship in toxic communication and are inconsistent with each other. Enriching annotations with more details and information is, however, of great importance in order to develop high-performing and comprehensive explainable language models. Such systems should recognize and interpret both expressions that are toxic as well as expressions that make reference to specific targets to combat toxic language. We, therefore, create a crowd-annotation task to mark the spans of words that refer to target communities as an extension of the HateXplain data set. We present a quantitative and qualitative analysis of the annotations. We also fine-tune RoBERTa-base on our data and experiment with different data thresholds to measure their effect on the classification. The F1-score of our best model on the test set is 79%. The annotations are freely available and can be combined with the existing HateXplain annotations to build richer and more complete models.
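The span-marking setup described in the abstract amounts to token-level labeling: words that refer to a target community are tagged so a model such as RoBERTa can be fine-tuned for token classification. The sketch below shows one common way to turn word-level spans into BIO tags; the span format and the `TGT` label name are illustrative assumptions, not the released annotation format.

```python
# Sketch: convert word-level target spans into BIO tags for token
# classification. The (start, end) span format and the "TGT" label are
# hypothetical, chosen for illustration only.

def spans_to_bio(tokens, spans):
    """tokens: list of words; spans: list of (start, end) word indices,
    end-exclusive. Returns one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        for i in range(start, min(end, len(tokens))):
            tags[i] = "B-TGT" if i == start else "I-TGT"
    return tags

tokens = ["they", "always", "blame", "the", "immigrant", "community"]
print(spans_to_bio(tokens, [(3, 6)]))
# ['O', 'O', 'O', 'B-TGT', 'I-TGT', 'I-TGT']
```

Tag sequences in this shape can be fed to a standard token-classification fine-tuning loop, with subword tokens inheriting the tag of their parent word.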
Original language: English
Title of host publication: Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022) at the 29th International Conference on Computational Linguistics (COLING)
Publisher: Association for Computational Linguistics
Pages: 43-51
Number of pages: 9
ISBN (Electronic): 9781713861973
Publication status: Published - 2022
