Abstract
In this paper, we discuss an interpretable framework to integrate toxic language annotations. Most data sets address only one aspect of the complex relationship in toxic communication and are inconsistent with each other. Enriching annotations with more detail and information is, however, essential for developing high-performing and comprehensive explainable language models. Such systems should recognize and interpret both expressions that are toxic and expressions that refer to specific targets in order to combat toxic language. We therefore create a crowd-annotation task to mark the spans of words that refer to target communities, as an extension of the HateXplain data set. We present a quantitative and qualitative analysis of the annotations. We also fine-tune RoBERTa-base on our data and experiment with different data thresholds to measure their effect on the classification. Our best model achieves an F1-score of 79% on the test set. The annotations are freely available and can be combined with the existing HateXplain annotations to build richer and more complete models.
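As a rough illustration of the fine-tuning step, the sketch below fine-tunes RoBERTa-base for token-level classification of target-community spans using Hugging Face Transformers. The label scheme, the toy example, the hyperparameters, and the output path are all assumptions made for illustration; the paper does not specify its exact training configuration.

```python
# A minimal sketch of fine-tuning RoBERTa-base for token-level span
# classification, in the spirit of the paper's target-span task.
# The label scheme, example sentence, hyperparameters, and output
# path are illustrative assumptions, not the authors' setup.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    TrainingArguments,
    Trainer,
)

labels = ["O", "TARGET"]  # outside vs. inside a target-community span

tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=len(labels)
)

# One toy example: word-level tags marking the target-community span.
words = ["they", "always", "blame", "immigrants", "for", "everything"]
word_tags = [0, 0, 0, 1, 0, 0]  # "immigrants" tagged as TARGET

enc = tokenizer(words, is_split_into_words=True, truncation=True)
# Align word-level tags with subword tokens; special tokens get -100
# so the loss function ignores them.
aligned = [-100 if wid is None else word_tags[wid] for wid in enc.word_ids()]


class ToyDataset(torch.utils.data.Dataset):
    """Single-example dataset, stand-in for the real span annotations."""

    def __len__(self):
        return 1

    def __getitem__(self, idx):
        item = {k: torch.tensor(v) for k, v in enc.items()}
        item["labels"] = torch.tensor(aligned)
        return item


trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="target-span-model", num_train_epochs=1),
    train_dataset=ToyDataset(),
)
trainer.train()
```

In practice, the data-threshold experiments mentioned in the abstract would correspond to filtering which annotated spans enter `train_dataset` (for example, by annotator agreement) before training.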
Original language | English
---|---
Title of host publication | Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022) at the 29th International Conference on Computational Linguistics (COLING)
Publisher | Association for Computational Linguistics
Pages | 43-51
Number of pages | 9
ISBN (Electronic) | 9781713861973
Publication status | Published - 2022