Cross-Domain Toxic Spans Detection

Stefan F. Schouten*, Baran Barbarestani, Wondimagegnhue Tufa, Piek Vossen, Ilia Markov

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based, rationale extraction, and fine-tuned language models. Our findings indicate that a simple method using off-the-shelf lexicons performs best in the cross-domain setup. The cross-domain error analysis suggests that (1) rationale extraction methods are prone to false negatives, while (2) language models, despite performing best for the in-domain case, recall fewer explicitly toxic words than lexicons and are prone to certain types of false positives. Our code is publicly available at: https://github.com/sfschouten/toxic-cross-domain.
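
To make the lexicon-based approach concrete, the following is a minimal sketch, not the authors' implementation (which is in the repository linked above). It assumes that spans are represented as character offsets and that predictions are scored with a character-level F1; the names `TOY_LEXICON`, `lexicon_toxic_spans`, and `span_f1` are hypothetical, and the three-word lexicon is a placeholder for an off-the-shelf lexicon.

```python
import re

# Hypothetical toy lexicon for illustration only; it stands in for an
# off-the-shelf lexicon and is not taken from the paper.
TOY_LEXICON = {"idiot", "stupid", "moron"}


def lexicon_toxic_spans(text, lexicon=TOY_LEXICON):
    """Return the character offsets of every word that appears in the lexicon."""
    offsets = []
    for match in re.finditer(r"\w+", text):
        # Case-insensitive lookup of each word token in the lexicon.
        if match.group().lower() in lexicon:
            offsets.extend(range(match.start(), match.end()))
    return offsets


def span_f1(predicted, gold):
    """Character-offset F1 between two spans (assumed metric, see lead-in)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "You are an idiot, truly."
    predicted = lexicon_toxic_spans(text)
    print(predicted)
    print(span_f1(predicted, gold=range(11, 16)))
```

A matcher of this kind has no trainable parameters, so its behaviour does not depend on the domain of any training data, which is one plausible reading of why such a method can hold up under distributional shift.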

Original language: English
Title of host publication: Natural Language Processing and Information Systems
Subtitle of host publication: 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Derby, UK, June 21–23, 2023, Proceedings
Editors: Elisabeth Métais, Farid Meziane, Warren Manning, Stephan Reiff-Marganiec, Vijayan Sugumaran
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 533-545
Number of pages: 13
ISBN (Electronic): 9783031353208
ISBN (Print): 9783031353192
Publication status: Published - 2023
Event: 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023 - Derby, United Kingdom
Duration: 21 Jun 2023 - 23 Jun 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13913 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023
Country/Territory: United Kingdom
City: Derby
Period: 21/06/23 - 23/06/23

Bibliographical note

Funding Information:
This research was supported by Huawei Finland through the DreamsLab project. All content represented the opinions of the authors, which were not necessarily shared or endorsed by their respective employers and/or sponsors.

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
