Stefan F. Schouten*, Baran Barbarestani, Wondimagegnhue Tufa, Piek Vossen, Ilia Markov
Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review
Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based, rationale extraction, and fine-tuned language models. Our findings indicate that a simple method using off-the-shelf lexicons performs best in the cross-domain setup. The cross-domain error analysis suggests that (1) rationale extraction methods are prone to false negatives, while (2) language models, despite performing best for the in-domain case, recall fewer explicitly toxic words than lexicons and are prone to certain types of false positives. Our code is publicly available at: https://github.com/sfschouten/toxic-cross-domain.
Field | Value |
---|---|
Original language | English |
Title of host publication | Natural Language Processing and Information Systems |
Subtitle of host publication | 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Derby, UK, June 21–23, 2023, Proceedings |
Editors | Elisabeth Métais, Farid Meziane, Warren Manning, Stephan Reiff-Marganiec, Vijayan Sugumaran |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 533-545 |
Number of pages | 13 |
ISBN (Electronic) | 9783031353208 |
ISBN (Print) | 9783031353192 |
DOIs | |
Publication status | Published - 2023 |
Event | 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Derby, United Kingdom (Duration: 21 Jun 2023 → 23 Jun 2023) |

Field | Value |
---|---|
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 13913 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

Field | Value |
---|---|
Conference | 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023 |
Country/Territory | United Kingdom |
City | Derby |
Period | 21/06/23 → 23/06/23 |
Acknowledgements. This research was supported by Huawei Finland through the DreamsLab project. All content represented the opinions of the authors, which were not necessarily shared or endorsed by their respective employers and/or sponsors.