Abstract
For a viewpoint-diverse news recommender, identifying whether two news articles express the same viewpoint is essential. One way to determine "same or different" viewpoint is stance detection. In this paper, we investigate the robustness of operationalization choices for few-shot stance detection, with special attention to modelling stance across different topics. Our experiments test pre-registered hypotheses on stance detection. Specifically, we compare two stance task definitions (Pro/Con versus Same Side Stance), two LLM architectures (bi-encoding versus cross-encoding), and the addition of Natural Language Inference (NLI) knowledge, using pre-trained RoBERTa models trained with shots of 100 examples from 7 different stance detection datasets. Some of our hypotheses and claims from earlier work can be confirmed, while others yield inconsistent results. The effect of the Same Side Stance definition on performance differs per dataset and is influenced by other modelling choices. We found no relationship between the number of training topics in the training shots and performance. In general, cross-encoding outperforms bi-encoding, and adding NLI training to our models gives considerable improvement, but these results are not consistent across all datasets. Our results indicate that it is essential to include multiple datasets and systematic modelling experiments when aiming to find robust modelling choices for the concept 'stance'.
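To illustrate the difference between the two task definitions named in the abstract, the following is a minimal sketch of how Pro/Con labels could be recast as Same Side Stance pairs. It assumes that pairs are formed only between texts on the same topic and that a pair is "same side" when both texts carry the same Pro/Con label; the function name `same_side_pairs`, the tuple layout, and the example texts are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def same_side_pairs(examples):
    """Turn (topic, text, stance) triples into Same Side Stance pairs.

    Assumption: only texts on the same topic are paired, and a pair is
    labelled True when both texts share the same Pro/Con label.
    """
    # Group the labelled texts by topic.
    by_topic = {}
    for topic, text, stance in examples:
        by_topic.setdefault(topic, []).append((text, stance))

    # Build every unordered pair within a topic and compare the labels.
    pairs = []
    for topic, items in by_topic.items():
        for (text_a, stance_a), (text_b, stance_b) in combinations(items, 2):
            pairs.append((topic, text_a, text_b, stance_a == stance_b))
    return pairs


# Hypothetical toy data, for illustration only.
examples = [
    ("school uniforms", "Uniforms reduce peer pressure.", "pro"),
    ("school uniforms", "Uniforms save families money.", "pro"),
    ("school uniforms", "Uniforms limit self-expression.", "con"),
    ("nuclear energy", "Nuclear power is low-carbon.", "pro"),
]

for topic, text_a, text_b, same_side in same_side_pairs(examples):
    print(topic, "|", text_a, "|", text_b, "|", same_side)
```

Under this framing the model never has to know which side is Pro and which is Con, only whether two texts agree. A topic with a single labelled text produces no pairs.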
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) |
Editors | Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue |
Place of Publication | Torino, Italia |
Publisher | ELRA and ICCL |
Pages | 9245-9260 |
Number of pages | 16 |
ISBN (Electronic) | 9782493814104 |
Publication status | Published - 2024 |
Event | Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy. Duration: 20 May 2024 → 25 May 2024 |
Conference
Conference | Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 |
---|---|
Country/Territory | Italy |
City | Hybrid, Torino |
Period | 20/05/24 → 25/05/24 |
Bibliographical note
Publisher Copyright: © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Funding
This research is part of the Rethinking News Algorithms project (2020-2024), funded through the Open Competition Digitalization Humanities and Social Science (grant no. 406.D1.19.073) by the Netherlands Organisation for Scientific Research (NWO). Our computing was done through SURF Research Cloud, a national supercomputer infrastructure in the Netherlands also funded by the NWO. Thanks to Urja Khurana, Michiel van der Meer, and other PhDs from CLTL for their helpful feedback. We would also like to thank all anonymous reviewers, whose comments improved both this version and earlier versions of this paper. All remaining errors or unclear passages are our own.
Funders | Funder number |
---|---|
CLTL | |
Nederlandse Organisatie voor Wetenschappelijk Onderzoek | |
Keywords
- computational argumentation
- preregistration
- stance detection