Abstract
When humans read a text, their eye movements are influenced by the structural complexity of the input sentences. This cognitive phenomenon holds across languages, and recent studies indicate that multilingual language models utilize structural similarities between languages to facilitate cross-lingual transfer. We use sentence-level eye-tracking patterns as a cognitive indicator for structural complexity and show that the multilingual model XLM-RoBERTa can successfully predict varied patterns for 13 typologically diverse languages, despite being fine-tuned only on English data. We quantify the sensitivity of the model to structural complexity and distinguish a range of complexity characteristics. Our results indicate that the model develops a meaningful bias towards sentence length but also integrates cross-lingual differences. We conduct a control experiment with randomized word order and find that the model seems to additionally capture more complex structural information.
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics: EACL 2023 |
Subtitle of host publication | [Proceedings] |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 643-657 |
Number of pages | 15 |
ISBN (Electronic) | 9781959429470 |
Publication status | Published - 2023 |
Event | 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 - Dubrovnik, Croatia. Duration: 2 May 2023 → 6 May 2023 |
Conference
Conference | 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 |
---|---|
Country/Territory | Croatia |
City | Dubrovnik |
Period | 2/05/23 → 6/05/23 |
Bibliographical note
Funding Information: We thank the anonymous reviewers for their insightful feedback. L. Beinborn’s research was supported by the Dutch National Science Organisation (NWO) through the projects CLARIAHPLUS (CP-W6-19-005) and VENI (Vl.Veni.211C.039).
Publisher Copyright: © 2023 Association for Computational Linguistics.