Unknown Script: Impact of Script on Cross-Lingual Transfer

Research output: Chapter in Book / Report / Conference proceedingChapterAcademicpeer-review

Abstract

Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often-overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models that are pre-trained on different tokenization methods to determine factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.

Original languageEnglish
Title of host publicationProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Subtitle of host publicationVolume 4: Student Research Workshop
EditorsYang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle, Marcos Zampieri, Francis Ferraro, Swabha Swayamdipta
PublisherACL Anthology
Pages124-129
Number of pages6
Volume4
ISBN (Electronic)9798891761179
DOIs
Publication statusPublished - 2024
Event2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: 16 Jun 202421 Jun 2024

Conference

Conference2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Country/TerritoryMexico
CityHybrid, Mexico City
Period16/06/2421/06/24

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'Unknown Script: Impact of Script on Cross-Lingual Transfer'. Together they form a unique fingerprint.

Cite this