Yee Man Ng, Ilia Markov
Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review
Native Language Identification (NLI) – the task of identifying the native language (L1) of a person based on their writing in a second language (L2) – has applications in forensics, marketing, and second language acquisition. Historically, conventional machine learning approaches that rely heavily on extensive feature engineering have outperformed transformer-based language models on this task. Recently, closed-source generative large language models (LLMs), e.g., GPT-4, have demonstrated remarkable performance on NLI in a zero-shot setting, including promising results in open-set classification. However, closed-source LLMs have many disadvantages, such as high costs and the undisclosed nature of their training data. This study explores the potential of using open-source LLMs for NLI. Our results indicate that open-source LLMs do not reach the accuracy levels of closed-source LLMs when used out-of-the-box. However, when fine-tuned on labeled training data, open-source LLMs can achieve performance comparable to that of commercial LLMs.
Original language | English |
---|---|
Title of host publication | VarDial 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects, Proceedings of the Workshop |
Editors | Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 20-28 |
Number of pages | 9 |
ISBN (Electronic) | 9798891762084 |
Publication status | Published - 2025 |
Event | 12th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2025 - co-located with the 31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates; Duration: 19 Jan 2025 → … |
Name | VarDial 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects, Proceedings of the Workshop |
Conference | 12th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2025 - co-located with the 31st International Conference on Computational Linguistics, COLING 2025 |
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 19/01/25 → … |