
A Deep Generative Approach to Native Language Identification

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Native language identification (NLI), the task of identifying a person's native language (L1) from their writing in a second language (L2), is useful for a variety of purposes, including marketing, security, and educational applications. From a traditional machine learning perspective, NLI is usually framed as a multi-class classification task in which numerous designed features are combined to achieve state-of-the-art results. We introduce a deep generative language modelling (LM) approach to NLI, which consists of fine-tuning a separate GPT-2 model on the texts written by authors who share the same L1, and assigning a label to an unseen text according to which of these fine-tuned GPT-2 models yields the minimum LM loss. Our method outperforms traditional machine learning approaches and currently achieves the best results on the benchmark NLI datasets.
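The decision rule described in the abstract (one language model per L1, label chosen by minimum LM loss) can be illustrated with a minimal sketch. This is not the authors' implementation: a toy add-one-smoothed character bigram model stands in for each fine-tuned GPT-2 model, and the names `CharBigramLM`, `train_models`, `identify_l1`, the labels `L1_A` and `L1_B`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

START = "\x02"  # hypothetical start-of-text symbol


class CharBigramLM:
    """Toy stand-in for one fine-tuned GPT-2 model (NOT the paper's model)."""

    def __init__(self, texts, vocab_size):
        self.vocab_size = vocab_size
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of previous char
        for text in texts:
            prev = START
            for ch in text:
                self.bigrams[(prev, ch)] += 1
                self.contexts[prev] += 1
                prev = ch

    def loss(self, text):
        """Mean negative log-likelihood per character (add-one smoothing)."""
        if not text:
            raise ValueError("text must be non-empty")
        total = 0.0
        prev = START
        for ch in text:
            num = self.bigrams[(prev, ch)] + 1
            den = self.contexts[prev] + self.vocab_size
            total -= math.log(num / den)
            prev = ch
        return total / len(text)


def train_models(corpus):
    """Fit one model per L1 label; corpus maps label -> list of texts."""
    chars = {ch for texts in corpus.values() for text in texts for ch in text}
    vocab_size = len(chars) + 1  # one extra slot for unseen characters
    return {l1: CharBigramLM(texts, vocab_size) for l1, texts in corpus.items()}


def identify_l1(text, models):
    """Return the label whose model assigns the lowest loss to the text."""
    return min(models, key=lambda l1: models[l1].loss(text))


# Toy data with made-up labels, only to exercise the decision rule.
toy_corpus = {
    "L1_A": ["aaaa abab aaab"],
    "L1_B": ["zzzz zyzy zzzy"],
}
models = train_models(toy_corpus)
predicted = identify_l1("abaa", models)
```

In the setting the abstract describes, each `CharBigramLM` would be replaced by a GPT-2 model fine-tuned on one L1 group, and `loss` by that model's LM loss on the unseen text; the argmin step stays the same.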
Original language: English
Title of host publication: Proceedings of the 28th International Conference on Computational Linguistics
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Publisher: Association for Computational Linguistics (ACL)
Pages: 1778–1783
Number of pages: 6
DOIs
Publication status: Published - 2020

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 4 - Quality Education

