Minimalist Entity Disambiguation for Mid-Resource Languages

Benno Kruit*

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding, Conference contribution, Academic, peer-review

Abstract

For many languages and applications, even though enough data is available for training Named Entity Disambiguation (NED) systems, few off-the-shelf models are available for use in practice. This is due both to the large size of state-of-the-art models and to the computational requirements for recreating them from scratch. However, we observe that in practice, acceptable models can be trained and run with far fewer resources. In this work, we introduce MiniNED, a framework for creating small NED models from medium-sized datasets. The resulting models can be tuned for application-specific objectives and trade-offs, depending on practitioners' requirements concerning model size, frequency bias, and out-of-domain generalization. We evaluate the framework in nine languages, and achieve reasonable performance using models that are a fraction of the size of recent work.

Original language: English
Title of host publication: Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)
Editors: Nafise Sadat Moosavi, Iryna Gurevych, Yufang Hou, Gyuwan Kim, Jin Kim Young, Tal Schuster, Ameeta Agrawal
Publisher: Association for Computational Linguistics (ACL)
Pages: 299-306
Number of pages: 8
ISBN (Electronic): 9781959429791
Publication status: Published - 2023
Event: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Toronto, Canada
Duration: 13 Jul 2023 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023
Country/Territory: Canada
City: Toronto
Period: 13/07/23 → …

Bibliographical note

Publisher Copyright:
© 2023 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All rights reserved.

Funding

This research is partially funded by Huawei Amsterdam Research Center. I would like to thank Thiviyan Thanapalasingam, Majid Mohammedi and Erman Acar for their early feedback on language-specific models, and the members of the VU Amsterdam Knowledge in AI and Learning & Reasoning groups, Winston Wansleeben, Rens Hassfeld and Mara Spadon for their feedback on drafts of this work.

Funders / Funder number
Huawei Amsterdam Research Center
Knowledge Centre Overweight, EMGO Institute for Health and Care Research, VU University Medical Centre, Amsterdam, The Netherlands
