Evaluating the consistency of word embeddings from small data

Jelke Bloem, Antske Fokkens, Aurélie Herbelot

Research output: Chapter in Book / Report / Conference proceeding, Conference contribution, Academic, peer-review

Abstract

In this work, we address the evaluation of distributional semantic models trained on smaller, domain-specific texts, particularly philosophical text. Specifically, we inspect the behaviour of models that use a pretrained background space during learning. We propose a measure of consistency which can be used as an evaluation metric when no in-domain gold-standard data is available. This measure computes the ability of a model to learn similar embeddings for the same terms from different parts of some homogeneous data. We show that, despite its simplicity, consistency depends on various combinations of factors, including the nature of the data itself, the model used to train the semantic space, and the frequency of the learned terms, both in the background space and in the in-domain data of interest.
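The consistency idea can be illustrated with a minimal sketch. It assumes that consistency is scored as the cosine similarity between the two vectors a model learns for the same term from two different parts of the in-domain data, averaged over all terms learned in both parts; the paper's exact scoring may differ. Because both vectors are assumed to be learned within the same pretrained background space, they share one coordinate system and can be compared directly, with no alignment step. The function names (`cosine`, `consistency`, `mean_consistency`) and the toy terms and vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors of equal dimensionality."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_u * norm_v)


def consistency(part_a: Dict[str, Vector], part_b: Dict[str, Vector]) -> Dict[str, float]:
    """Per-term similarity between the embedding learned from part A and from part B.

    Assumption: both embedding sets live in the same background space,
    so vectors are compared directly. Only terms learned in both parts are scored.
    """
    shared = sorted(set(part_a) & set(part_b))
    return {term: cosine(part_a[term], part_b[term]) for term in shared}


def mean_consistency(scores: Dict[str, float]) -> float:
    """Average per-term consistency; 0.0 if no term was learned in both parts."""
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical toy embeddings for two terms, learned from two halves of a corpus.
part_a = {"substance": [0.9, 0.1, 0.0], "monad": [0.2, 0.8, 0.1]}
part_b = {"substance": [0.8, 0.2, 0.1], "monad": [0.1, 0.1, 0.9]}

scores = consistency(part_a, part_b)
print(scores)                    # "substance" scores far higher than "monad"
print(mean_consistency(scores))
```

In this toy setup a term whose two learned vectors point in nearly the same direction counts as consistently learned, while a term whose vectors diverge (here "monad") lowers the average.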

Original language: English
Title of host publication: Natural Language Processing in a Deep Learning World
Subtitle of host publication: Proceedings
Editors: Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Publisher: Incoma Ltd.
Pages: 132-141
Number of pages: 10
ISBN (Electronic): 9789544520564
ISBN (Print): 9789544520557
Publication status: Published, Sept 2019
Event: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria
Duration: 2 Sept 2019 to 4 Sept 2019

Publication series

Name: International Conference Recent Advances in Natural Language Processing, RANLP
Publisher: INCOMA
Number: September
Volume: 2019
ISSN (Print): 1313-8502

Conference

Conference: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Country/Territory: Bulgaria
City: Varna
Period: 2/09/19 to 4/09/19
