Abstract
In this work, we address the evaluation of distributional semantic models trained on smaller, domain-specific texts, particularly philosophical text. Specifically, we inspect the behaviour of models using a pretrained background space in learning. We propose a measure of consistency which can be used as an evaluation metric when no in-domain gold-standard data is available. This measure simply computes the ability of a model to learn similar embeddings from different parts of some homogeneous data. We show that in spite of being a simple evaluation, consistency actually depends on various combinations of factors, including the nature of the data itself, the model used to train the semantic space, and the frequency of the learned terms, both in the background space and in the in-domain data of interest.
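As a rough illustration of the consistency idea described above, the following minimal sketch compares the embeddings that a term receives from two different parts of the same data. It is only one possible reading of the measure: the use of cosine similarity, the function names `cosine` and `consistency`, and the toy vectors are all assumptions made for illustration, and both sets of vectors are assumed to already live in one shared space (for example, one anchored to a common pretrained background space).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency(emb_a, emb_b):
    """Compare embeddings of the same terms learned from two parts of the data.

    emb_a, emb_b: dicts mapping a term to its vector (hypothetical format).
    Returns the mean similarity and the per-term similarities.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    if not shared:
        raise ValueError("the two embedding sets share no terms")
    per_term = {term: cosine(emb_a[term], emb_b[term]) for term in shared}
    mean = sum(per_term.values()) / len(per_term)
    return mean, per_term


# Toy vectors standing in for embeddings learned from two halves of a corpus.
part_a = {"term_x": [1.0, 0.0], "term_y": [0.0, 1.0]}
part_b = {"term_x": [2.0, 0.0], "term_y": [1.0, 0.0]}

mean_score, scores = consistency(part_a, part_b)
print(mean_score, scores)
```

Under these assumptions, a term whose two vectors point in the same direction scores close to 1, while a term whose vectors diverge scores lower; averaging over terms gives a single figure that needs no in-domain gold standard.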
| Original language | English |
|---|---|
| Title of host publication | Natural Language Processing in a Deep Learning World |
| Subtitle of host publication | Proceedings |
| Editors | Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova |
| Publisher | Incoma Ltd. |
| Pages | 132-141 |
| Number of pages | 10 |
| ISBN (Electronic) | 9789544520564 |
| ISBN (Print) | 9789544520557 |
| Publication status | Published - Sept 2019 |
| Event | 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 - Varna, Bulgaria. Duration: 2 Sept 2019 → 4 Sept 2019 |
Publication series
| Name | International Conference Recent Advances in Natural Language Processing, RANLP |
|---|---|
| Publisher | INCOMA |
| Number | September |
| Volume | 2019 |
| ISSN (Print) | 1313-8502 |
Conference
| Conference | 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 |
|---|---|
| Country/Territory | Bulgaria |
| City | Varna |
| Period | 2/09/19 → 4/09/19 |
Funding
We are grateful to Yvette Oortwijn and Arianna Betti for their input as Quine domain experts. We also thank them, as well as Lisa Dondorp, Thijs Ossenkoppele and Maud van Lier, for their work on the Quine in Context corpus. We thank Pia Sommerauer for help with the SVD setup, and the UvA e-Ideas group for their fruitful discussion of a draft of this paper. We also thank the anonymous reviewers for their time and valuable comments. This research was supported by VICI grant e-Ideas (277-20-007) awarded to Arianna Betti and VENI grant Reading between the lines (275-89-029) awarded to Antske Fokkens, both financed by the Dutch Research Council (NWO).
Fingerprint
Dive into the research topics of 'Evaluating the consistency of word embeddings from small data'. Together they form a unique fingerprint.