Abstract
We present SlotGAN, a framework for training a mention detection model that only requires unlabeled text and a gazetteer. It consists of a generator trained to extract spans from an input sentence, and a discriminator trained to determine whether a span comes from the generator, or from the gazetteer. We evaluate the method on English newswire data and compare it against supervised, weakly-supervised, and unsupervised methods. We find that the performance of the method is lower than these baselines, because it tends to generate more and longer spans, and in some cases it relies only on capitalization. In other cases, it generates spans that are valid but differ from the benchmark. When evaluated with metrics based on overlap, we find that SlotGAN performs within 95% of the precision of a supervised method, and 84% of its recall. Our results suggest that the model can generate spans that overlap well, but an additional filtering mechanism is required.
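The gap between the benchmark scores and the overlap-based scores reported above can be illustrated with a small sketch. It assumes spans are `(start, end)` token offsets with an exclusive end, and it counts a predicted span as correct when it shares at least one token with a gold span. The function names and this matching rule are illustrative assumptions; the exact overlap metric used in the paper may be defined differently.

```python
from typing import List, Tuple

# Assumption: a span is (start, end) in token offsets, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def overlap_precision_recall(
    predicted: List[Span], gold: List[Span]
) -> Tuple[float, float]:
    """Overlap-based precision and recall (hypothetical helper).

    Precision: share of predicted spans that overlap some gold span.
    Recall: share of gold spans that are overlapped by some predicted span.
    """
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall


# Toy example: longer predicted spans that still cover the gold mentions.
predicted = [(0, 3), (5, 6), (8, 10)]
gold = [(1, 2), (8, 9), (12, 14)]
precision, recall = overlap_precision_recall(predicted, gold)
```

In this toy example no predicted span matches a gold span exactly, so an exact-match metric would score zero, while two of the three predicted spans and two of the three gold spans are matched under the overlap criterion.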
| Original language | English |
|---|---|
| Title of host publication | SPNLP 2022 |
| Subtitle of host publication | 6th Workshop on Structured Prediction for NLP, Proceedings of the Workshop |
| Editors | Andreas Vlachos, Priyanka Agrawal, Andre Martins, Gerasimos Lampouras, Chunchuan Lyu |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 32-39 |
| Number of pages | 8 |
| ISBN (Electronic) | 9781955917513 |
| Publication status | Published - 2022 |
| Event | 6th Workshop on Structured Prediction for NLP, SPNLP 2022 - Dublin, Ireland. Duration: 27 May 2022 → … |
Conference
| Conference | 6th Workshop on Structured Prediction for NLP, SPNLP 2022 |
|---|---|
| Country/Territory | Ireland |
| City | Dublin |
| Period | 27/05/22 → … |
Bibliographical note
Funding Information: This project was funded by Elsevier’s Discovery Lab.
Publisher Copyright: © 2022 Association for Computational Linguistics.