Why is penguin more similar to polar bear than to sea gull? Analyzing conceptual knowledge in distributional models

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

What do powerful models of word meaning created from distributional data (e.g. Word2vec (Mikolov et al., 2013), BERT (Devlin et al., 2019), and ELMo (Peters et al., 2018)) represent? What causes words to be similar in the semantic space? What type of information is lacking? This thesis proposal presents a framework for investigating the information encoded in distributional semantic models. Several analysis methods have been suggested, but they have proven limited and are not yet well understood. Our approach pairs observations made on actual corpora with insights obtained from data manipulation experiments. The expected outcome is a better understanding of (1) the semantic information we can infer purely based on linguistic co-occurrence patterns and (2) the potential of distributional semantic models to pick up linguistic evidence.
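The question of what makes words "similar in the semantic space" usually comes down to comparing word vectors by the cosine of their angle. A minimal sketch, using invented toy vectors (not trained embeddings) purely to illustrate how a model could rate penguin closer to polar bear than to sea gull:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional vectors chosen for illustration only;
# real distributional models learn high-dimensional vectors from corpora.
vectors = {
    "penguin":    [0.9, 0.8, 0.1],
    "polar_bear": [0.8, 0.9, 0.2],
    "sea_gull":   [0.1, 0.2, 0.9],
}

print(cosine(vectors["penguin"], vectors["polar_bear"]))  # high similarity
print(cosine(vectors["penguin"], vectors["sea_gull"]))    # lower similarity
```

In a real model the vectors come from co-occurrence statistics, so the interesting question the thesis asks is *why* the learned geometry places concepts where it does.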
Original language: English
Title of host publication: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Student Research Workshop
Publisher: Association for Computational Linguistics
Pages: 134-142
Number of pages: 9
Publication status: Published - Jul 2020

