Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings. Properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.
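The comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code or data: the vectors are synthetic, the supervised classifier is a small logistic regression trained by gradient descent, and the "full vector cosine similarity" method is approximated by a majority vote over the k nearest training vectors by cosine similarity. The function names (`make_toy_vectors`, `train_logistic`, `predict_cosine_knn`) are hypothetical.

```python
import numpy as np


def make_toy_vectors(n=400, dim=50, seed=0):
    """Synthetic stand-in for word vectors (assumption, not real embeddings).

    The binary property is encoded in dimension 0 only; all other
    dimensions are unrelated Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim))
    X[:, 0] += 2.0 * (2 * y - 1)  # shift dimension 0 by +2 or -2
    return X, y


def train_logistic(X, y, lr=0.1, epochs=500, l2=1e-2):
    """Logistic regression fitted with full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def predict_logistic(X, w, b):
    return (X @ w + b > 0).astype(int)


def predict_cosine_knn(X_train, y_train, X_test, k=5):
    """Label each test vector by majority vote of its k nearest
    training vectors under cosine similarity over the full vector."""
    a = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    q = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    sims = q @ a.T  # cosine similarity matrix, shape (n_test, n_train)
    nearest = np.argsort(-sims, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def accuracy(pred, gold):
    return float(np.mean(pred == gold))


X, y = make_toy_vectors()
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

w, b = train_logistic(X_train, y_train)
clf_acc = accuracy(predict_logistic(X_test, w, b), y_test)
knn_acc = accuracy(predict_cosine_knn(X_train, y_train, X_test), y_test)

print(f"classifier accuracy:  {clf_acc:.2f}")
print(f"cosine k-NN accuracy: {knn_acc:.2f}")
```

In this toy setup the property lives in a single dimension, so a classifier can learn to weight that dimension, whereas a comparison over the full vector is diluted by the unrelated dimensions. This mirrors the reasoning in the abstract but says nothing about the results on real embeddings.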
Original language: English
Title of host publication: The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Subtitle of host publication: Proceedings of the First Workshop
Place of Publication: Brussels
Publisher: Association for Computational Linguistics (ACL)
Pages: 276-286
Number of pages: 10
ISBN (Print): 9781948087711
Publication status: Published - 1 Nov 2018

Cite this

Sommerauer, P. J. M., & Fokkens, A. S. (2018). Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell. In The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP: Proceedings of the First Workshop (pp. 276-286). Brussels: Association for Computational Linguistics (ACL).
Sommerauer, P.J.M. ; Fokkens, A.S. / Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell. The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP: Proceedings of the First Workshop. Brussels : Association for Computational Linguistics (ACL), 2018. pp. 276-286
@inproceedings{2a8021ed2c3a4ac39499245a2a2432da,
title = "Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell",
abstract = "This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings. Properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.",
author = "P.J.M. Sommerauer and A.S. Fokkens",
year = "2018",
month = "11",
day = "1",
language = "English",
isbn = "9781948087711",
pages = "276--286",
booktitle = "The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP",
publisher = "Association for Computational Linguistics (ACL)",
address = "Brussels",
}

Sommerauer, PJM & Fokkens, AS 2018, Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell. in The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP: Proceedings of the First Workshop. Association for Computational Linguistics (ACL), Brussels, pp. 276-286.

TY  - GEN
T1  - Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell
AU  - Sommerauer, P.J.M.
AU  - Fokkens, A.S.
PY  - 2018/11/1
Y1  - 2018/11/1
AB  - This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings. Properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.
UR  - https://aclanthology.info/papers/W18-5400/w18-5400
UR  - http://aclweb.org/anthology/W18-5400
M3  - Conference contribution
SN  - 9781948087711
SP  - 276
EP  - 286
BT  - The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
PB  - Association for Computational Linguistics (ACL)
CY  - Brussels
ER  -

Sommerauer PJM, Fokkens AS. Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell. In The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP: Proceedings of the First Workshop. Brussels: Association for Computational Linguistics (ACL). 2018. p. 276-286