Measuring the Diversity of Automatic Image Descriptions

C.W.J. van Miltenburg, Desmond Elliott, P.T.J.M. Vossen

Research output: Chapter in Book / Report / Conference proceeding, Conference contribution, Academic, peer-review


Automatic image description systems typically produce generic sentences that only make use of
a small subset of the vocabulary available to them. In this paper, we consider the production
of generic descriptions as a lack of diversity in the output, which we quantify using established
metrics and two new metrics that frame image description as a word recall task. This framing
allows us to evaluate system performance on the head of the vocabulary, as well as on the long
tail, where system performance degrades. We use these metrics to examine the diversity of the
sentences generated by nine state-of-the-art systems on the MS COCO data set. We find that
the systems trained with maximum likelihood objectives produce less diverse output than those
trained with additional adversarial objectives. However, the adversarially-trained models only
produce more types from the head of the vocabulary and not the tail. Besides vocabulary-based
methods, we also look at the compositional capacity of the systems, specifically their ability to
create compound nouns and prepositional phrases of different lengths. We conclude that there
is still much room for improvement, and offer a toolkit to measure progress towards the goal of
generating more diverse image descriptions.
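
As a rough illustration of the vocabulary-based view of diversity described above, the following Python sketch computes a type-token ratio and the share of reference word types that a system reproduces, split into the head and the tail of the vocabulary. It is not the authors' toolkit and does not reproduce their exact metric definitions; the function names, the whitespace tokenization, and the `head_size` cutoff are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization; a real setup would
    # use a proper tokenizer.
    return sentence.lower().split()


def type_token_ratio(descriptions):
    # Number of distinct word types divided by the number of tokens.
    tokens = [t for d in descriptions for t in tokenize(d)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def vocabulary_recall(generated, reference, head_size):
    # Rank reference word types by frequency; the `head_size` most
    # frequent types form the head, the rest form the tail.
    # (`head_size` is a hypothetical parameter, not taken from the paper.)
    ref_counts = Counter(t for d in reference for t in tokenize(d))
    ranked = [word for word, _ in ref_counts.most_common()]
    head, tail = set(ranked[:head_size]), set(ranked[head_size:])
    produced = {t for d in generated for t in tokenize(d)}

    def recall(types):
        # Fraction of the given reference types that the system produced.
        return len(types & produced) / len(types) if types else 0.0

    return {
        "overall": recall(head | tail),
        "head": recall(head),
        "tail": recall(tail),
    }


if __name__ == "__main__":
    reference = ["a dog runs", "a dog sits", "a cat sleeps"]
    generated = ["a dog sits", "a dog sits"]
    print(type_token_ratio(generated))
    print(vocabulary_recall(generated, reference, head_size=2))
```

In this toy example the system repeats one generic sentence, so it covers the frequent types but only a small part of the tail, which is the pattern the abstract describes.
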
Original language: English
Title of host publication: Proceedings of the International Conference on Computational Linguistics (COLING 2018)
Publisher: International Conference on Computational Linguistics (COLING)
Number of pages: 12
ISBN (Print): 9781948087506
Publication status: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, NM
Duration: 20 Aug 2018 to 26 Aug 2018
Conference number: 27


Conference: 27th International Conference on Computational Linguistics, COLING 2018
Abbreviated title: COLING 2018
City: Santa Fe, NM

