Recently, various crowdsourcing initiatives have shown that targeted efforts of user communities result in massive amounts of tags. For example, the Netherlands Institute for Sound and Vision collected a large number of tags with the video labeling game Waisda?. To utilize these tags successfully, a better understanding of their characteristics is required. The goal of this paper is twofold: (i) to investigate the vocabulary that users employ when describing videos and compare it to the vocabularies used by professionals; and (ii) to establish which aspects of the video are typically described and what types of tags are used for this. We report on an analysis of the tags collected with Waisda?. With respect to the first goal, we compare the tags with a typical domain thesaurus used by professionals, as well as with a more general vocabulary. With respect to the second goal, we compare the tags to the video subtitles to determine how many tags are derived from the audio signal. In addition, we perform a qualitative study in which a tag sample is interpreted in terms of an existing annotation classification framework. The results suggest that the tags complement the metadata provided by professional cataloguers, that the tags describe both the audio and the visual aspects of the video, and that users primarily describe objects in the video using general descriptions. © 2011 ACM.
|Title of host publication||International Conference on Knowledge Capture 2011|
|Editors||M. Musen, O. Corcho|
|Publication status||Published - 2011|
|Event||International Conference on Knowledge Capture 2011 (1 Jan 2011 → 1 Jan 2011)|