Adding Semantics to Detectors for Video Retrieval

C. Snoek, B. Huurnink, L. Hollink, M. de Rijke, A.T. Schreiber, M. Worring

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is very small still compared to ontologies aiming to capture the full vocabulary a user has. We aim to throw a bridge between the two fields by building a multimedia thesaurus, i.e., a set of machine learned concept detectors that is enriched with semantic descriptions and semantic structure obtained from WordNet. Given a multimodal user query, we identify three strategies to select a relevant detector from this thesaurus, namely: text matching, ontology querying, and semantic visual querying. We evaluate the methods against the automatic search task of the TRECVID 2005 video retrieval benchmark, using a news video archive of 85 h in combination with a thesaurus of 363 machine learned concept detectors. We assess the influence of thesaurus size on video search performance, evaluate and compare the multimodal selection strategies for concept detectors, and finally discuss their combined potential using oracle fusion. The set of queries in the TRECVID 2005 corpus is too small for us to be definite in our conclusions, but the results suggest promising new lines of research. © 2007 IEEE.
Original language: English
Pages (from-to): 975-986
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 9
Issue number: 5
DOIs: https://doi.org/10.1109/TMM.2007.900156
Publication status: Published - 2007

Fingerprint

Semantics
Thesauri
Detectors
Ontology
Fusion reactions

Bibliographical note

Snoek07a

Cite this

Snoek, C., Huurnink, B., Hollink, L., de Rijke, M., Schreiber, A. T., & Worring, M. (2007). Adding Semantics to Detectors for Video Retrieval. IEEE Transactions on Multimedia, 9(5), 975-986. https://doi.org/10.1109/TMM.2007.900156
Snoek, C. ; Huurnink, B. ; Hollink, L. ; de Rijke, M. ; Schreiber, A.T. ; Worring, M. / Adding Semantics to Detectors for Video Retrieval. In: IEEE Transactions on Multimedia. 2007 ; Vol. 9, No. 5. pp. 975-986.
@article{aa442e512b44490284d5c9e9f1318f11,
title = "Adding Semantics to Detectors for Video Retrieval",
abstract = "In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is very small still compared to ontologies aiming to capture the full vocabulary a user has. We aim to throw a bridge between the two fields by building a multimedia thesaurus, i.e., a set of machine learned concept detectors that is enriched with semantic descriptions and semantic structure obtained from WordNet. Given a multimodal user query, we identify three strategies to select a relevant detector from this thesaurus, namely: text matching, ontology querying, and semantic visual querying. We evaluate the methods against the automatic search task of the TRECVID 2005 video retrieval benchmark, using a news video archive of 85 h in combination with a thesaurus of 363 machine learned concept detectors. We assess the influence of thesaurus size on video search performance, evaluate and compare the multimodal selection strategies for concept detectors, and finally discuss their combined potential using oracle fusion. The set of queries in the TRECVID 2005 corpus is too small for us to be definite in our conclusions, but the results suggest promising new lines of research. {\textcopyright} 2007 IEEE.",
author = "C. Snoek and B. Huurnink and L. Hollink and {de Rijke}, M. and A.T. Schreiber and M. Worring",
note = "Snoek07a",
year = "2007",
doi = "10.1109/TMM.2007.900156",
language = "English",
volume = "9",
pages = "975--986",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "5",
}

Snoek, C, Huurnink, B, Hollink, L, de Rijke, M, Schreiber, AT & Worring, M 2007, 'Adding Semantics to Detectors for Video Retrieval' IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 975-986. https://doi.org/10.1109/TMM.2007.900156

Adding Semantics to Detectors for Video Retrieval. / Snoek, C.; Huurnink, B.; Hollink, L.; de Rijke, M.; Schreiber, A.T.; Worring, M.

In: IEEE Transactions on Multimedia, Vol. 9, No. 5, 2007, p. 975-986.

TY - JOUR
T1 - Adding Semantics to Detectors for Video Retrieval
AU - Snoek, C.
AU - Huurnink, B.
AU - Hollink, L.
AU - de Rijke, M.
AU - Schreiber, A.T.
AU - Worring, M.
N1 - Snoek07a
PY - 2007
Y1 - 2007
AB - In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is very small still compared to ontologies aiming to capture the full vocabulary a user has. We aim to throw a bridge between the two fields by building a multimedia thesaurus, i.e., a set of machine learned concept detectors that is enriched with semantic descriptions and semantic structure obtained from WordNet. Given a multimodal user query, we identify three strategies to select a relevant detector from this thesaurus, namely: text matching, ontology querying, and semantic visual querying. We evaluate the methods against the automatic search task of the TRECVID 2005 video retrieval benchmark, using a news video archive of 85 h in combination with a thesaurus of 363 machine learned concept detectors. We assess the influence of thesaurus size on video search performance, evaluate and compare the multimodal selection strategies for concept detectors, and finally discuss their combined potential using oracle fusion. The set of queries in the TRECVID 2005 corpus is too small for us to be definite in our conclusions, but the results suggest promising new lines of research. © 2007 IEEE.
U2 - 10.1109/TMM.2007.900156
DO - 10.1109/TMM.2007.900156
M3 - Article
VL - 9
SP - 975
EP - 986
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
SN - 1520-9210
IS - 5
ER -