Cross-Modal Retrieval with Cauchy-Schwarz Divergence

Jiahao Zhang, Wenzhe Yin, Shujian Yu*

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceedingConference contributionAcademicpeer-review

Abstract

Effective cross-modal retrieval requires robust alignment of heterogeneous data types. Most existing methods focus on bi-modal retrieval tasks and rely on distributional alignment techniques such as Kullback-Leibler divergence, Maximum Mean Discrepancy, and correlation alignment. However, these methods often suffer from critical limitations, including numerical instability, sensitivity to hyperparameters, and their inability to capture the full structure of the underlying distributions. In this paper, we introduce the Cauchy-Schwarz (CS) divergence, a hyperparameter-free measure that improves both training stability and retrieval performance. We further propose a novel Generalized CS (GCS) divergence inspired by Holder's inequality. This extension enables direct alignment of three or more modalities within a unified mathematical framework through a bidirectional circular comparison scheme, eliminating the need for exhaustive pairwise comparisons. Extensive experiments on six benchmark datasets demonstrate the effectiveness of our method in both bi-modal and tri-modal retrieval tasks. The code of our CS/GCS divergence is publicly available at https://github.com/JiahaoZhang666/CSD.

Original languageEnglish
Title of host publicationMM '25: Proceedings of the 33rd ACM International Conference on Multimedia
PublisherAssociation for Computing Machinery, Inc
Pages2064-2073
Number of pages10
ISBN (Electronic)9798400720352
DOIs
Publication statusPublished - 2025
Event33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 202531 Oct 2025

Conference

Conference33rd ACM International Conference on Multimedia, MM 2025
Country/TerritoryIreland
CityDublin
Period27/10/2531/10/25

Bibliographical note

Publisher Copyright:
© 2025 ACM.

Keywords

  • bidirectional circular comparison
  • cauchy-schwarz divergence
  • cross-modal retrieval
  • feature alignment
  • multiple modalities

Fingerprint

Dive into the research topics of 'Cross-Modal Retrieval with Cauchy-Schwarz Divergence'. Together they form a unique fingerprint.

Cite this