Abstract
Effective cross-modal retrieval requires robust alignment of heterogeneous data types. Most existing methods focus on bi-modal retrieval tasks and rely on distributional alignment techniques such as Kullback-Leibler divergence, Maximum Mean Discrepancy, and correlation alignment. However, these methods often suffer from critical limitations, including numerical instability, sensitivity to hyperparameters, and an inability to capture the full structure of the underlying distributions. In this paper, we introduce the Cauchy-Schwarz (CS) divergence, a hyperparameter-free measure that improves both training stability and retrieval performance. We further propose a novel Generalized CS (GCS) divergence inspired by Hölder's inequality. This extension enables direct alignment of three or more modalities within a unified mathematical framework through a bidirectional circular comparison scheme, eliminating the need for exhaustive pairwise comparisons. Extensive experiments on six benchmark datasets demonstrate the effectiveness of our method in both bi-modal and tri-modal retrieval tasks. The code of our CS/GCS divergence is publicly available at https://github.com/JiahaoZhang666/CSD.
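For orientation, the classical Cauchy-Schwarz divergence between two densities and the equal-exponent form of Hölder's inequality can be written as below. These are the standard textbook statements, not necessarily the exact estimator or GCS definition used in the paper, which the abstract does not give. The symbols p, q, f_i, and m are notation assumed here.

```latex
% Classical Cauchy-Schwarz divergence between densities p and q
D_{\mathrm{CS}}(p; q) = -\log \frac{\left( \int p(x)\, q(x)\, dx \right)^{2}}{\int p(x)^{2}\, dx \, \int q(x)^{2}\, dx}

% Generalized Hölder inequality with equal exponents m, for m functions f_1, ..., f_m
\int \prod_{i=1}^{m} |f_i(x)|\, dx \le \prod_{i=1}^{m} \left( \int |f_i(x)|^{m}\, dx \right)^{1/m}
```

By the Cauchy-Schwarz inequality, the first expression is non-negative and symmetric in p and q, and it equals zero when p = q. The second inequality bounds an integral over a product of m functions, which is the kind of bound a multi-distribution generalization can build on.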
| Original language | English |
|---|---|
| Title of host publication | MM '25: Proceedings of the 33rd ACM International Conference on Multimedia |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 2064-2073 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798400720352 |
| Publication status | Published - 2025 |
| Event | 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland. Duration: 27 Oct 2025 → 31 Oct 2025 |
Conference
| Conference | 33rd ACM International Conference on Multimedia, MM 2025 |
|---|---|
| Country/Territory | Ireland |
| City | Dublin |
| Period | 27/10/25 → 31/10/25 |
Bibliographical note
Publisher Copyright: © 2025 ACM.
Keywords
- bidirectional circular comparison
- Cauchy-Schwarz divergence
- cross-modal retrieval
- feature alignment
- multiple modalities