R2-trans: Fine-grained visual categorization with redundancy reduction

Shuo Ye, Shujian Yu*, Yu Wang, Xinge You*

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review


Abstract

Fine-grained visual categorization (FGVC) aims to discriminate between similar subcategories; its main challenge is large intra-class diversity combined with subtle inter-class differences. Existing FGVC methods usually select discriminative regions found by a trained model, an approach that is prone to neglecting other potentially discriminative information. In addition, the massive interactions among the sequence of image patches in the Vision Transformer (ViT) leave the resulting class token with a great deal of redundant information, which may also hurt FGVC performance. In this paper, we present a novel approach for FGVC that simultaneously makes use of partial yet sufficient discriminative information in environmental cues and compresses the redundant information in the class token with respect to the target. Specifically, our model calculates the ratio of high-weight regions in a batch, adaptively adjusts the masking threshold, and achieves moderate extraction of background information in the input space. We also use the Information Bottleneck (IB) approach to guide our network to learn a minimal sufficient representation in the feature space. Experimental results on three widely used benchmark datasets verify that our approach achieves better performance than other state-of-the-art approaches and baseline models. The code of our model is available at: https://github.com/SYe-hub/R-2-Trans.
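The batch-based threshold adjustment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only assumes that each image patch carries a weight in [0, 1], and the names `update_threshold`, `apply_mask`, `target_ratio`, and `step` are hypothetical choices made for illustration.

```python
import numpy as np


def update_threshold(weights, threshold, target_ratio=0.5, step=0.05):
    """Nudge the masking threshold using the batch-wide ratio of high-weight patches.

    Hypothetical rule (an assumption, not taken from the paper): raise the
    threshold when too many patches count as high-weight, lower it when too few do.
    weights: array of shape (batch, num_patches) with values in [0, 1].
    """
    high_ratio = float((weights > threshold).mean())
    if high_ratio > target_ratio:
        threshold += step
    elif high_ratio < target_ratio:
        threshold -= step
    return float(np.clip(threshold, 0.0, 1.0)), high_ratio


def apply_mask(patches, weights, threshold):
    """Zero out patches whose weight does not exceed the threshold.

    patches: array of shape (batch, num_patches, dim).
    """
    keep = (weights > threshold)[..., None]  # broadcast over the feature axis
    return patches * keep


# Toy batch: 4 images, 16 patches each, 8 features per patch.
rng = np.random.default_rng(0)
weights = rng.random((4, 16))
patches = rng.standard_normal((4, 16, 8))

threshold, ratio = update_threshold(weights, threshold=0.2)
masked = apply_mask(patches, weights, threshold)
```

Under this assumed rule, the threshold drifts until roughly `target_ratio` of the patches in a batch survive the mask, which is one simple way to keep a moderate amount of background information in the input.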

Original language: English
Article number: 104923
Pages (from-to): 1-10
Number of pages: 10
Journal: Image and Vision Computing
Volume: 143
Early online date: 1 Feb 2024
Publication status: Published - Mar 2024

Bibliographical note

Publisher Copyright:
© 2024

Funding

This work was supported in part by the National Key R&D Program of China (2022YFC3301000) and in part by the Fundamental Research Funds for the Central Universities, HUST (2023JYCXJJ031).

Funders (funder number)
Huazhong University of Science and Technology (2023JYCXJJ031)
National Key Research and Development Program of China (2022YFC3301000)
Fundamental Research Funds for the Central Universities

    Keywords

    • Batch-based dynamic mask
    • Fine-grained visual categorization
    • Information bottleneck
