Conditional Attention for Content-based Image Retrieval

Abstract

Deep learning based feature extraction combined with a visual attention mechanism has been shown to provide good results in content-based image retrieval (CBIR). Ideally, CBIR should rely on the regions that contain the objects of interest appearing in the query image. However, most existing attention models simply predict the most likely region of interest based on knowledge learned from the training dataset, regardless of the content of the query image. As a result, they may attend to context outside the object of interest, especially when a given image contains multiple potential objects of interest. In this paper, we propose a conditional attention model which is sensitive to the content of the input query image and can generate more accurate attention maps. A method based on key-point detection and description is proposed for generating the training data. Consequently, our model does not require any additional attention labels for training. The proposed attention model enables the spatial pooling feature extraction method (generalized mean pooling) to improve the image feature representation, leading to better image retrieval performance. The proposed framework is tested on a series of databases, where it is shown to perform well in challenging situations.
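
As a rough illustration of how an attention map can steer generalized mean (GeM) pooling, the following NumPy sketch weights each spatial location of a feature map by its attention value before pooling. The abstract does not specify how the attention map is combined with the pooling step, so the weighting scheme, the function name `gem_pool`, and the parameters `p`, `eps`, and `normalize` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gem_pool(features, attention=None, p=3.0, eps=1e-6, normalize=True):
    """Generalized mean pooling over the spatial axes of a (C, H, W) map.

    attention is an optional (H, W) array of non-negative weights.
    With attention=None every location is weighted equally, which is
    plain GeM pooling. The attention weighting is an assumed scheme.
    """
    c, h, w = features.shape
    # GeM expects non-negative activations (e.g. post-ReLU); clip to eps.
    x = np.clip(features, eps, None).reshape(c, h * w)

    if attention is None:
        weights = np.full(h * w, 1.0 / (h * w))
    else:
        a = np.clip(attention.reshape(h * w), 0.0, None)
        weights = a / (a.sum() + eps)  # normalize so weights sum to ~1

    # Weighted power mean per channel: (sum_i w_i * x_i^p)^(1/p)
    pooled = ((x ** p) @ weights) ** (1.0 / p)

    if normalize:
        # L2-normalize so descriptors can be compared by dot product.
        pooled = pooled / (np.linalg.norm(pooled) + eps)
    return pooled
```

With `p = 1` this reduces to (weighted) average pooling, and as `p` grows it approaches max pooling; an attention map concentrated on the object of interest suppresses the contribution of background locations to the descriptor.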

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Hu, Zechao; Bors, Adrian Gheorghe https://orcid.org/0000-0001-7838-0021
Dates:	Published: August 2020
Institution:	The University of York
Academic Units:	The University of York > Faculty of Sciences (York) > Computer Science (York)
Date Deposited:	30 Nov 2020 11:00
Last Modified:	09 Mar 2026 00:01
Status:	Published
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:168545
