Huang, Guoxi and Bors, Adrian Gheorghe orcid.org/0000-0001-7838-0021 (2021) Region-based Non-local Operation for Video Classification. In: Proceedings of the International Conference on Pattern Recognition (ICPR). IEEE, Milan, Italy, pp. 10010-10017.
Abstract
Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations, a family of self-attention mechanisms that capture long-range dependencies directly, without a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at each position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. Our method outperforms other attention mechanisms and achieves state-of-the-art performance on the Something-Something V1 dataset.
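The recalibration step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `region_mean` and `rnl_sketch`, the use of mean pooling to summarize each neighboring region, the dot-product affinity with softmax normalization, and the residual connection are all assumptions made for illustration. The channel attention module and any learned projections are omitted.

```python
import numpy as np


def region_mean(x, radius=1):
    """Average features over each position's cubic neighborhood.

    x has shape (T, H, W, C). Borders are zero padded. Mean pooling is an
    illustrative assumption, not the aggregation used in the paper.
    """
    t, h, w, _ = x.shape
    k = 2 * radius + 1
    padded = np.pad(x, ((radius, radius),) * 3 + ((0, 0),))
    out = np.zeros(x.shape, dtype=float)
    # Sum the k*k*k shifted copies of the feature map, then divide.
    for dt in range(k):
        for dh in range(k):
            for dw in range(k):
                out += padded[dt:dt + t, dh:dh + h, dw:dw + w]
    return out / k**3


def rnl_sketch(x, radius=1):
    """Hypothetical region-based non-local step on a (T, H, W, C) feature map."""
    c = x.shape[-1]
    regions = region_mean(x, radius).reshape(-1, c)  # (N, C) region descriptors
    flat = x.reshape(-1, c).astype(float)            # (N, C) per-position features

    # Affinity between the regions around every pair of positions.
    logits = regions @ regions.T                     # (N, N)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)    # each row sums to 1

    # Aggregate information from all positions, then add it back (residual).
    aggregated = weights @ flat
    return (flat + aggregated).reshape(x.shape)
```

Because each row of the attention weights sums to one, a constant input of ones yields an output of twos everywhere, which is a convenient sanity check. Note that the affinity matrix is N x N for N = T·H·W positions, so this sketch is only practical for small feature maps.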
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Huang, Guoxi; Bors, Adrian Gheorghe (orcid.org/0000-0001-7838-0021) |
Copyright, Publisher and Additional Information: | © IEEE 2020. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details |
Dates: | Published: 2021 |
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Pure (York) |
Date Deposited: | 10 Mar 2023 14:00 |
Last Modified: | 18 Dec 2024 00:39 |
Published Version: | https://doi.org/10.1109/ICPR48806.2021.9411997 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/ICPR48806.2021.9411997 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:197215 |
Download
Filename: ICPR2020.pdf
Description: Region-based Non-local Operation for Video Classification