Xie, Q. orcid.org/0000-0001-9901-0396, Lai, Y.-K., Wu, J. et al. (4 more authors) (2020) MLCVNet: Multi-Level Context VoteNet for 3D Object Detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13-19 Jun 2020, Seattle, WA, USA. IEEE, pp. 10444-10453. ISBN: 978-1-7281-7169-2. ISSN: 1063-6919. EISSN: 2575-7075.
Abstract
In this paper, we address the 3D object detection task by capturing multi-level contextual information with the self-attention mechanism and multi-scale feature fusion. Most existing 3D object detection methods recognize objects individually, without considering the contextual information between them. In contrast, we propose Multi-Level Context VoteNet (MLCVNet), which recognizes 3D objects correlatively and builds on the state-of-the-art VoteNet. We introduce three context modules into the voting and classifying stages of VoteNet to encode contextual information at different levels. Specifically, a Patch-to-Patch Context (PPC) module captures contextual information between the point patches before they vote for their corresponding object centroid points. Subsequently, an Object-to-Object Context (OOC) module is incorporated before the proposal and classification stage to capture the contextual information between object candidates. Finally, a Global Scene Context (GSC) module is designed to learn the global scene context. Together, these modules capture contextual information at the patch, object and scene levels. Our method improves detection accuracy, achieving new state-of-the-art detection performance on challenging 3D object detection datasets, i.e., SUN RGBD and ScanNet. Our code is available at https://github.com/NUAAXQ/MLCVNet.
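The abstract names self-attention as the mechanism for relating point patches (and, later, object candidates) to one another. As a rough illustration only, the sketch below applies generic scaled dot-product self-attention with a residual connection to a small set of feature vectors. It is not taken from the MLCVNet code: the exact attention formulation used by the PPC and OOC modules is not stated here, and the function names, matrix sizes and random weights are all assumptions made for this example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention with a residual connection.

    features: n rows of d values, one row per element (e.g. one point patch).
    w_q, w_k, w_v: d x d projection matrices (hypothetical; random below).
    Returns the context-enriched features and the n x n attention weights.
    """
    q = matmul(features, w_q)
    k = matmul(features, w_k)
    v = matmul(features, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Residual: each element keeps its own feature and adds aggregated context.
    enriched = [[f + a for f, a in zip(f_row, a_row)]
                for f_row, a_row in zip(features, attended)]
    return enriched, weights


def random_matrix(rng, rows, cols, scale=1.0):
    """Illustrative helper: a rows x cols matrix of Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_patches, dim = 8, 16  # assumed sizes, chosen only for the example
patch_features = random_matrix(rng, n_patches, dim)
w_q = random_matrix(rng, dim, dim, 0.1)
w_k = random_matrix(rng, dim, dim, 0.1)
w_v = random_matrix(rng, dim, dim, 0.1)

contextual, attn = self_attention(patch_features, w_q, w_k, w_v)
print(len(contextual), len(contextual[0]))
```

Each row of `attn` is a probability distribution over all patches, so every output feature is its original value plus a weighted mixture of information from the others. The same pattern could in principle be applied to object-candidate features, under the same caveats.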
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Xie, Q. (orcid.org/0000-0001-9901-0396), Lai, Y.-K., Wu, J. et al. (4 more authors) |
| Copyright, Publisher and Additional Information: | © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
| Keywords: | Three-dimensional displays, Object detection, Two dimensional displays, Feature extraction, Task analysis, Proposals, Machine learning |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 30 Oct 2025 12:33 |
| Last Modified: | 30 Oct 2025 12:33 |
| Published Version: | https://ieeexplore.ieee.org/document/9156370 |
| Status: | Published |
| Publisher: | IEEE |
| Identification Number: | 10.1109/cvpr42600.2020.01046 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:233704 |
