Xie, Q. orcid.org/0000-0001-9901-0396, Lai, Y.-K., Wu, J. et al. (4 more authors) (2022) VENet: Voting Enhancement Network for 3D Object Detection. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10-17 Oct 2021, Montreal, QC, Canada. IEEE, pp. 3692-3701. ISBN: 978-1-6654-2813-2. ISSN: 1550-5499. EISSN: 2380-7504.
Abstract
Hough voting, as has been demonstrated in VoteNet, is effective for 3D object detection, where voting is a key step. In this paper, we propose a novel VoteNet-based 3D detector with vote enhancement to improve the detection accuracy in cluttered indoor scenes. It addresses the limitations of current voting schemes, i.e., votes from neighboring objects and background have significant negative impacts. Before voting, we replace the classic MLP with the proposed Attentive MLP (AMLP) in the backbone network to get better feature description of seed points. During voting, we design a new vote attraction loss (VALoss) to enforce vote centers to locate closely and compactly to the corresponding object centers. After voting, we then devise a vote weighting module to integrate the foreground/background prediction into the vote aggregation process to enhance the capability of the original VoteNet to handle noise from background voting. The three proposed strategies all contribute to more effective voting and improved performance, resulting in a novel 3D object detector, termed VENet. Experiments show that our method outperforms state-of-the-art methods on benchmark datasets. Ablation studies demonstrate the effectiveness of the proposed components.
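To make the "during voting" and "after voting" ideas in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the exact form of VALoss or of the vote weighting module. The function names, the plain mean-distance loss, and the score-weighted average inside a ball query are all hypothetical choices.

```python
import numpy as np


def vote_attraction_loss(votes, centers):
    """Hypothetical stand-in for an attraction-style loss: the mean
    Euclidean distance between each vote and its assigned object center."""
    votes = np.asarray(votes, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return float(np.linalg.norm(votes - centers, axis=1).mean())


def weighted_vote_aggregate(votes, fg_scores, query, radius):
    """Hypothetical vote weighting: average the votes that fall within
    `radius` of `query`, weighting each by its foreground score so that
    likely-background votes contribute less."""
    votes = np.asarray(votes, dtype=float)
    fg_scores = np.asarray(fg_scores, dtype=float)
    query = np.asarray(query, dtype=float)

    # Ball query: keep only votes close to the query point.
    in_ball = np.linalg.norm(votes - query, axis=1) <= radius
    weights = fg_scores[in_ball]

    # Assumed fallback: with no usable votes, return the query unchanged.
    if weights.sum() == 0.0:
        return query.copy()

    return (votes[in_ball] * weights[:, None]).sum(axis=0) / weights.sum()


if __name__ == "__main__":
    votes = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
    query = [1.0, 0.0, 0.0]
    # Unweighted: the stray third vote drags the estimate away.
    print(weighted_vote_aggregate(votes, [1.0, 1.0, 1.0], query, 20.0))
    # Weighted: a zero foreground score suppresses the stray vote.
    print(weighted_vote_aggregate(votes, [1.0, 1.0, 0.0], query, 20.0))
```

In this toy case the third vote plays the role of background noise: giving it a foreground score of zero removes its influence on the aggregated center, which is the intuition behind integrating a foreground/background prediction into vote aggregation.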
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Xie, Q.; Lai, Y.-K.; Wu, J.; et al. (4 more authors) |
| Copyright, Publisher and Additional Information: | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
| Keywords: | Detection and localization in 2D and 3D, Scene analysis and understanding, Vision for robotics and autonomous vehicles |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 30 Oct 2025 11:41 |
| Last Modified: | 30 Oct 2025 11:41 |
| Published Version: | https://ieeexplore.ieee.org/document/9710665 |
| Status: | Published |
| Publisher: | IEEE |
| Identification Number: | 10.1109/iccv48922.2021.00369 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:233702 |

CORE (COnnecting REpositories)