Fan, Z., Huang, Z., Chen, Z. et al. (3 more authors) (2024) Lightweight multiperson pose estimation with staggered alignment self-distillation. IEEE Transactions on Multimedia, 26. pp. 9228-9240. ISSN 1520-9210
Abstract
Accurate 2D human pose estimation from images is vital for understanding human actions. However, deploying the latest models, e.g., regression-based models, on resource-limited devices remains challenging due to their high computational requirements. In this paper, we address the resolution dilemma in regression-based multiperson pose estimation, where low-resolution inputs cause performance degradation, while high-resolution inputs drastically increase computational costs. To achieve a lightweight regression approach, it becomes crucial to enhance the model's capabilities in low-resolution scenarios. We propose the staggered alignment self-distillation (SASD) method and a corresponding network architecture. Our approach involves training two twin networks with shared weights: a high-resolution network and a low-resolution network. The high-resolution network serves as a teacher, guiding the learning process of the low-resolution network through feature map staggered alignment. The knowledge from the high-resolution network enhances the performance of the low-resolution network during low-resolution inference. Additionally, we employ a normalized skeleton loss to capture the loss of bone-related structure during training. Through extensive experiments on the MS-COCO and CrowdPose datasets, we demonstrate the superiority of our proposed method over state-of-the-art, lightweight multiperson pose estimation techniques, achieving much better performance with lower computational costs. Furthermore, our method achieves comparable performance to recent advanced regression-based pose estimation methods but with only 1/4 of the computational cost.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2024 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in IEEE Transactions on Multimedia is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | 2D pose estimation; lightweight neural networks |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 15 Apr 2024 15:05 |
Last Modified: | 15 Nov 2024 16:06 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers |
Refereed: | Yes |
Identification Number: | 10.1109/TMM.2024.3387754 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:211447 |