Dang, B., Wu, L., Yang, X. et al. (2 more authors) (2026) SegMo: Segment-aligned text to 3D human motion generation. In: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 06-10 Mar 2026, Tucson, Arizona, USA. Institute of Electrical and Electronics Engineers (IEEE). ISBN: 9798331555122. ISSN: 2472-6737. EISSN: 2642-9381.
Abstract
Generating 3D human motions from textual descriptions is an important research problem with broad applications in video games, virtual reality, and augmented reality. Recent methods align the textual description with the human motion at the sequence level, neglecting the internal semantic structure of each modality. However, both motion descriptions and motion sequences can be naturally decomposed into smaller, semantically coherent segments, which can serve as atomic alignment units for finer-grained correspondence. Motivated by this, we propose SegMo, a novel Segment-aligned text-conditioned human Motion generation framework that achieves fine-grained text–motion alignment. The framework consists of three modules: (1) Text Segment Extraction, which decomposes complex textual descriptions into temporally ordered phrases, each representing a simple atomic action; (2) Motion Segment Extraction, which partitions complete motion sequences into corresponding motion segments; and (3) Fine-grained Text–Motion Alignment, which aligns text and motion segments with contrastive learning. Extensive experiments demonstrate that SegMo improves upon a strong baseline on two widely used datasets, achieving a Top-1 score of 0.553 on the HumanML3D test set. Moreover, because it learns a shared embedding space for text and motion segments, SegMo can also be applied to retrieval-style tasks such as motion grounding and motion-to-text retrieval.
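The abstract says only that the third module aligns text and motion segments "with contrastive learning"; it does not specify the loss, the encoders, or any hyperparameters. As an illustration, here is a minimal sketch assuming a standard symmetric InfoNCE objective (the CLIP-style loss) over a batch of already-encoded, index-matched text and motion segment embeddings, plus nearest-neighbour lookup in the same cosine-similarity space to show how a shared embedding supports retrieval. All function names, the 0.07 temperature, and the plain-list vector representation are hypothetical choices, not SegMo's implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: SegMo's actual loss, encoders and temperature are not
# given in the abstract. Embeddings are plain lists of floats for portability.
Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities: entry [i][j] compares a[i] with b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def mean_diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i (its paired segment)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def segment_infonce(text_segs: Sequence[Vector], motion_segs: Sequence[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE averaged over text->motion and motion->text directions.

    Assumes text_segs[i] and motion_segs[i] describe the same atomic action;
    every other segment in the batch acts as a negative.
    """
    if not text_segs or len(text_segs) != len(motion_segs):
        raise ValueError("need a non-empty batch with one motion segment per text segment")
    sims = cosine_matrix(text_segs, motion_segs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # motion -> text direction
    return 0.5 * (mean_diagonal_cross_entropy(logits) + mean_diagonal_cross_entropy(logits_t))


def retrieve(query: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate most similar to the query in the shared space,
    e.g. query = motion segment and candidates = phrases (motion-to-text retrieval)."""
    sims = cosine_matrix([query], candidates)[0]
    return max(range(len(sims)), key=lambda j: sims[j])


if __name__ == "__main__":
    phrases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy text-segment embeddings
    motions = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]  # toy motion-segment embeddings
    # Correctly paired segments give a lower loss than mismatched ones.
    print(segment_infonce(phrases, motions) < segment_infonce(phrases, motions[::-1]))  # True
    print(retrieve([0.05, 0.95, 0.0], phrases))  # 1
```

Treating the other segments in a batch as negatives is the usual design choice for this kind of objective, since it pulls each phrase toward its own motion segment while pushing it away from the rest. Real systems would compute the same quantities on learned encoder outputs with a tensor library rather than Python lists.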
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Dang, B.; Wu, L.; Yang, X.; et al. (2 more authors) |
| Copyright, Publisher and Additional Information: | © 2026 The Authors. Except as otherwise noted, this author-accepted version of a conference paper published in 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords: | 3D human motion generation |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Date Deposited: | 11 Feb 2026 08:17 |
| Last Modified: | 13 May 2026 14:31 |
| Status: | Published |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Refereed: | Yes |
| Identification Number: | 10.1109/WACV61042.2026.00671 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:236953 |
Download
Filename: WACV_2026 (1).pdf
Licence: CC-BY 4.0
