Ma, A., Chi, G., Ivaldi, S., et al. (2024) Learning high-level robotic manipulation actions with visual predictive model. Complex & Intelligent Systems, 10 (1), pp. 811–823. ISSN 2199-4536
Abstract
Learning visual predictive models has great potential for real-world robot manipulation. A visual predictive model serves as a model of real-world dynamics, capturing the interactions between the robot and objects. However, prior work has focused mainly on low-level elementary robot actions, which typically results in lengthy, inefficient, and highly complex manipulation. In contrast, humans usually employ top-down reasoning over high-level actions rather than bottom-up stacking of low-level ones. To address this limitation, we present a novel formulation in which robot manipulation is accomplished through grasp-based pick-and-place, a commonly applied high-level robot action. We propose a novel visual predictive model that combines an action decomposer with a video prediction network to learn the intrinsic semantic information of high-level actions. Experiments show that our model accurately predicts object dynamics (i.e., object movements under robot manipulation) when trained directly on observations of high-level pick-and-place actions. We also demonstrate that, together with a sampling-based planner, our model achieves a higher success rate using high-level actions on a variety of real robot manipulation tasks.
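The abstract does not include code, so the following is a minimal illustrative sketch of how a sampling-based planner could search over high-level pick-and-place actions scored by a predictive model. It uses the cross-entropy method, one common sampling-based planner, which is an assumption rather than the authors' stated choice; the function names (`predict_rollout`, `goal_cost`, `cem_plan`) are hypothetical, and the "predictive model" here is a hand-coded 2-D toy rather than a learned video prediction network.

```python
# Illustrative sketch only: the observation is collapsed to a 2-D object
# position instead of an image, the "predictive model" is a hand-coded
# toy, and the CEM planner is one common sampling-based choice; none of
# this is the authors' actual implementation.
import numpy as np

rng = np.random.default_rng(0)

def predict_rollout(obj_pos, action):
    """Toy stand-in for the visual predictive model: given the current
    object position and a pick-and-place action (pick_xy, place_xy),
    predict where the object ends up."""
    pick_xy, place_xy = action[:2], action[2:]
    # A grasp succeeds only if the pick point lands near the object;
    # otherwise the object is predicted to stay put.
    if np.linalg.norm(pick_xy - obj_pos) < 0.1:
        return place_xy.copy()
    return obj_pos.copy()

def goal_cost(predicted_pos, goal_pos):
    """Score a predicted outcome by its distance to the goal."""
    return np.linalg.norm(predicted_pos - goal_pos)

def cem_plan(obj_pos, goal_pos, n_samples=500, n_elites=25, n_iters=10):
    """Cross-entropy-method search over 4-D (pick_x, pick_y, place_x,
    place_y) actions: sample, score with the predictive model, then
    refit the sampling distribution to the lowest-cost elites."""
    mean, std = np.full(4, 0.5), np.full(4, 0.4)
    for _ in range(n_iters):
        actions = rng.normal(mean, std, size=(n_samples, 4))
        costs = np.array([goal_cost(predict_rollout(obj_pos, a), goal_pos)
                          for a in actions])
        elites = actions[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

obj, goal = np.array([0.2, 0.3]), np.array([0.6, 0.5])
action = cem_plan(obj, goal)
print("pick at", action[:2].round(3), "place at", action[2:].round(3))
```

Note that the search space here is a single 4-D high-level action, which is far smaller than the space of long low-level action sequences; this is the efficiency argument the abstract makes for planning with high-level actions.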
Metadata
| Field | Value |
|---|---|
| Item Type | Article |
| Authors/Creators | Ma, A.; Chi, G.; Ivaldi, S. (and 1 more author) |
| Copyright, Publisher and Additional Information | This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
| Keywords | Robot manipulation; Visual foresight; Visual perception; Deep learning; Grasp planning |
| Dates | Published: 2024 |
| Institution | The University of Leeds |
| Depositing User | Symplectic Publications |
| Date Deposited | 04 Apr 2024 10:42 |
| Last Modified | 04 Apr 2024 10:42 |
| Published Version | http://dx.doi.org/10.1007/s40747-023-01174-5 |
| Status | Published |
| Publisher | Springer Science and Business Media LLC |
| Identification Number (DOI) | 10.1007/s40747-023-01174-5 |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:210643 |