PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-efficient Imitation Learning
Tian Gao1 Soroush Nasiriany2 Huihan Liu2 Quantao Yang3 Yuke Zhu2
1Stanford University 2The University of Texas at Austin 3KTH Royal Institute of Technology
Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizon. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior primitive-based framework designed to improve the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences and then learning a high-level control policy that sequences primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.
PRIME: Overview
We present a data-efficient imitation learning framework that scaffolds manipulation tasks with behavior primitives, breaking down long human demonstrations into concise, simple behavior primitive sequences. Given task demonstrations, we use a trajectory parser to parse each demonstration into a sequence of primitive types and their corresponding parameters. We then use imitation learning to train a policy that predicts primitive types and their corresponding parameters from observations.
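As a rough illustration of the last step, the sketch below shows one way parsed primitive sequences could be turned into supervised examples for the policy. It is not the authors' code: the `PrimitiveStep` record, the `to_training_pairs` helper, the primitive names "reach" and "grasp", and the choice to pair each primitive with the observation at the start of its segment are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class PrimitiveStep:
    """One parsed segment of a demonstration (hypothetical record layout)."""
    start: int                 # index of the first demonstration frame in the segment
    end: int                   # index of the last demonstration frame in the segment
    name: str                  # primitive type, e.g. "reach" (illustrative label)
    params: Tuple[float, ...]  # primitive parameters, e.g. a target position


def to_training_pairs(
    observations: Sequence[Any], steps: Sequence[PrimitiveStep]
) -> List[Tuple[Any, str, Tuple[float, ...]]]:
    """Pair the observation at each segment start with the primitive to predict.

    Assumption: the policy is queried once per primitive, at the frame where
    the previous primitive finished.
    """
    return [(observations[s.start], s.name, s.params) for s in steps]


# Toy demonstration with five frames and two parsed primitives.
observations = ["obs0", "obs1", "obs2", "obs3", "obs4"]
steps = [
    PrimitiveStep(0, 2, "reach", (0.1, 0.2, 0.3)),
    PrimitiveStep(2, 4, "grasp", (0.1, 0.2, 0.05)),
]
pairs = to_training_pairs(observations, steps)
```

Each resulting tuple is an (observation, primitive type, parameters) example, so a long demonstration collapses into a handful of labeled decisions rather than hundreds of low-level actions.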
Framework Overview
We develop a self-supervised data generation strategy that randomly executes sequences of behavior primitives in the environment. With the generated dataset, we train an inverse dynamics model (IDM) that maps initial states and final states from segments in task demonstrations to primitive types and corresponding parameters. To derive the optimal primitive sequences, we build a trajectory parser capable of parsing task demonstrations into primitive sequences using dynamic programming. Finally, we train the policy using parsed primitive sequences.
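The dynamic-programming parse described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the IDM returns a log-probability for its best primitive on a segment, that the parser maximizes the sum of those log-probabilities over a segmentation, and that segments are capped at `max_len` frames. The names `parse_demo` and `toy_idm`, and the toy 1-D states, are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# (primitive_name, parameters, log_probability), as returned by the stand-in IDM
Prediction = Tuple[str, Tuple[float, ...], float]
Segment = Tuple[int, int, str, Tuple[float, ...]]


def parse_demo(
    states: Sequence[float],
    idm: Callable[[float, float], Prediction],
    max_len: int,
) -> List[Segment]:
    """Split a demonstration into primitives that maximize total IDM log-probability."""
    T = len(states) - 1
    best = [-math.inf] * (T + 1)   # best[j]: best score for explaining frames 0..j
    back: List[Optional[Tuple[int, str, Tuple[float, ...]]]] = [None] * (T + 1)
    best[0] = 0.0
    for j in range(1, T + 1):
        for i in range(max(0, j - max_len), j):
            # Score the candidate segment that starts at frame i and ends at frame j.
            name, params, logp = idm(states[i], states[j])
            if best[i] + logp > best[j]:
                best[j] = best[i] + logp
                back[j] = (i, name, params)
    # Walk the back-pointers from the final frame to recover the segmentation.
    segments: List[Segment] = []
    j = T
    while j > 0:
        i, name, params = back[j]
        segments.append((i, j, name, params))
        j = i
    segments.reverse()
    return segments


def toy_idm(s_start: float, s_end: float) -> Prediction:
    """Stand-in IDM: confident only when the state moved by exactly 3 units."""
    if s_end - s_start == 3.0:
        return ("reach", (s_end,), math.log(0.9))
    return ("other", (s_end,), math.log(0.1))


demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
segments = parse_demo(demo, toy_idm, max_len=6)
```

On this toy demonstration the parser prefers two confident "reach" segments (frames 0 to 3 and 3 to 6) over one long low-confidence segment or many short ones. The nested loop costs O(T * max_len) IDM queries, which is why a cap on segment length is assumed here.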
Experiments in Simulation
We evaluate on three tasks in the robosuite simulator. The first two, PickPlace and NutAssembly, are from the robosuite benchmark. We introduce a third task, TidyUp, to study long-horizon tasks.
Our method significantly outperforms all baselines, achieving success rates exceeding 95% across all tasks with remarkable robustness. This demonstrates that decomposing task demonstrations into concise primitive sequences reduces task complexity and makes imitation learning more data-efficient.
Real-World Evaluation
We evaluate the performance of PRIME against an imitation learning baseline (BC-RNN) on two real-world CleanUp task variants: CleanUp-Bin and CleanUp-Stack.
Our method significantly outperforms BC-RNN in two real-world tabletop tasks. Here we show rollouts in the two real-world tasks (played at 8x):
CleanUp-Bin
CleanUp-Stack
Visualization of segmented primitive sequences
For each task, we select five human demonstrations and visualize the segmented primitive sequences as interpreted by the trajectory parser.