PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-efficient Imitation Learning

Tian Gao1    Soroush Nasiriany2    Huihan Liu2    Quantao Yang3    Yuke Zhu2

1Stanford University    2The University of Texas at Austin    3KTH Royal Institute of Technology


Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizon. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior-primitive-based framework designed to improve the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, then learns a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.


PRIME: Overview

We present a data-efficient imitation learning framework that scaffolds manipulation tasks with behavior primitives, breaking long human demonstrations down into concise sequences of simple behavior primitives. Given task demonstrations, we use a trajectory parser to decompose each demonstration into a sequence of primitive types and their corresponding parameters. We then use imitation learning to train a high-level policy that predicts primitive types and their parameters from observations, as sketched below.
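To make the policy's interface concrete, here is a minimal sketch of a high-level policy that maps an observation to a categorical distribution over primitive types plus a parameter vector for each type. The primitive set, module names, and dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Assumed primitive library; the actual set in PRIME may differ.
PRIMITIVE_TYPES = ["reach", "grasp", "place", "push", "release"]

class PrimitivePolicy(nn.Module):
    """Hypothetical high-level policy: observation -> (primitive type, parameters)."""

    def __init__(self, obs_dim=32, hidden=256, param_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Categorical head over primitive types.
        self.type_head = nn.Linear(hidden, len(PRIMITIVE_TYPES))
        # One parameter head per primitive type (e.g., a target pose).
        self.param_heads = nn.ModuleList(
            nn.Linear(hidden, param_dim) for _ in PRIMITIVE_TYPES
        )

    def forward(self, obs):
        h = self.encoder(obs)
        type_logits = self.type_head(h)                # which primitive to execute
        params = torch.stack([head(h) for head in self.param_heads], dim=1)
        return type_logits, params                    # params: (batch, K, param_dim)
```

At rollout time, such a policy would pick the highest-scoring primitive type, execute it with the corresponding predicted parameters, and repeat until the task completes.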



Framework Overview


We develop a self-supervised data generation strategy that executes random sequences of behavior primitives in the environment. With the generated dataset, we train an inverse dynamics model (IDM) that maps the initial and final states of demonstration segments to primitive types and their parameters. To derive optimal primitive sequences, we build a trajectory parser that segments each task demonstration via dynamic programming (see the sketch below). Finally, we train the high-level policy on the parsed primitive sequences.
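The following is a minimal sketch of dynamic-programming segmentation under stated assumptions: a scorer idm_log_prob(s_i, s_j) (hypothetical name) returns the best log-likelihood, over primitive types and parameters, that a single primitive carries the robot from state s_i to state s_j, and max_len bounds segment length. This abstracts away how the trained IDM is queried.

```python
import numpy as np

def parse_demo(states, idm_log_prob, max_len=50):
    """Segment a demonstration into primitive spans that maximize total IDM score."""
    T = len(states)
    best = np.full(T, -np.inf)   # best[j]: score of the best parse of states[0..j]
    best[0] = 0.0
    back = np.zeros(T, dtype=int)  # back[j]: start index of the segment ending at j
    for j in range(1, T):
        for i in range(max(0, j - max_len), j):
            score = best[i] + idm_log_prob(states[i], states[j])
            if score > best[j]:
                best[j], back[j] = score, i
    # Recover segment boundaries by walking back-pointers from the final state.
    bounds, j = [], T - 1
    while j > 0:
        bounds.append((back[j], j))
        j = back[j]
    return list(reversed(bounds))  # each (i, j) is one primitive segment
```

In practice, idm_log_prob would evaluate the IDM for every primitive type and take the maximum over types, so each recovered segment also yields a primitive label and parameters for policy training.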



Experiments in Simulation

We perform evaluations on three tasks from the robosuite simulator. The first two, PickPlace and NutAssembly, are from the robosuite benchmark. We introduce a third task, TidyUp, to study long-horizon tasks.



Our method significantly outperforms all baselines, achieving success rates above 95% across all tasks with strong robustness. This demonstrates that decomposing task demonstrations into concise primitive sequences reduces task complexity and enables data-efficient imitation learning.




Real-World Evaluation

We evaluate the performance of PRIME against an imitation learning baseline (BC-RNN) on two real-world CleanUp task variants: CleanUp-Bin and CleanUp-Stack.

Our method significantly outperforms BC-RNN on both real-world tabletop tasks. Below are rollouts of the two tasks (played at 8x):

CleanUp-Bin

CleanUp-Stack



Visualization of Segmented Primitive Sequences

For each task, we select five human demonstrations and visualize the segmented primitive sequences produced by the trajectory parser.





Citation


@article{gao2024prime,
  title={PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning},
  author={Gao, Tian and Nasiriany, Soroush and Liu, Huihan and Yang, Quantao and Zhu, Yuke},
  journal={arXiv preprint arXiv:2403.00929},
  year={2024}
}