Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation

Yifeng Zhu1    Peter Stone1, 2    Yuke Zhu1   

1The University of Texas at Austin    2Sony AI   

Paper | Code | Bibtex

in IEEE Robotics and Automation Letters, 2022

We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations and use these skills to synthesize prolonged robot behaviors. Our method starts by constructing a hierarchical task structure from each demonstration through agglomerative clustering. From the task structures of multi-task demonstrations, we identify skills based on recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning. Finally, we train a meta controller that composes these skills to solve long-horizon manipulation tasks. The entire model can be trained on a small set of human demonstrations collected within 30 minutes without further annotations, making it amenable to real-world deployment. We systematically evaluate our method in simulation environments and on a real robot. Our method outperforms state-of-the-art imitation learning methods in multi-stage manipulation tasks. Furthermore, skills discovered from multi-task demonstrations boost the average task success by 8% compared to those discovered from individual tasks.


Method Overview

Overview of BUDS. We construct hierarchical task structures of demonstration sequences in a bottom-up manner, from which we obtain temporal segments for discovering and learning sensorimotor skills.
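
Below is a minimal sketch of what such a bottom-up construction could look like, assuming each demonstration frame has already been embedded into a feature vector and using a simple dot-product similarity to greedily merge temporally adjacent segments. The function and parameter names (build_task_structure, frame_features, min_segments) are illustrative assumptions, not the exact procedure from the paper.

import numpy as np

def build_task_structure(frame_features, min_segments=4):
    # frame_features: (T, D) array of per-frame embeddings (assumed precomputed).
    # Start with every frame as its own segment, stored as (start, end) indices.
    segments = [(t, t + 1) for t in range(len(frame_features))]
    hierarchy = [list(segments)]  # record each level of the bottom-up tree

    def segment_feature(seg):
        start, end = seg
        return frame_features[start:end].mean(axis=0)  # average-pooled segment feature

    while len(segments) > min_segments:
        # Similarity of every pair of temporally adjacent segments.
        sims = [float(np.dot(segment_feature(a), segment_feature(b)))
                for a, b in zip(segments[:-1], segments[1:])]
        i = int(np.argmax(sims))  # merge the most similar adjacent pair
        merged = (segments[i][0], segments[i + 1][1])
        segments = segments[:i] + [merged] + segments[i + 2:]
        hierarchy.append(list(segments))

    # Leaves are single frames; the top level gives coarse temporal segments.
    return hierarchy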




Hierarchical Policy Model

Overview of the hierarchical policy. Given a workspace observation, the meta controller selects the skill index and generates the latent subgoal vector. Then the selected sensorimotor skill generates actions conditioned on observed images, proprioception, and the subgoal vector.
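
As a rough illustration of this two-level interface, here is a hedged PyTorch sketch. The network sizes, the latent subgoal dimension, and the use of precomputed image features are assumptions for readability; the actual architecture is described in the paper.

import torch
import torch.nn as nn

class MetaController(nn.Module):
    # Maps a workspace observation to skill logits and a latent subgoal vector.
    def __init__(self, obs_dim, num_skills, subgoal_dim, hidden_dim=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.skill_head = nn.Linear(hidden_dim, num_skills)     # which skill to invoke
        self.subgoal_head = nn.Linear(hidden_dim, subgoal_dim)  # latent subgoal

    def forward(self, workspace_obs):
        h = self.trunk(workspace_obs)
        return self.skill_head(h), self.subgoal_head(h)

class SkillPolicy(nn.Module):
    # Goal-conditioned skill acting on image features, proprioception, and the subgoal.
    def __init__(self, img_feat_dim, proprio_dim, subgoal_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_feat_dim + proprio_dim + subgoal_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, img_feat, proprio, subgoal):
        return self.net(torch.cat([img_feat, proprio, subgoal], dim=-1))

# Execution sketch (single observation): pick the skill with the highest logit,
# then roll out that skill conditioned on the predicted subgoal.
# skill_logits, subgoal = meta(workspace_obs)
# action = skills[skill_logits.argmax(-1).item()](img_feat, proprio, subgoal)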



Simulation Experiment

We show a qualitative comparison between two baselines (vanilla BC and Changepoint Detection) and our method BUDS on Kitchen, which achieve 24.4%, 23.4%, and 72.0% task success rates, respectively. BUDS learns skills that lead to better execution; the quantitative results are reported in Table 1 of the paper.

BC Baseline [1] CP Baseline [2] BUDS (Ours)

[1] T. Zhang et al. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation.
[2] S. Niekum et al. Online Bayesian changepoint detection for articulated motion models.


Real Robot Experiment

We tested BUDS on the Real-Kitchen task using a 7-DoF Franka Emika Panda arm. We collected 50 demonstrations for training the policy, which achieves a 58% success rate over 50 evaluation trials.



Multi-task Domain

Task Variant Descriptions


Screenshots of example initial configurations for all task variants in Multitask-Kitchen. Each row corresponds to a task; the left two figures in each row show the two variants covered in Train (Multi) and Train (Single), and the variant shown in the right figure of each row is covered in Test.

Task 1 Variant-1: the stove on, the object and the pot on the table, the drawer open
Task 1 Variant-2: the stove on, the object and the pot on the table, the drawer closed
Task 1 Variant-3 (Test): the stove off, the object and the pot on the table, the drawer closed
Task 2 Variant-1: the stove off, the object and the pot on the table, the drawer closed
Task 2 Variant-2: the stove on, the object and the pot on the table, the drawer closed
Task 2 Variant-3 (Test): the stove on, the object and the pot on the table, the drawer open
Task 3 Variant-1: the stove off, the pot on the table, the object in the pot, the drawer open
Task 3 Variant-2: the stove off, the object and the pot on the table, the drawer closed
Task 3 Variant-3 (Test): the stove off, the object and the pot on the table, the drawer open

Multi-task Result

In the Multitask-Kitchen domain, we examine the skills from two aspects: 1) quality: are skills learned from multi-task demonstrations better than those learned from individual tasks? 2) reusability: can the skills be composed to solve new task variants that require different subtask combinations?

The results are shown in the following table. The comparison between Train (Multi) and Train (Single) indicates that skills learned from multi-task demonstrations improve average task performance by 8% compared to those learned from demonstrations of individual tasks. The results on Test show that we can effectively reuse the skills to solve new task variants that require different combinations of the skills by training only a new meta controller to invoke the skills learned from Train (Multi); a code sketch of this reuse follows the table below.

Task    Train (Multi)   Train (Single)   Test
Task-1  70.2% ± 2.2%    52.6% ± 5.6%     59.0% ± 6.4%
Task-2  59.8% ± 6.4%    60.8% ± 1.9%     55.3% ± 3.3%
Task-3  75.0% ± 2.0%    67.6% ± 1.8%     28.4% ± 1.5%
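
To give a concrete, if simplified, picture of this reuse, the sketch below trains only a fresh meta controller on demonstrations of a new task variant while the discovered skill library stays frozen. The per-step labels (skill index and latent subgoal) and the imitation loss are illustrative assumptions, and the meta controller is assumed to return skill logits and a subgoal as in the earlier sketch.

import torch
import torch.nn.functional as F

def adapt_to_new_variant(meta_controller, skill_library, demo_loader, epochs=50, lr=1e-4):
    # Freeze the learned skills; only the meta controller is trained.
    for param in skill_library.parameters():
        param.requires_grad_(False)

    optimizer = torch.optim.Adam(meta_controller.parameters(), lr=lr)
    for _ in range(epochs):
        for workspace_obs, skill_label, subgoal_label in demo_loader:
            skill_logits, subgoal_pred = meta_controller(workspace_obs)
            # Imitate which skill to invoke and which latent subgoal to emit.
            loss = F.cross_entropy(skill_logits, skill_label) \
                 + F.mse_loss(subgoal_pred, subgoal_label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return meta_controller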


Acknowledgements

This work has taken place in the Robot Perception and Learning Group (RPL) and Learning Agents Research Group (LARG) at UT Austin. RPL research has been partially supported by NSF CNS-1955523, the MLL Research Award from the Machine Learning Laboratory at UT-Austin, and the Amazon Research Awards. LARG research is supported in part by NSF (CPS-1739964, IIS-1724157, NRI-1925082), ONR (N00014-18-2243), FLI (RFP2-000), ARO (W911NF-19-2-0333), DARPA, Lockheed Martin, GM, and Bosch. Peter Stone serves as the Executive Director of Sony AI America and receives financial compensation for this work. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.