LOTUS: Continual Imitation Learning for Robot Manipulation
Through Unsupervised Skill Discovery
Weikang Wan¹,²  Yifeng Zhu*¹  Rutav Shah*¹  Yuke Zhu¹
¹The University of Texas at Austin  ²Peking University
*: equal contribution
Paper | Code | Bibtex
International Conference on Robotics and Automation (ICRA), 2024
Abstract
We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. The core idea behind LOTUS is constructing an ever-growing skill library from a sequence of new tasks, each with a small number of corresponding task demonstrations. LOTUS starts with a continual skill discovery process using an open-vocabulary vision model, which extracts skills as recurring patterns present in unstructured demonstrations. Continual skill discovery updates existing skills to avoid catastrophic forgetting of previous tasks and adds new skills to exhibit novel behaviors. LOTUS trains a meta-controller that flexibly composes various skills to tackle vision-based manipulation tasks in the lifelong learning process. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in average success rate, demonstrating its superior knowledge transfer ability compared to prior methods.
Method Overview
LOTUS consists of two components: continual skill discovery and a hierarchical policy built on the skill library. For continual skill discovery, we obtain temporal segments from demonstrations using hierarchical clustering over DINOv2 features, and we incrementally cluster these temporal segments into partitions that either update existing skills or add new skills. For the hierarchical policy, a meta-controller π^H selects a skill by predicting an index k and specifies the subgoals for the selected skill π^L_k to achieve. Note that because transformer inputs are permutation invariant, we add sinusoidal positional encoding to the input tokens to inform the transformers of their temporal order; we omit this detail in the figure for clarity.
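To make the discovery step concrete, below is a minimal sketch of temporal segmentation and incremental skill clustering. It assumes per-frame DINOv2 features, Euclidean distances, a fixed number of segments per demonstration, and a hypothetical novelty threshold `new_skill_threshold`; the actual LOTUS procedure may differ in these details.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def segment_demo(frame_feats: np.ndarray, n_segments: int) -> list[np.ndarray]:
    """Split one demo into temporally contiguous segments by hierarchically
    clustering per-frame features, with a connectivity constraint that only
    links adjacent frames so clusters stay contiguous in time."""
    T = len(frame_feats)
    # Tri-diagonal connectivity: frame t may only merge with t-1 / t+1.
    connectivity = np.eye(T, k=1) + np.eye(T, k=-1)
    labels = AgglomerativeClustering(
        n_clusters=n_segments, connectivity=connectivity
    ).fit_predict(frame_feats)
    # Summarize each segment by the mean of its frame features.
    return [frame_feats[labels == c].mean(axis=0) for c in np.unique(labels)]

class SkillLibrary:
    """Incremental clustering of segment features: assign a segment to the
    nearest existing skill, or open a new skill if it is too far away."""

    def __init__(self, new_skill_threshold: float = 0.5):  # hypothetical value
        self.centroids: list[np.ndarray] = []
        self.counts: list[int] = []
        self.tau = new_skill_threshold

    def assign(self, seg_feat: np.ndarray) -> int:
        if self.centroids:
            dists = [np.linalg.norm(seg_feat - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] < self.tau:
                # Update an existing skill with a running mean.
                self.counts[k] += 1
                self.centroids[k] += (seg_feat - self.centroids[k]) / self.counts[k]
                return k
        # Otherwise, add a new skill to the library.
        self.centroids.append(seg_feat.copy())
        self.counts.append(1)
        return len(self.centroids) - 1
```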
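Likewise, here is a hedged sketch of the hierarchical policy interface: the `MetaController` class, its layer sizes, and head count are illustrative stand-ins rather than the paper's exact architecture. It shows how a transformer meta-controller with sinusoidal positional encoding can predict a skill index k and a subgoal for the selected skill π^L_k.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal positional encoding (assumes even d_model), added
    to input tokens so the transformer can recover their temporal order."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

class MetaController(nn.Module):
    """Illustrative meta-controller pi^H: maps a window of observation
    tokens to a skill index k and a subgoal for the selected skill."""

    def __init__(self, d_model: int, num_skills: int, subgoal_dim: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.skill_head = nn.Linear(d_model, num_skills)
        self.subgoal_head = nn.Linear(d_model, subgoal_dim)

    def forward(self, obs_tokens: torch.Tensor):
        # obs_tokens: (batch, seq_len, d_model)
        x = obs_tokens + sinusoidal_positional_encoding(
            obs_tokens.size(1), obs_tokens.size(2)
        ).to(obs_tokens.device)
        h = self.encoder(x)[:, -1]           # summary: last time step's token
        k = self.skill_head(h).argmax(-1)    # skill index k
        subgoal = self.subgoal_head(h)       # subgoal for skill pi^L_k
        return k, subgoal
```

At execution time, the predicted subgoal conditions the selected low-level skill policy π^L_k, which outputs motor commands until the meta-controller switches skills.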
Real Robot Experiments
Task Visualization.
In the real robot experiments, there are 25 tasks in the base task stage and 5 new tasks in each of the 5 lifelong task stages. Different tasks either have distinct language descriptions, or they share the same language description but are set in different scenes.
Metrics.
We report three metrics, all computed in terms of success rate (a sketch of how they can be computed follows this list):
FWT (forward transfer): higher FWT means a policy learns faster on new tasks;
NBT (negative backward transfer): lower NBT means a policy better retains performance on previously seen tasks;
AUC (area under the success-rate curve): higher AUC means better overall performance, considering both FWT and NBT.
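As a minimal sketch, one common way to compute these metrics starts from a matrix R, where R[i, j] is the success rate on task (group) j evaluated after lifelong stage i. The exact definitions used in the paper follow the lifelong learning literature and may differ in detail.

```python
import numpy as np

def lifelong_metrics(R: np.ndarray):
    """R[i, j]: success rate on task j evaluated after stage i, with task j
    introduced at stage j (only the lower triangle of R is meaningful)."""
    K = R.shape[0]
    # FWT: performance on each task right after it is learned.
    fwt = np.mean(np.diag(R))
    # NBT: average drop on a task across later stages (lower is better).
    nbt = np.mean([R[j, j] - R[j + 1:, j].mean() for j in range(K - 1)])
    # AUC: average success over every (stage, task-seen-so-far) pair.
    auc = np.mean([R[i, : i + 1].mean() for i in range(K)])
    return fwt, nbt, auc
```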
Results Analysis.
We compare LOTUS with the best baseline, ER (Experience Replay), on the MUTEX tasks. Our evaluation shows that LOTUS achieves 50 (+11) in FWT (learning much faster on new tasks), 21 (+2) in NBT (maintaining competitive performance on previously learned tasks), and 56 (+9) in AUC compared to ER. The performance across all three metrics demonstrates the efficacy of LOTUS policies on real robot hardware.
Skill Discovery Results
Here, we show the feature space of temporal segments in continual skill discovery using t-SNE. Different colors represent different skills, and we visualize three key frames for each data point.
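A minimal sketch of how such a visualization can be produced with scikit-learn's t-SNE, assuming `seg_feats` and `skill_ids` come from the discovery step sketched above; the interactive figure on this page additionally shows key-frame previews.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_skill_space(seg_feats: np.ndarray, skill_ids: np.ndarray):
    """seg_feats: (N, D) segment features; skill_ids: (N,) skill labels."""
    # Project segment features to 2D for visualization.
    xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(seg_feats)
    plt.scatter(xy[:, 0], xy[:, 1], c=skill_ids, cmap="tab20", s=10)
    plt.title("Temporal segments colored by discovered skill")
    plt.show()
```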
Visualization instruction:
Drag the slider to see how the skill space changes over the lifelong learning process, and hover over a point to display the key frames for that segment.
Real World Tasks, LIBERO-Object, LIBERO-Goal, and LIBERO-50
[Interactive visualization: for each task suite above, select a lifelong learning task to highlight it.]