LOTUS: Continual Imitation Learning for Robot Manipulation
Through Unsupervised Skill Discovery
Weikang Wan¹,²  Yifeng Zhu*¹  Rutav Shah*¹  Yuke Zhu¹
¹The University of Texas at Austin  ²Peking University
*: equal contribution
Paper | Code | Bibtex
International Conference on Robotics and Automation (ICRA), 2024
Abstract
We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. The core idea behind LOTUS is constructing an ever-growing skill library from a sequence of new tasks, each with a small number of corresponding task demonstrations. LOTUS starts with a continual skill discovery process using an open-vocabulary vision model, which extracts skills as recurring patterns present in unstructured demonstrations. Continual skill discovery updates existing skills to avoid catastrophic forgetting of previous tasks and adds new skills to exhibit novel behaviors. LOTUS trains a meta-controller that flexibly composes various skills to tackle vision-based manipulation tasks in the lifelong learning process. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in average success rate, demonstrating its superior knowledge transfer ability compared to prior methods.
Method Overview
LOTUS consists of two components: continual skill discovery and a hierarchical policy built on the skill library. For continual skill discovery, we obtain temporal segments from demonstrations using hierarchical clustering over DINOv2 features, and we incrementally cluster these temporal segments into partitions that either update existing skills or add new skills. For the hierarchical policy, a meta-controller π^H selects a skill by predicting an index k and specifies the subgoals for the selected skill π^L_k to achieve. Note that because transformer inputs are permutation invariant, we add sinusoidal positional encoding to the input tokens to inform the transformers of their temporal order; we omit this detail in the figure for clarity.
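To make the discovery step concrete, below is a minimal sketch of temporal segmentation and incremental skill clustering. It assumes per-frame DINOv2 features, Euclidean distances, a fixed number of segments per demonstration, and a hypothetical novelty threshold `new_skill_threshold`; the actual LOTUS procedure may differ in these details.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def segment_demo(frame_feats: np.ndarray, n_segments: int) -> list[np.ndarray]:
    """Split one demo into temporally contiguous segments by hierarchically
    clustering per-frame features, with a connectivity constraint that only
    links adjacent frames so clusters stay contiguous in time."""
    T = len(frame_feats)
    # Tri-diagonal connectivity: frame t may only merge with t-1 / t+1.
    connectivity = np.eye(T, k=1) + np.eye(T, k=-1)
    labels = AgglomerativeClustering(
        n_clusters=n_segments, connectivity=connectivity
    ).fit_predict(frame_feats)
    # Summarize each segment by the mean of its frame features.
    return [frame_feats[labels == c].mean(axis=0) for c in np.unique(labels)]

class SkillLibrary:
    """Incremental clustering of segment features: assign a segment to the
    nearest existing skill, or open a new skill if it is too far away."""

    def __init__(self, new_skill_threshold: float = 0.5):  # hypothetical value
        self.centroids: list[np.ndarray] = []
        self.counts: list[int] = []
        self.tau = new_skill_threshold

    def assign(self, seg_feat: np.ndarray) -> int:
        if self.centroids:
            dists = [np.linalg.norm(seg_feat - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] < self.tau:
                # Update an existing skill with a running mean.
                self.counts[k] += 1
                self.centroids[k] += (seg_feat - self.centroids[k]) / self.counts[k]
                return k
        # Otherwise, add a new skill to the library.
        self.centroids.append(seg_feat.copy())
        self.counts.append(1)
        return len(self.centroids) - 1
```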
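Likewise, here is a hedged sketch of the hierarchical policy interface: the `MetaController` class, its layer sizes, and head count are illustrative stand-ins rather than the paper's exact architecture. It shows how a transformer meta-controller with sinusoidal positional encoding can predict a skill index k and a subgoal for the selected skill π^L_k.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal positional encoding (assumes even d_model), added
    to input tokens so the transformer can recover their temporal order."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

class MetaController(nn.Module):
    """Illustrative meta-controller pi^H: maps a window of observation
    tokens to a skill index k and a subgoal for the selected skill."""

    def __init__(self, d_model: int, num_skills: int, subgoal_dim: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.skill_head = nn.Linear(d_model, num_skills)
        self.subgoal_head = nn.Linear(d_model, subgoal_dim)

    def forward(self, obs_tokens: torch.Tensor):
        # obs_tokens: (batch, seq_len, d_model)
        x = obs_tokens + sinusoidal_positional_encoding(
            obs_tokens.size(1), obs_tokens.size(2)
        ).to(obs_tokens.device)
        h = self.encoder(x)[:, -1]           # summary: last time step's token
        k = self.skill_head(h).argmax(-1)    # skill index k
        subgoal = self.subgoal_head(h)       # subgoal for skill pi^L_k
        return k, subgoal
```

At execution time, the predicted subgoal conditions the selected low-level skill policy π^L_k, which outputs motor commands until the meta-controller switches skills.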
Real Robot Experiments
Task Visualization.
In the real robot experiments, there are 25 tasks in the base task stage and 5 new tasks in each of the 5 lifelong task stages. Different tasks either have distinct language descriptions, or they share the same language description but are set in different scenes.
Metrics.
We report three metrics, all computed in terms of success rate (a sketch of how they can be computed follows this list):
FWT (forward transfer): higher FWT means a policy learns faster on new tasks;
NBT (negative backward transfer): lower NBT means a policy better retains performance on previously seen tasks;
AUC (area under the success-rate curve): higher AUC means better overall performance, considering both FWT and NBT.
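As a minimal sketch, one common way to compute these metrics starts from a matrix R, where R[i, j] is the success rate on task (group) j evaluated after lifelong stage i. The exact definitions used in the paper follow the lifelong learning literature and may differ in detail.

```python
import numpy as np

def lifelong_metrics(R: np.ndarray):
    """R[i, j]: success rate on task j evaluated after stage i, with task j
    introduced at stage j (only the lower triangle of R is meaningful)."""
    K = R.shape[0]
    # FWT: performance on each task right after it is learned.
    fwt = np.mean(np.diag(R))
    # NBT: average drop on a task across later stages (lower is better).
    nbt = np.mean([R[j, j] - R[j + 1:, j].mean() for j in range(K - 1)])
    # AUC: average success over every (stage, task-seen-so-far) pair.
    auc = np.mean([R[i, : i + 1].mean() for i in range(K)])
    return fwt, nbt, auc
```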
Results Analysis.
We compare LOTUS with the best baseline, ER (Experience Replay), on the MUTEX tasks. Our evaluation shows that LOTUS achieves 50 (+11) in FWT (learning much faster on new tasks), 21 (+2) in NBT (maintaining competitive performance on previously learned tasks), and 56 (+9) in AUC compared to ER. The performance across all three metrics demonstrates the efficacy of LOTUS policies on real robot hardware.
Skill Discovery Results
Here, we show the feature space of temporal segments in continual skill discovery using t-SNE. Different colors represent different skills, and we visualize three key frames for each data point.
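A minimal sketch of how such a visualization can be produced with scikit-learn's t-SNE, assuming `seg_feats` and `skill_ids` come from the discovery step sketched above; the interactive figure on this page additionally shows key-frame previews.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_skill_space(seg_feats: np.ndarray, skill_ids: np.ndarray):
    """seg_feats: (N, D) segment features; skill_ids: (N,) skill labels."""
    # Project segment features to 2D for visualization.
    xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(seg_feats)
    plt.scatter(xy[:, 0], xy[:, 1], c=skill_ids, cmap="tab20", s=10)
    plt.title("Temporal segments colored by discovered skill")
    plt.show()
```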
Visualization instruction:
Drag the slider to see how the skill space changes over the lifelong learning process, and hover over a point to display the key frames for that segment.
Real World Tasks, LIBERO-Object, LIBERO-Goal, and LIBERO-50
[Interactive visualization: for each task suite above, select a lifelong learning task to highlight it.]