VIOLA: Imitation Learning for Vision-Based Manipulation
with Object Proposal Priors
Yifeng Zhu1 Abhishek Joshi1 Peter Stone1, 2 Yuke Zhu1
1The University of Texas at Austin 2Sony AI
6th Conference on Robot Learning, Auckland, New Zealand
We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. It uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms to object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making.
Method Overview
Overview of VIOLA. We use a pre-trained Region Proposal Network (RPN) to obtain general object proposals, which allow us to learn object-centric visuomotor skills.
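To make the overview concrete, here is a minimal sketch of the two ideas it names: keeping the highest-scoring object proposals, and letting a query attend over one token per kept region. This is not the VIOLA implementation. The function names (`top_k_proposals`, `box_token`, `attend`) are hypothetical, the tokens contain only box geometry as a stand-in for learned region features, and a fixed vector stands in for a learned query. It is plain Python with no dependencies.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def top_k_proposals(boxes: Sequence[Box], scores: Sequence[float], k: int) -> List[Box]:
    """Keep the k proposals with the highest objectness scores."""
    ranked = sorted(zip(scores, boxes), key=lambda pair: pair[0], reverse=True)
    return [box for _, box in ranked[:k]]


def box_token(box: Box, width: float, height: float) -> List[float]:
    """Toy region token: normalized box center and size.

    Assumption: a real system would also include visual features
    extracted from the region; geometry alone is used here for brevity.
    """
    x1, y1, x2, y2 = box
    return [
        (x1 + x2) / (2 * width),
        (y1 + y2) / (2 * height),
        (x2 - x1) / width,
        (y2 - y1) / height,
    ]


def attend(query: Sequence[float],
           tokens: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one query over the tokens.

    Keys and values are both the tokens themselves (no learned projections).
    Returns the attention weights and the weighted sum of the tokens.
    """
    d = len(query)
    logits = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return weights, pooled


# Hypothetical proposals for a 200x200 image, with made-up scores.
boxes = [(10, 10, 50, 50), (60, 20, 120, 100), (0, 0, 200, 200)]
scores = [0.9, 0.4, 0.7]

kept = top_k_proposals(boxes, scores, k=2)
tokens = [box_token(b, 200, 200) for b in kept]
weights, pooled = attend([1.0, 0.0, 0.0, 0.0], tokens)
```

In this toy setup the attention weights show which region the query favors. In a trained policy, the analogous weights would be learned from demonstrations rather than fixed by hand.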
Real Robot Experiment
The following table shows our evaluation on real-robot tasks. Trained with behavioral cloning, VIOLA learns manipulation policies that perform much better than those of the state-of-the-art baseline, BC-RNN. Notably, in the Make-Coffee task, the baseline fails in every attempt, while VIOLA achieves a 60% success rate. This empirical result further supports the effectiveness of VIOLA.
Qualitative Real Robot Demo
We can sequentially execute the Dining-PlateFork and Dining-Bowl policies.
This video shows the learned policies making two coffees in a row.
Our policies are robust to scenarios where unseen distracting objects are present.
(The cup and the strawberry in the bowl never appeared in the demonstrations.)
A no-cut video of 10 Make-Coffee rollouts
Acknowledgements