Deep Imitation Learning for Humanoid Loco-manipulation
through Human Teleoperation

Mingyo Seo    Steve Han    Kyutae Sim    Seung Hyeon Bang    Carlos Gonzalez    Luis Sentis    Yuke Zhu

The University of Texas at Austin   

IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2023
Oral Presentation

Paper | Code

We tackle the problem of developing humanoid loco-manipulation skills with deep imitation learning. Two difficulties make this problem hard: collecting human demonstrations for humanoids, and training policies for robots with a high number of degrees of freedom. We introduce TRILL, a data-efficient framework for learning humanoid loco-manipulation policies from human demonstrations. In this framework, we collect human demonstration data through an intuitive Virtual Reality (VR) interface. We employ a whole-body control formulation to transform task-space commands from human operators into the robot's joint-torque actuation while stabilizing its dynamics. By employing high-level action abstractions tailored for humanoid robots, our method can efficiently learn complex loco-manipulation skills. We demonstrate the effectiveness of TRILL in simulation and on a real-world robot across a variety of tasks.

Method Overview

TRILL addresses the challenge of learning humanoid loco-manipulation. We introduce a learning framework that facilitates teleoperated demonstrations with task-space commands provided by a human demonstrator. The trained policies generate these commands, drawing on the sophistication and adaptability of human decision-making captured in the demonstrations. The robot control interface then executes the target commands through joint-torque actuation, complying with the robot's dynamics. Combining imitation learning with whole-body control in this way lets the method work in both simulated and real-world environments.

Hierarchical Loco-manipulation Pipeline

The trained policies generate the target task-space command at 20 Hz from the onboard stereo camera observation and the robot's proprioceptive feedback. The robot control interface realizes the task-space commands: it computes the desired joint torques at 100 Hz and sends them to the humanoid robot for actuation. More implementation details can be found on this page.
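The two rates above imply that each task-space command is held for five control ticks (100 Hz / 20 Hz). The following is a minimal sketch of such a two-rate loop, not the authors' implementation. Only the two rates come from the text; the names `run_hierarchy`, `policy`, `controller`, `get_observation`, and `send_torques` are hypothetical placeholders, and the loop counts ticks instead of sleeping in real time.

```python
from typing import Any, Callable, Tuple

POLICY_HZ = 20    # task-space command rate stated in the text
CONTROL_HZ = 100  # joint-torque rate stated in the text
TICKS_PER_COMMAND = CONTROL_HZ // POLICY_HZ  # 5 control ticks per command


def run_hierarchy(
    policy: Callable[[Any], Any],          # hypothetical: observation -> task-space command
    controller: Callable[[Any], Any],      # hypothetical: command -> joint torques
    get_observation: Callable[[], Any],    # hypothetical: camera + proprioception reader
    send_torques: Callable[[Any], None],   # hypothetical: actuator interface
    duration_s: float,
) -> Tuple[int, int]:
    """Run the two-rate loop and return (policy calls, control ticks)."""
    n_ticks = int(round(duration_s * CONTROL_HZ))
    command = None
    policy_calls = 0
    for tick in range(n_ticks):
        # Query the slow policy only on every fifth control tick.
        if tick % TICKS_PER_COMMAND == 0:
            command = policy(get_observation())
            policy_calls += 1
        # The fast controller tracks the most recent command on every tick.
        send_torques(controller(command))
    return policy_calls, n_ticks


if __name__ == "__main__":
    sent = []
    calls, ticks = run_hierarchy(
        policy=lambda obs: {"target": obs},   # stub policy
        controller=lambda cmd: [0.0],         # stub controller returning dummy torques
        get_observation=lambda: 0.0,          # stub sensor
        send_torques=sent.append,             # record torques instead of actuating
        duration_s=1.0,
    )
    print(calls, ticks, len(sent))
```

Holding the last command between policy updates keeps the fast loop independent of policy inference latency; a real system would additionally need real-time scheduling and the controller would read the robot state on every tick.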

Real-robot Teleoperation

We design an intuitive VR teleoperation system that reduces the cognitive and physical burden on human operators when providing task demonstrations. As a result, our teleoperation approach can produce high-quality demonstration data while maintaining safe robot operation.

Music: Happy by Luke Bergs

Real-robot Deployment

We demonstrate the application of TRILL on the real robot, deploying visuomotor policies trained for dexterous manipulation tasks. During evaluation, the robot performed each task 10 times in a row without rebooting, succeeding in 8 out of 10 trials on the Tool pick-and-place task and 9 out of 10 trials on the Removing the spray cap task.

Simulation Evaluation

We design two realistic simulation environments and evaluate the robot’s ability to perform subtasks involving free-space locomotion, manipulation, and loco-manipulation. TRILL achieves success rates of 96% on free-space locomotion tasks, 80% on manipulation tasks, and 92% on loco-manipulation tasks.


        title={Deep Imitation Learning for Humanoid Loco-manipulation through Human Teleoperation},
        author={Seo, Mingyo and Han, Steve and Sim, Kyutae and Bang, Seung Hyeon and Gonzalez, Carlos and Sentis, Luis and Zhu, Yuke},
        booktitle={IEEE-RAS International Conference on Humanoid Robots (Humanoids)},