Deep Imitation Learning for Humanoid Loco-manipulation through Human Teleoperation
We tackle the problem of developing humanoid loco-manipulation skills with deep imitation learning. The difficulty of collecting human demonstrations for humanoids, combined with the difficulty of training policies for a system with many degrees of freedom, makes this problem hard. We introduce TRILL, a data-efficient framework for learning humanoid loco-manipulation policies from human demonstrations. In this framework, we collect human demonstration data through an intuitive Virtual Reality (VR) interface. We employ a whole-body control formulation to transform task-space commands from human operators into the robot's joint-torque actuation while stabilizing its dynamics. By employing high-level action abstractions tailored for humanoid robots, our method can efficiently learn complex loco-manipulation skills. We demonstrate the effectiveness of TRILL in simulation and on a real-world robot across a variety of tasks.
TRILL addresses the challenge of learning humanoid loco-manipulation. We introduce a learning framework that facilitates teleoperated demonstrations with task-space commands provided by a human demonstrator. The trained policies generate these commands, drawing on the complexity and adaptability of human decision-making. The robot control interface then executes the target commands through joint-torque actuation while complying with the robot's dynamics. Combining imitation learning with whole-body control allows the method to work in both simulated and real-world environments.
Hierarchical Loco-manipulation Pipeline
The trained policies generate the target task-space command at 20 Hz from onboard stereo camera observations and the robot's proprioceptive feedback. The robot control interface realizes these task-space commands by computing the desired joint torques at 100 Hz and sending them to the humanoid robot for actuation.
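The two rates above imply that each policy command is held for five control ticks (100 Hz / 20 Hz). The following is a minimal sketch of such a two-rate loop, not the authors' implementation. The function `run_hierarchy` and its arguments (`policy`, `controller`, `get_observation`, `send_torques`) are hypothetical names introduced only for illustration, and the loop is driven by tick count rather than by a real-time clock.

```python
# Hypothetical sketch of a two-rate hierarchy; only the 20 Hz and 100 Hz
# rates come from the text. All function names are illustrative assumptions.
POLICY_HZ = 20    # rate at which the policy emits task-space commands
CONTROL_HZ = 100  # rate at which the controller computes joint torques
STEPS_PER_COMMAND = CONTROL_HZ // POLICY_HZ  # control ticks per policy command


def run_hierarchy(policy, controller, get_observation, send_torques, n_control_ticks):
    """Run the control loop for n_control_ticks and return the number of policy calls.

    policy:          maps an observation to a task-space command
    controller:      maps a task-space command to joint torques
    get_observation: returns the latest observation (camera + proprioception)
    send_torques:    forwards joint torques to the robot or simulator
    """
    command = None
    policy_calls = 0
    for tick in range(n_control_ticks):
        # Refresh the task-space command once every STEPS_PER_COMMAND ticks.
        if tick % STEPS_PER_COMMAND == 0:
            command = policy(get_observation())
            policy_calls += 1
        # The low-level controller runs on every tick, reusing the held command.
        send_torques(controller(command))
    return policy_calls
```

With stub callables, running the loop for 100 control ticks (one simulated second) queries the policy 20 times and sends 100 torque commands. A real system would additionally pace each tick against a clock and handle late policy outputs, which this sketch omits.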
We design an intuitive VR teleoperation system that reduces the cognitive and physical burden on human operators when providing task demonstrations. As a result, our teleoperation approach can produce high-quality demonstration data while maintaining safe robot operation.
We demonstrate the application of TRILL on the real robot, deploying visuomotor policies trained for dexterous manipulation tasks. During evaluation, the robot performed each task 10 times in a row without rebooting. It succeeded in 8 out of 10 trials on the Tool pick-and-place task and in 9 out of 10 trials on the Removing the spray cap task.
We design two realistic simulation environments and evaluate the robot’s ability to perform subtasks involving free-space locomotion, manipulation, and loco-manipulation. TRILL achieves success rates of 96% on free-space locomotion tasks, 80% on manipulation tasks, and 92% on loco-manipulation tasks.