Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks

Soroush Nasiriany    Huihan Liu    Yuke Zhu   

The University of Texas at Austin   

IEEE International Conference on Robotics and Automation (ICRA), 2022

Paper | Code

Realistic manipulation tasks require a robot to interact with its environment through prolonged sequences of motor actions. While deep reinforcement learning methods have recently emerged as a promising paradigm for automating manipulation behaviors, they usually fall short in long-horizon tasks due to the exploration burden. This work introduces Manipulation Primitive-augmented reinforcement Learning (MAPLE), a learning framework that augments standard reinforcement learning algorithms with a pre-defined library of behavior primitives. These behavior primitives are robust functional modules specialized in achieving manipulation goals, such as grasping and pushing. To use these heterogeneous primitives, we develop a hierarchical policy that invokes the primitives and instantiates their executions with input parameters. We demonstrate that MAPLE outperforms baseline approaches by a significant margin on a suite of simulated manipulation tasks. We also quantify the compositional structure of the learned behaviors and highlight our method's ability to transfer policies to new task variants and to physical hardware.


Method Overview

Overview of MAPLE. (left) We present a learning framework that augments the robot's atomic motor actions with a library of versatile behavior primitives. (middle) Our method learns to compose these primitives via reinforcement learning. (right) This enables the agent to solve complex long-horizon manipulation tasks.
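To make the primitive interface concrete, the sketch below shows one way such a library could be structured in Python. This is an illustrative assumption, not the released MAPLE code: the class names, the parameter layout, and the `plan_grasp_motion` helper are hypothetical.

    # A minimal sketch of a behavior primitive interface, assuming a gym-style
    # `env` whose step() consumes low-level (atomic) motor actions. Class names,
    # parameter layout, and `plan_grasp_motion` are hypothetical illustrations.
    from abc import ABC, abstractmethod
    import numpy as np

    class Primitive(ABC):
        num_params: int  # dimension of this primitive's input parameters

        @abstractmethod
        def execute(self, env, params: np.ndarray) -> float:
            """Run a closed-loop controller to termination; return the reward
            accumulated over the primitive's variable-length execution."""

    class GraspPrimitive(Primitive):
        num_params = 4  # e.g., a reach location (x, y, z) plus a gripper yaw

        def execute(self, env, params):
            x, y, z, yaw = params
            total_reward = 0.0
            # Hypothetical helper: yields atomic actions that move the
            # end-effector to the target pose and then close the gripper.
            for action in plan_grasp_motion(env, (x, y, z), yaw):
                _, reward, _, _ = env.step(action)
                total_reward += reward
            return total_reward

Under this interface, atomic motor actions can themselves be wrapped as a primitive with a single environment step, so the policy chooses among temporally extended and atomic behaviors uniformly.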



Integrating Heterogeneous Primitives with a Hierarchical Policy

Our goal is to incorporate a heterogeneous set of primitives that take input parameters of different dimensions, operate over variable temporal lengths, and produce distinct behaviors. To that end, we adopt a hierarchical policy: at the high level, a task policy determines the primitive type, and at the low level, a parameter policy determines the corresponding primitive parameters. This hierarchical design facilitates modular reasoning, with the high level deciding which primitive to execute and the low level determining how to instantiate that primitive, as sketched below.
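The following sketch implements such a two-level policy in PyTorch, reusing the hypothetical `Primitive` interface from the earlier sketch. The architecture details (MLP heads, one-hot conditioning of the parameter policy on the chosen primitive, a shared maximum parameter dimension) are our assumptions and may differ from the paper's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HierarchicalPolicy(nn.Module):
        """Two-level policy sketch: a task policy picks a primitive type,
        and a parameter policy instantiates it with input parameters."""

        def __init__(self, obs_dim, primitives, hidden=256):
            super().__init__()
            self.primitives = primitives            # list of Primitive objects
            k = len(primitives)
            max_p = max(p.num_params for p in primitives)
            self.task_policy = nn.Sequential(       # state -> primitive logits
                nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, k))
            self.param_policy = nn.Sequential(      # state + one-hot -> params
                nn.Linear(obs_dim + k, hidden), nn.ReLU(),
                nn.Linear(hidden, max_p), nn.Tanh())

        def act(self, obs):
            logits = self.task_policy(obs)
            idx = torch.distributions.Categorical(logits=logits).sample()
            # Condition the parameter policy on the chosen primitive.
            one_hot = F.one_hot(idx, len(self.primitives)).float()
            params = self.param_policy(torch.cat([obs, one_hot], dim=-1))
            prim = self.primitives[idx.item()]
            # Each primitive reads only its first `num_params` dimensions.
            return prim, params[..., : prim.num_params]

At rollout time, the agent repeatedly calls `act(obs)` and then `prim.execute(env, params)`. Because each primitive may run for many low-level steps, a single high-level decision covers a temporally extended segment of the task, which is what eases the exploration burden on long-horizon problems.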



Simulated Environment Evaluation

We perform evaluations on eight manipulation tasks. The first six come from the robosuite benchmark, and we designed the last two (cleanup, peg insertion) to test our method on multi-stage, contact-rich tasks.


We compare MAPLE to five baselines: a method that only employs low-level motor actions (Atomic), a state-of-the-art hierarchical DRL method that learns low-level options along with high-level controllers (DAC), methods that employ alternative policy modeling designs (Flat and Open Loop), and an ablation of our method that excludes low-level motor actions (MAPLE (Non-Atomic)). We find that MAPLE significantly outperforms all of these baselines.



For reference, we visualize sample rollouts on the peg insertion task across all baselines:



Model Analysis

We present an analysis of the task sketches that our method learned for each task. Each row corresponds to a single sketch, progressing temporally from left to right. We see evidence that the agent uncovers compositional task structure, applying temporally extended primitives wherever appropriate and relying on atomic actions otherwise. In the peg insertion task, for example, the agent leverages the grasping primitive to pick up the peg and the reaching primitive to align the peg with the hole in the block, but then uses atomic actions for the contact-rich insertion phase.




Real-World Evaluation

As our behavior primitives offer high-level action abstractions and encapsulate the low-level complexities of motor actuation, our policies can transfer directly to the real world. We trained MAPLE on simulated versions of the stack and cleanup tasks and executed the resulting policies on a physical robot. Here we show rollouts on the cleanup task (played at 5x speed).




Citation


    @inproceedings{nasiriany2022maple,
      title={Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks},
      author={Soroush Nasiriany and Huihan Liu and Yuke Zhu},
      booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
      year={2022}
    }