Home#

Adaptive RL with Long-Term Memory

AMAGO is a high-powered off-policy version of RL^2 for training large policies on long sequences. It follows a simple and scalable perspective on RL generalization:

Turn meta-learning into a memory problem (“black-box meta-RL”).
Put all of our effort into learning effective memory with end-to-end RL.
View other RL settings as special cases of meta-RL.
Use the same method for every problem while staying customizable for research.

Some highlights:

Broadly Applicable. Long-term memory, meta-learning, multi-task RL, and zero-shot generalization are special cases of its POMDP format. Supports discrete and continuous actions. Online and offline RL. See examples below!
Scalable. Train large policies on long context sequences across multiple GPUs with parallel actors, asynchronous learning/rollouts, and large replay buffers stored on disk.
Easy to Modify. Modular and configurable. Swap in your own model architectures, RL objectives, and datasets.

Contents: