Home


Adaptive RL with Long-Term Memory



AMAGO is a high-powered off-policy version of RL^2 for training large policies on long sequences. It follows a simple and scalable perspective on RL generalization:

  1. Turn meta-learning into a memory problem (“black-box meta-RL”); a minimal sketch follows this list.

  2. Put all of our effort into learning effective memory with end-to-end RL.

  3. View other RL settings as special cases of meta-RL.

  4. Use the same method for every problem while staying customizable for research.
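To make point 1 concrete, here is a minimal sketch of black-box meta-RL as a memory problem: the policy is a sequence model whose per-timestep input bundles the observation with the previous action, reward, and episode-reset flag, so that adaptation reduces to remembering the history of the current trial. This is plain illustrative PyTorch written for this page, not AMAGO's internal code; `MemoryPolicy` and its layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MemoryPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # one timestep token = [obs, one-hot prev action, prev reward, reset flag]
        in_dim = obs_dim + n_actions + 2
        self.encoder = nn.Linear(in_dim, hidden)
        self.memory = nn.GRU(hidden, hidden, batch_first=True)  # any sequence model works here
        self.actor = nn.Linear(hidden, n_actions)

    def forward(self, obs, prev_action, prev_reward, reset, state=None):
        # obs: (B, T, obs_dim); prev_action: (B, T, n_actions) one-hot;
        # prev_reward, reset: (B, T, 1). `state` carries memory across rollout chunks,
        # so the agent keeps adapting across episode boundaries within a trial.
        tokens = torch.cat([obs, prev_action, prev_reward, reset], dim=-1)
        hidden, state = self.memory(torch.relu(self.encoder(tokens)), state)
        return self.actor(hidden), state
```

Because the reset flag is just another input feature, a single-episode POMDP, a multi-task problem, and a full meta-RL trial all look the same to this policy, which is the sense in which other RL settings become special cases of meta-RL.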

Some highlights:

  • Broadly Applicable. Long-term memory, meta-learning, multi-task RL, and zero-shot generalization are all special cases of AMAGO's POMDP format. Supports discrete and continuous actions, and both online and offline RL. See examples below!

  • Scalable. Train large policies on long context sequences across multiple GPUs with parallel actors, asynchronous learning/rollouts, and large replay buffers stored on disk.

  • Easy to Modify. Modular and configurable. Swap in your own model architectures, RL objectives, and datasets (see the sketch below).
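To give a flavor of the kind of swap the modular design targets, the sketch below replaces the GRU memory from the earlier example with a small causal Transformer. Again, this is illustrative PyTorch rather than AMAGO's actual API; `TransformerMemory` is a hypothetical name and the shapes simply mirror the earlier sketch.

```python
import torch
import torch.nn as nn

class TransformerMemory(nn.Module):
    """Drop-in replacement for the GRU in the MemoryPolicy sketch above."""

    def __init__(self, hidden: int = 128, layers: int = 2, heads: int = 4):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, tokens, state=None):
        # causal mask so each timestep attends only to its own past
        T = tokens.shape[1]
        mask = torch.triu(
            torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1
        )
        # Transformers keep no recurrent state; return None to match the GRU interface
        return self.blocks(tokens, mask=mask), None
```

Swapping it in only requires changing one line of the earlier sketch (`self.memory = TransformerMemory(hidden)`), which is the spirit of the modularity described above.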