amago.envs.exploration#
Exploration via gymnasium wrappers.
Classes
BilevelEpsilonGreedy | Implements the bi-level epsilon greedy exploration strategy visualized in Figure 13 of the AMAGO paper.
EpsilonGreedy | Standard exploration noise schedule.
ExplorationWrapper | Abstract base class for exploration wrappers.
- class BilevelEpsilonGreedy(amago_env, rollout_horizon=gin.REQUIRED, eps_start_start=1.0, eps_start_end=0.05, eps_end_start=0.8, eps_end_end=0.01, steps_anneal=1000000, randomize_eps=True)[source]#
Bases:
ExplorationWrapper
Implements the bi-level epsilon greedy exploration strategy visualized in Figure 13 of the AMAGO paper.
Tip
This class is @gin.configurable. Default values of kwargs can be overridden using gin (https://github.com/google/gin-config).
Exploration noise decays both over the course of training and throughout each rollout. This more closely resembles the online exploration/exploitation strategy of a meta-RL agent in a new environment.
- Parameters:
amago_env – The environment to wrap.
rollout_horizon (int) – The expected maximum length of each rollout. Defaults to gin.REQUIRED, meaning it must be configured manually via gin on a case-by-case basis, e.g. gin.bind_parameter("BilevelEpsilonGreedy.rollout_horizon", 100).
eps_start_start (float) – Exploration noise at the start of training and the start of a rollout. Default is 1.0.
eps_start_end (float) – Exploration noise at the end of training and the start of a rollout. Default is 0.05.
eps_end_start (float) – Exploration noise at the start of training and the end of a rollout. Default is 0.8.
eps_end_end (float) – Exploration noise at the end of training and the end of a rollout. Default is 0.01.
steps_anneal (int) – Linear schedule end point (in terms of steps taken in each actor process). Default is 1,000,000.
randomize_eps (bool) – Treat the schedule as the max and sample uniform [0, max) for each actor. This tends to reduce the need to tune steps_anneal. Default is True.
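For example, the required rollout_horizon (and any other keyword default) can be bound through gin before the wrapper is built. A minimal sketch using the parameter path quoted above; the steps_anneal override is purely illustrative:

    import gin

    from amago.envs.exploration import BilevelEpsilonGreedy  # importing registers the configurable

    # rollout_horizon defaults to gin.REQUIRED, so bind it to the expected
    # maximum episode length of the target environment.
    gin.bind_parameter("BilevelEpsilonGreedy.rollout_horizon", 100)

    # Any other keyword default can be overridden the same way (illustrative value):
    gin.bind_parameter("BilevelEpsilonGreedy.steps_anneal", 250_000)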
- add_exploration_noise(action, local_step)[source]#
Add exploration noise to the action.
- Parameters:
action (ndarray) – The action provided by the Agent.
local_step (ndarray) – The number of timesteps since the last episode reset.
- Returns:
The action to be taken in the environment.
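To make the schedule concrete, the following is a rough, non-authoritative sketch of how a bi-level epsilon could be derived from the four eps_* endpoints and steps_anneal documented above; the actual interpolation inside amago (and the randomize_eps behavior) may differ.

    def bilevel_epsilon(global_step, local_step, rollout_horizon,
                        eps_start_start=1.0, eps_start_end=0.05,
                        eps_end_start=0.8, eps_end_end=0.01,
                        steps_anneal=1_000_000):
        """Hypothetical bi-level schedule: anneal over training and within a rollout."""
        train_frac = min(global_step / steps_anneal, 1.0)
        # epsilon at the start of a rollout, annealed over the course of training
        eps_at_reset = eps_start_start + train_frac * (eps_start_end - eps_start_start)
        # epsilon at the end of a rollout, annealed over the course of training
        eps_at_horizon = eps_end_start + train_frac * (eps_end_end - eps_end_start)
        # within the rollout, decay from eps_at_reset toward eps_at_horizon
        rollout_frac = min(local_step / rollout_horizon, 1.0)
        return eps_at_reset + rollout_frac * (eps_at_horizon - eps_at_reset)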
- class EpsilonGreedy(amago_env, eps_start=1.0, eps_end=0.05, steps_anneal=1000000, randomize_eps=True)[source]#
Bases:
BilevelEpsilonGreedy
Standard exploration noise schedule.
Tip
This class is @gin.configurable. Default values of kwargs can be overridden using gin.
Classic epsilon-greedy exploration in discrete action spaces; TD3-style (action + epsilon * noise).clip(-1, 1) in continuous action spaces. Both rules are sketched after the parameter list below.
- Parameters:
amago_env – The environment to wrap.
eps_start (float) – Exploration noise at the start of training. Default is 1.0.
eps_end (float) – Exploration noise at the end of training. Default is 0.05.
steps_anneal (int) – Linear schedule end point (in terms of steps taken in a single actor process). Default is 1,000,000.
randomize_eps (bool) – Treat the schedule as the max and sample uniform [0, max) for each parallel actor. This tends to reduce the need to tune steps_anneal. Default is True.
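A schematic illustration of the two noise rules described above (not the wrapper's exact code): epsilon-greedy resampling for discrete actions and clipped Gaussian noise for continuous actions.

    import numpy as np

    def discrete_exploration(action, epsilon, n_actions):
        # Classic epsilon-greedy: replace the greedy action with a random one w.p. epsilon.
        random_action = np.random.randint(0, n_actions, size=action.shape)
        use_random = np.random.rand(*action.shape) < epsilon
        return np.where(use_random, random_action, action)

    def continuous_exploration(action, epsilon):
        # TD3-style: add scaled Gaussian noise, then clip to the [-1, 1] action bounds.
        noise = np.random.randn(*action.shape)
        return np.clip(action + epsilon * noise, -1.0, 1.0)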
- class ExplorationWrapper(amago_env)[source]#
Bases:
ABC, ActionWrapper
Abstract base class for exploration wrappers.
- Parameters:
amago_env – The environment to wrap.
- action(a)[source]#
Returns a modified action before env.step() is called.
- Parameters:
action – The original step() actions.
- Return type:
ndarray
- Returns:
The modified actions.
- abstract add_exploration_noise(action, local_step)[source]#
Add exploration noise to the action.
- Parameters:
action (ndarray) – The action provided by the Agent.
local_step (int) – The number of timesteps since the last episode reset.
- Return type:
ndarray
- Returns:
The action to be taken in the environment.
- property return_history#
- property success_history#
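Custom strategies subclass ExplorationWrapper and implement add_exploration_noise; the wrapper's action() hook then applies it before every env.step(). A hypothetical sketch, assuming a continuous action space normalized to [-1, 1]; FixedGaussianNoise and its epsilon argument are illustrative and not part of amago:

    import numpy as np

    from amago.envs.exploration import ExplorationWrapper

    class FixedGaussianNoise(ExplorationWrapper):
        """Hypothetical wrapper: constant Gaussian action noise, ignoring local_step."""

        def __init__(self, amago_env, epsilon=0.1):
            super().__init__(amago_env)
            self.epsilon = epsilon

        def add_exploration_noise(self, action, local_step):
            # Called (via action()) on every action before env.step().
            noise = self.epsilon * np.random.randn(*action.shape)
            return np.clip(action + noise, -1.0, 1.0)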