amago.hindsight#
Trajectory datastructures

Functions
- Split a batched timestep into a list of unbatched timesteps.

Classes
- FrozenTraj – A finished trajectory that is ready to be used as training data.
- NoOpRelabeler – A no-op relabeler that returns the input trajectory unchanged.
- Relabeler – A hook for modifying trajectory data during training.
- Timestep – Stores a single timestep of rollout data.
- Trajectory – A sequence of timesteps.
- class FrozenTraj(obs, rl2s, time_idxs, rews, dones, actions)[source]#
Bases: object
A finished trajectory that is ready to be used as training data.
- Parameters:
  - obs (dict[str, ndarray]) – Dictionary of observations with shape (Batch, Length, dim_value)
  - rl2s (ndarray) – Tensor of meta-RL inputs (prev action, reward) with shape (Batch, Length, 1 + D_action)
  - time_idxs (ndarray) – Tensor of time indices with shape (Batch, Length, 1)
  - rews (ndarray) – Tensor of rewards with shape (Batch, Length, 1)
  - dones (ndarray) – Tensor of terminal signals with shape (Batch, Length, 1)
  - actions (ndarray) – Tensor of actions with shape (Batch, Length, D_action)
- actions: ndarray#
- dones: ndarray#
- classmethod from_dict(d)[source]#
Fold a flattened observation dictionary back to its original keys.
- Parameters:
  d (dict[str, ndarray]) – Flat dictionary with observation data prefixed with _OBS_KEY_. Typically the output of FrozenTraj.to_dict.
- Returns:
  Object with original observation keys restored.
- Return type:
  FrozenTraj
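The flatten/fold round trip can be sketched without amago: observation keys get a string prefix so the nested `obs` dict can live alongside the other arrays in a flat, `.npz`-friendly dict, and folding strips the prefix back off. The `_OBS_KEY_` prefix comes from this page; the helper names `flatten_obs`/`fold_obs` and the toy dict layout are illustrative assumptions, not amago's API.

```python
import numpy as np

_OBS_KEY_ = "_OBS_KEY_"

def flatten_obs(traj_dict):
    """Prefix each observation key so it fits in a flat .npz-style dict."""
    flat = {k: v for k, v in traj_dict.items() if k != "obs"}
    for key, arr in traj_dict["obs"].items():
        flat[f"{_OBS_KEY_}{key}"] = arr
    return flat

def fold_obs(flat):
    """Recover the nested {"obs": {...}} layout from a flat dict."""
    out, obs = {}, {}
    for key, arr in flat.items():
        if key.startswith(_OBS_KEY_):
            obs[key[len(_OBS_KEY_):]] = arr  # strip the prefix back off
        else:
            out[key] = arr
    out["obs"] = obs
    return out
```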
- obs: dict[str, ndarray]#
- rews: ndarray#
- rl2s: ndarray#
- time_idxs: ndarray#
- class NoOpRelabeler[source]#
Bases: Relabeler
A no-op relabeler that returns the input trajectory unchanged.
- relabel(traj)[source]#
Relabel a trajectory.
- Parameters:
  traj (Trajectory | FrozenTraj) – Trajectory or FrozenTraj object to relabel. Can be modified in place.
- Returns:
  New FrozenTraj object with relabeled data.
- Return type:
  FrozenTraj
- class Relabeler[source]#
Bases: ABC
A hook for modifying trajectory data during training.
In the default DiskTrajDataset, the Relabeler has the chance to edit input trajectories before they are passed to an agent for training. Enables Hindsight Experience Replay (HER) and variants. See examples/13_mazerunner_relabeling.py for an implementation.
- abstract relabel(traj)[source]#
Relabel a trajectory.
- Parameters:
  traj (Trajectory | FrozenTraj) – Trajectory or FrozenTraj object to relabel. Can be modified in place.
- Returns:
  New FrozenTraj object with relabeled data.
- Return type:
  FrozenTraj
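The hook pattern above can be sketched independently of amago: a subclass overrides `relabel` and rewrites the reward array in place, HER-style, by pretending the final achieved state was the goal. Only the `Relabeler`/`relabel` names mirror this page; the plain-dict trajectory and the goal-distance reward rule are made-up stand-ins, not amago's actual data layout.

```python
import numpy as np
from abc import ABC, abstractmethod

class Relabeler(ABC):
    @abstractmethod
    def relabel(self, traj: dict) -> dict:
        ...

class HindsightRelabeler(Relabeler):
    """HER-style: treat the final achieved position as the goal all along."""

    def relabel(self, traj: dict) -> dict:
        positions = traj["obs"]["position"]          # (Length, dim)
        new_goal = positions[-1]                     # last achieved state
        dists = np.linalg.norm(positions - new_goal, axis=-1)
        # Reward 1 wherever the (hindsight) goal was reached, else 0;
        # keep the (Length, 1) reward shape used elsewhere on this page.
        traj["rews"] = (dists < 1e-6).astype(np.float32)[:, None]
        return traj
```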
- class Timestep(obs, prev_action, reward, time_idx, terminal, batched_envs)[source]#
Bases: object
Stores a single timestep of rollout data.
Time-aligned to the input format of the policy. Agents learn from sequences of timesteps. Each timestep contains the current observation and time_idx, as well as everything that has happened since the last observation was revealed (previous action, reward, terminal).
- Parameters:
  - obs (dict[str, ndarray]) – Dictionary of current observation keys and values.
  - prev_action (ndarray) – The previous action taken by the agent.
  - reward (ndarray) – The reward received by the agent after it took prev_action.
  - time_idx (ndarray) – The integer index of the current timestep.
  - terminal (ndarray) – The terminal signal of the environment. True if this is the final observation.
  - batched_envs (int) – The number of environments in the batch. Used to disambiguate the batch dimension.
- as_input()[source]#
Outputs Timestep data in the input format of the Agent.
- Returns:
  A tuple containing:
  - obs: Dictionary of observations with shape (batched_envs, dim_value)
  - rl2s: Tensor of meta-RL inputs (prev action, reward) with shape (batched_envs, 1 + D_action)
  - time_idx: Tensor of time indices with shape (batched_envs, 1)
- Return type:
  tuple
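The shapes above can be checked with a plain numpy sketch. The sizes `batched_envs=4` and `D_action=3` are made up, and the concatenation order of reward vs. previous action inside `rl2s` is an assumption for illustration; only the shape convention `(batched_envs, 1 + D_action)` comes from this page.

```python
import numpy as np

batched_envs, D_action = 4, 3
# toy stand-ins for the contents of a batched Timestep
obs = {"position": np.zeros((batched_envs, 2), dtype=np.float32)}
prev_action = np.zeros((batched_envs, D_action), dtype=np.float32)
reward = np.zeros((batched_envs, 1), dtype=np.float32)
time_idx = np.zeros((batched_envs, 1), dtype=np.int64)

# rl2s packs the meta-RL inputs: 1 reward dim + D_action action dims
rl2s = np.concatenate([reward, prev_action], axis=-1)
assert rl2s.shape == (batched_envs, 1 + D_action)
```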
- batched_envs: int#
- create_reset_version(reset_idxs)[source]#
Manually assign indices of a batched timestep to be first in a new trajectory.
Creates a new Timestep object with rewards, time_idxs, and terminal signals reset as if the environment had been reset at the given reset_idxs. Used for handling auto-resets in vectorized environments.
- Parameters:
  reset_idxs (ndarray) – Tensor of indices of parallel environments being reset.
- Returns:
  New Timestep object with reset values for specified environments.
- Return type:
  Timestep
- obs: dict[str, ndarray]#
- prev_action: ndarray#
- reward: ndarray#
- terminal: ndarray#
- time_idx: ndarray#
- class Trajectory(timesteps: Optional[Iterable[Timestep]] = None)[source]#
Bases: object
A sequence of timesteps.
Stores a rollout and handles disk saves when using the default RLDataset.
- Parameters:
  timesteps – Optional iterable of Timestep objects.
- add_timestep(timestep)[source]#
Add a timestep to the trajectory.
- Parameters:
  timestep (Timestep) – Timestep object to add.
- Return type:
  None
- as_input_sequence()[source]#
Returns a sequence of observations, rl2s, and time_idxs.
Uses the trajectory data to gather the standard input sequences for the Agent.
- Returns:
  A tuple containing:
  - obs: Dictionary of observations with shape (Batch, Length, dim_value)
  - rl2s: Tensor of meta-RL inputs with shape (Batch, Length, 1 + D_action)
  - time: Tensor of time indices with shape (Batch, Length, 1)
- Return type:
  tuple
- freeze()[source]#
Freeze the trajectory and return a FrozenTraj object. FrozenTraj saves time by precomputing input sequences for the Agent.
- Returns:
  Frozen trajectory object with precomputed sequences.
- Return type:
  FrozenTraj
- save_to_disk(path, save_as)[source]#
Save the trajectory to disk.
- Parameters:
  - path (str) – Path to save the trajectory to.
  - save_as (str) – Format to save the trajectory in:
    - 'trajectory': Pickle file storing the entire object (now rarely used)
    - 'npz': Standard numpy .npz file format
    - 'npz-compressed': Compressed numpy .npz file format
- Return type:
  None
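The `'npz'` vs `'npz-compressed'` formats named above map onto numpy's own `savez`/`savez_compressed`, which can be sketched on a toy flat trajectory dict. The file name, keys, and `_OBS_KEY_` prefix here are illustrative; amago's exact on-disk layout may differ.

```python
import os
import tempfile
import numpy as np

# toy flat trajectory dict, as it might look after flattening observations
traj = {
    "rews": np.random.rand(8, 1).astype(np.float32),
    "_OBS_KEY_position": np.random.rand(8, 2).astype(np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traj.npz")
    np.savez_compressed(path, **traj)        # the 'npz-compressed' route
    with np.load(path) as f:                 # NpzFile loads arrays lazily
        loaded = {k: f[k] for k in f.files}
    assert np.allclose(loaded["rews"], traj["rews"])
```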
- property total_return: float#
Calculate the total return of this trajectory.
- Returns:
Sum of all rewards in the trajectory.
- Return type:
float
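Since total_return is simply the sum of every reward collected along the trajectory, it reduces to one line on the (Length, 1) reward array; the toy values below are made up for illustration.

```python
import numpy as np

# per-timestep rewards with the (Length, 1) shape used on this page
rews = np.array([[0.5], [1.0], [-0.25]], dtype=np.float32)
total_return = float(rews.sum())  # 0.5 + 1.0 - 0.25 = 1.25
```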