amago.hindsight#
Trajectory data structures
Functions
- Split a batched timestep into a list of unbatched timesteps.

Classes
- FrozenTraj: A finished trajectory that is ready to be used as training data.
- NoOpRelabeler: A no-op relabeler that returns the input trajectory unchanged.
- Relabeler: A hook for modifying trajectory data during training.
- Timestep: Stores a single timestep of rollout data.
- Trajectory: A sequence of timesteps.
- class FrozenTraj(obs, rl2s, time_idxs, rews, dones, actions)[source]#
Bases: object
A finished trajectory that is ready to be used as training data.
- Parameters:
  - obs (dict[str, ndarray]) – Dictionary of observations with shape (Batch, Length, dim_value)
  - rl2s (ndarray) – Tensor of meta-RL inputs (prev action, reward) with shape (Batch, Length, 1 + D_action)
  - time_idxs (ndarray) – Tensor of time indices with shape (Batch, Length, 1)
  - rews (ndarray) – Tensor of rewards with shape (Batch, Length, 1)
  - dones (ndarray) – Tensor of terminal signals with shape (Batch, Length, 1)
  - actions (ndarray) – Tensor of actions with shape (Batch, Length, D_action)
- actions: ndarray#
- dones: ndarray#
- classmethod from_dict(d)[source]#
Fold flattened observation dictionary back to original keys.
- Parameters:
  d (dict[ndarray]) – Flat dictionary with observation data prefixed with _OBS_KEY_. Typically the output of FrozenTraj.to_dict.
- Returns:
  Object with original observation keys restored.
- Return type:
  FrozenTraj
- obs: dict[str, ndarray]#
- rews: ndarray#
- rl2s: ndarray#
- time_idxs: ndarray#
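For orientation, here is a minimal sketch of constructing a FrozenTraj and round-tripping it through the flattened-dictionary format referenced by from_dict. The observation key, shapes, dtypes, and the zero-argument to_dict() call are illustrative assumptions rather than a verified recipe.

```python
import numpy as np
from amago.hindsight import FrozenTraj

# Illustrative shapes: 1 trajectory, 8 timesteps, 4-dim obs, 2-dim action.
B, L, D_obs, D_act = 1, 8, 4, 2
traj = FrozenTraj(
    obs={"state": np.zeros((B, L, D_obs), dtype=np.float32)},
    rl2s=np.zeros((B, L, 1 + D_act), dtype=np.float32),  # (prev action, reward)
    time_idxs=np.arange(L, dtype=np.int64).reshape(B, L, 1),
    rews=np.zeros((B, L, 1), dtype=np.float32),
    dones=np.zeros((B, L, 1), dtype=bool),
    actions=np.zeros((B, L, D_act), dtype=np.float32),
)

# to_dict flattens observation keys with the _OBS_KEY_ prefix;
# from_dict folds them back into the original obs dictionary.
flat = traj.to_dict()  # assumed zero-argument call
restored = FrozenTraj.from_dict(flat)
assert restored.obs.keys() == traj.obs.keys()
```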
- class NoOpRelabeler[source]#
Bases: Relabeler
A no-op relabeler that returns the input trajectory unchanged.
- relabel(traj)[source]#
Relabel a trajectory.
- Parameters:
  traj (Trajectory | FrozenTraj) – Trajectory or FrozenTraj object to relabel. Can be modified in place.
- Returns:
  New FrozenTraj object with relabeled data.
- Return type:
  FrozenTraj
- class Relabeler[source]#
Bases: ABC
A hook for modifying trajectory data during training.
In the default DiskTrajDataset, a Relabeler has the opportunity to edit input trajectories before they are passed to an agent for training. This enables Hindsight Experience Replay (HER) and its variants. See examples/13_mazerunner_relabeling.py for an implementation, and the sketch after this class for the basic pattern.
- abstract relabel(traj)[source]#
Relabel a trajectory.
- Parameters:
  traj (Trajectory | FrozenTraj) – Trajectory or FrozenTraj object to relabel. Can be modified in place.
- Returns:
  New FrozenTraj object with relabeled data.
- Return type:
  FrozenTraj
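As a sketch of the subclassing pattern (not the library's own example), a hypothetical relabeler might look like the following; a real HER-style relabeler would rewrite goals and recompute rewards as in examples/13_mazerunner_relabeling.py.

```python
from amago.hindsight import FrozenTraj, Relabeler, Trajectory


class ZeroRewardRelabeler(Relabeler):
    """Hypothetical relabeler that zeroes every reward, purely to show the hook."""

    def relabel(self, traj) -> FrozenTraj:
        # relabel may receive a live Trajectory or an already-frozen one;
        # Trajectory.freeze() converts to the precomputed FrozenTraj format.
        frozen = traj.freeze() if isinstance(traj, Trajectory) else traj
        frozen.rews[:] = 0.0  # edit trajectory data in place before training
        return frozen
```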
- class Timestep(obs, prev_action, reward, time_idx, terminal, batched_envs)[source]#
Bases: object
Stores a single timestep of rollout data.
Time-aligned to the input format of the policy. Agents learn from sequences of timesteps. Each timestep contains the current observation and time_idx as well as everything that has happened since the last observation was revealed (previous action, reward, terminal).
- Parameters:
  - obs (dict[str, ndarray]) – Dictionary of current observation keys and values.
  - prev_action (ndarray) – The previous action taken by the agent.
  - reward (ndarray) – The reward received by the agent after it took prev_action.
  - time_idx (ndarray) – The integer index of the current timestep.
  - terminal (ndarray) – The terminal signal of the environment. True if this is the final observation.
  - batched_envs (int) – The number of environments in the batch. Used to disambiguate the batch dimension.
- as_input()[source]#
Outputs Timestep data in the input format of the Agent.
- Returns:
  A tuple containing:
  - obs: Dictionary of observations with shape (batched_envs, dim_value)
  - rl2s: Tensor of meta-RL inputs (prev action, reward) with shape (batched_envs, 1 + D_action)
  - time_idx: Tensor of time indices with shape (batched_envs, 1)
- Return type:
  tuple
- batched_envs: int#
- create_reset_version(reset_idxs)[source]#
Manually mark selected indices of a batched timestep as the first timestep of a new trajectory.
Creates a new Timestep object with rewards, time_idxs, and terminal signals reset as if the environment had been reset at the given reset_idxs. Used for handling auto-resets in vectorized environments.
- Parameters:
  reset_idxs (ndarray) – Tensor of indices of the parallel environments being reset.
- Returns:
  New Timestep object with reset values for the specified environments.
- Return type:
  Timestep
- obs: dict[str, ndarray]#
- prev_action: ndarray#
- reward: ndarray#
- terminal: ndarray#
- time_idx: ndarray#
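A minimal sketch of building a Timestep from a vectorized rollout and converting it to Agent inputs; the observation key, shapes, and dtypes below are illustrative assumptions.

```python
import numpy as np
from amago.hindsight import Timestep

# Illustrative shapes: 2 parallel envs, 4-dim obs, 2-dim continuous action.
n_envs, D_obs, D_act = 2, 4, 2
ts = Timestep(
    obs={"state": np.zeros((n_envs, D_obs), dtype=np.float32)},
    prev_action=np.zeros((n_envs, D_act), dtype=np.float32),
    reward=np.zeros((n_envs, 1), dtype=np.float32),
    time_idx=np.zeros((n_envs, 1), dtype=np.int64),
    terminal=np.zeros((n_envs, 1), dtype=bool),
    batched_envs=n_envs,
)

# as_input returns the (obs, rl2s, time_idx) arrays the Agent consumes;
# per the docs, rl2s concatenates the previous action and reward.
obs, rl2s, time_idx = ts.as_input()
```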
- class Trajectory(timesteps: Optional[Iterable[Timestep]] = None)[source]#
Bases:
object
A sequence of timesteps.
Stores a rollout and handles disk saves when using the default RLDataset.
- Parameters:
  timesteps – Iterable of Timestep objects.
- add_timestep(timestep)[source]#
Add a timestep to the trajectory.
- Parameters:
  timestep (Timestep) – Timestep object to add.
- Return type:
  None
- as_input_sequence()[source]#
Returns a sequence of observations, rl2s, and time_idxs.
Uses the trajectory data to gather the standard input sequences for the Agent.
- Returns:
  A tuple containing:
  - obs: Dictionary of observations with shape (Batch, Length, dim_value)
  - rl2s: Tensor of meta-RL inputs with shape (Batch, Length, 1 + D_action)
  - time: Tensor of time indices with shape (Batch, Length, 1)
- Return type:
  tuple
- freeze()[source]#
Freeze the trajectory and return a FrozenTraj object.
FrozenTraj saves time by precomputing input sequences for the Agent.
- Returns:
  Frozen trajectory object with precomputed sequences.
- Return type:
  FrozenTraj
- save_to_disk(path, save_as)[source]#
Save the trajectory to disk.
- Parameters:
  - path (str) – Path to save the trajectory to.
  - save_as (str) – Format to save the trajectory in:
    - 'trajectory': Pickle file storing the entire object (now rarely used)
    - 'npz': Standard numpy .npz file format
    - 'npz-compressed': Compressed numpy .npz file format
- Return type:
  None
- property total_return: float#
Calculate the total return of this trajectory.
- Returns:
Sum of all rewards in the trajectory.
- Return type:
float
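Putting the pieces together, here is a hedged end-to-end sketch of the rollout-to-disk workflow: collect timesteps, inspect the return, freeze, and save. The make_timestep helper, shapes, and the /tmp path are illustrative and not part of the library.

```python
import numpy as np
from amago.hindsight import Timestep, Trajectory


def make_timestep(t: int, terminal: bool) -> Timestep:
    # Hypothetical single-env timestep with a 3-dim obs and 1-dim action.
    return Timestep(
        obs={"state": np.random.randn(1, 3).astype(np.float32)},
        prev_action=np.zeros((1, 1), dtype=np.float32),
        reward=np.ones((1, 1), dtype=np.float32),
        time_idx=np.array([[t]], dtype=np.int64),
        terminal=np.array([[terminal]]),
        batched_envs=1,
    )


traj = Trajectory(timesteps=[make_timestep(0, False)])
for t in range(1, 5):
    traj.add_timestep(make_timestep(t, terminal=(t == 4)))

print(traj.total_return)                 # sum of all rewards in the rollout
frozen = traj.freeze()                   # precompute Agent input sequences
traj.save_to_disk("/tmp/demo_traj", save_as="npz")  # or "npz-compressed"
```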