amago.hindsight#
Trajectory datastructures

Functions
- Split a batched timestep into a list of unbatched timesteps.

Classes
- FrozenTraj – A finished trajectory that is ready to be used as training data.
- NoOpRelabeler – A no-op relabeler that returns the input trajectory unchanged.
- Relabeler – A hook for modifying trajectory data during training.
- Timestep – Stores a single timestep of rollout data.
- Trajectory – A sequence of timesteps.
- class FrozenTraj(obs, rl2s, time_idxs, rews, dones, actions)[source]#
Bases: object
A finished trajectory that is ready to be used as training data.
- Parameters:
  - obs (dict[str, ndarray]) – Dictionary of observations with shape (Batch, Length, dim_value)
  - rl2s (ndarray) – Tensor of meta-RL inputs (prev action, reward) with shape (Batch, Length, 1 + D_action)
  - time_idxs (ndarray) – Tensor of time indices with shape (Batch, Length, 1)
  - rews (ndarray) – Tensor of rewards with shape (Batch, Length, 1)
  - dones (ndarray) – Tensor of terminal signals with shape (Batch, Length, 1)
  - actions (ndarray) – Tensor of actions with shape (Batch, Length, D_action)
- actions: ndarray#
- dones: ndarray#
- classmethod from_dict(d)[source]#
Fold a flattened observation dictionary back to its original keys.
- Parameters:
  d (dict[str, ndarray]) – Flat dictionary with observation data prefixed with _OBS_KEY_. Typically the output of FrozenTraj.to_dict.
- Returns:
  Object with original observation keys restored.
- Return type:
  FrozenTraj
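The flatten/fold round trip can be sketched without amago: observation keys get a string prefix so the nested `obs` dict can live alongside the other arrays in a flat, `.npz`-friendly dict, and folding strips the prefix back off. The `_OBS_KEY_` prefix comes from this page; the helper names `flatten_obs`/`fold_obs` and the toy dict layout are illustrative assumptions, not amago's API.

```python
import numpy as np

_OBS_KEY_ = "_OBS_KEY_"

def flatten_obs(traj_dict):
    """Prefix each observation key so it fits in a flat .npz-style dict."""
    flat = {k: v for k, v in traj_dict.items() if k != "obs"}
    for key, arr in traj_dict["obs"].items():
        flat[f"{_OBS_KEY_}{key}"] = arr
    return flat

def fold_obs(flat):
    """Recover the nested {"obs": {...}} layout from a flat dict."""
    out, obs = {}, {}
    for key, arr in flat.items():
        if key.startswith(_OBS_KEY_):
            obs[key[len(_OBS_KEY_):]] = arr  # strip the prefix back off
        else:
            out[key] = arr
    out["obs"] = obs
    return out
```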
- obs: dict[str, ndarray]#
- rews: ndarray#
- rl2s: ndarray#
- time_idxs: ndarray#
- class NoOpRelabeler[source]#
Bases: Relabeler
A no-op relabeler that returns the input trajectory unchanged.
- relabel(traj)[source]#
Relabel a trajectory.
- Parameters:
  traj (Trajectory | FrozenTraj) – Trajectory or FrozenTraj object to relabel. Can be modified in place.
- Returns:
  New FrozenTraj object with relabeled data.
- Return type:
  FrozenTraj
- class Relabeler[source]#
Bases: ABC
A hook for modifying trajectory data during training.
In the default DiskTrajDataset, the Relabeler has the chance to edit input trajectories before they are passed to an agent for training. Enables Hindsight Experience Replay (HER) and variants. See examples/13_mazerunner_relabeling.py for an implementation.
- abstract relabel(traj)[source]#
Relabel a trajectory.
- Parameters:
  traj (Trajectory | FrozenTraj) – Trajectory or FrozenTraj object to relabel. Can be modified in place.
- Returns:
  New FrozenTraj object with relabeled data.
- Return type:
  FrozenTraj
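The hook pattern above can be sketched independently of amago: a subclass overrides `relabel` and rewrites the reward array in place, HER-style, by pretending the final achieved state was the goal. Only the `Relabeler`/`relabel` names mirror this page; the plain-dict trajectory and the goal-distance reward rule are made-up stand-ins, not amago's actual data layout.

```python
import numpy as np
from abc import ABC, abstractmethod

class Relabeler(ABC):
    @abstractmethod
    def relabel(self, traj: dict) -> dict:
        ...

class HindsightRelabeler(Relabeler):
    """HER-style: treat the final achieved position as the goal all along."""

    def relabel(self, traj: dict) -> dict:
        positions = traj["obs"]["position"]          # (Length, dim)
        new_goal = positions[-1]                     # last achieved state
        dists = np.linalg.norm(positions - new_goal, axis=-1)
        # Reward 1 wherever the (hindsight) goal was reached, else 0;
        # keep the (Length, 1) reward shape used elsewhere on this page.
        traj["rews"] = (dists < 1e-6).astype(np.float32)[:, None]
        return traj
```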
- class Timestep(obs, prev_action, reward, time_idx, terminal, batched_envs)[source]#
Bases: object
Stores a single timestep of rollout data.
Time-aligned to the input format of the policy. Agents learn from sequences of timesteps. Each timestep contains the current observation and time_idx, as well as everything that has happened since the last observation was revealed (previous action, reward, terminal).
- Parameters:
  - obs (dict[str, ndarray]) – Dictionary of current observation keys and values.
  - prev_action (ndarray) – The previous action taken by the agent.
  - reward (ndarray) – The reward received by the agent after it took prev_action.
  - time_idx (ndarray) – The integer index of the current timestep.
  - terminal (ndarray) – The terminal signal of the environment. True if this is the final observation.
  - batched_envs (int) – The number of environments in the batch. Used to disambiguate the batch dimension.
- as_input()[source]#
Outputs Timestep data in the input format of the Agent.
- Returns:
  A tuple containing:
  - obs: Dictionary of observations with shape (batched_envs, dim_value)
  - rl2s: Tensor of meta-RL inputs (prev action, reward) with shape (batched_envs, 1 + D_action)
  - time_idx: Tensor of time indices with shape (batched_envs, 1)
- Return type:
  tuple
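The shapes above can be checked with a plain numpy sketch. The sizes `batched_envs=4` and `D_action=3` are made up, and the concatenation order of reward vs. previous action inside `rl2s` is an assumption for illustration; only the shape convention `(batched_envs, 1 + D_action)` comes from this page.

```python
import numpy as np

batched_envs, D_action = 4, 3
# toy stand-ins for the contents of a batched Timestep
obs = {"position": np.zeros((batched_envs, 2), dtype=np.float32)}
prev_action = np.zeros((batched_envs, D_action), dtype=np.float32)
reward = np.zeros((batched_envs, 1), dtype=np.float32)
time_idx = np.zeros((batched_envs, 1), dtype=np.int64)

# rl2s packs the meta-RL inputs: 1 reward dim + D_action action dims
rl2s = np.concatenate([reward, prev_action], axis=-1)
assert rl2s.shape == (batched_envs, 1 + D_action)
```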
- batched_envs: int#
- create_reset_version(reset_idxs)[source]#
Manually assign indices of a batched timestep to be first in a new trajectory.
Creates a new Timestep object with rewards, time_idxs, and terminal signals reset as if the environment had been reset at the given reset_idxs. Used for handling auto-resets in vectorized environments.
- Parameters:
  reset_idxs (ndarray) – Tensor of indices of parallel environments being reset.
- Returns:
  New Timestep object with reset values for specified environments.
- Return type:
  Timestep
- obs: dict[str, ndarray]#
- prev_action: ndarray#
- reward: ndarray#
- terminal: ndarray#
- time_idx: ndarray#
- class Trajectory(timesteps: Optional[Iterable[Timestep]] = None)[source]#
Bases: object
A sequence of timesteps.
Stores a rollout and handles disk saves when using the default RLDataset.
- Parameters:
  timesteps – Optional iterable of Timestep objects.
- add_timestep(timestep)[source]#
Add a timestep to the trajectory.
- Parameters:
  timestep (Timestep) – Timestep object to add.
- Return type:
  None
- as_input_sequence()[source]#
Returns a sequence of observations, rl2s, and time_idxs.
Uses the trajectory data to gather the standard input sequences for the Agent.
- Returns:
  A tuple containing:
  - obs: Dictionary of observations with shape (Batch, Length, dim_value)
  - rl2s: Tensor of meta-RL inputs with shape (Batch, Length, 1 + D_action)
  - time: Tensor of time indices with shape (Batch, Length, 1)
- Return type:
  tuple
- freeze()[source]#
Freeze the trajectory and return a FrozenTraj object. FrozenTraj saves time by precomputing input sequences for the Agent.
- Returns:
  Frozen trajectory object with precomputed sequences.
- Return type:
  FrozenTraj
- save_to_disk(path, save_as)[source]#
Save the trajectory to disk.
- Parameters:
  - path (str) – Path to save the trajectory to.
  - save_as (str) – Format to save the trajectory in:
    - 'trajectory': Pickle file storing the entire object (now rarely used)
    - 'npz': Standard numpy .npz file format
    - 'npz-compressed': Compressed numpy .npz file format
- Return type:
  None
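The `'npz'` vs `'npz-compressed'` formats named above map onto numpy's own `savez`/`savez_compressed`, which can be sketched on a toy flat trajectory dict. The file name, keys, and `_OBS_KEY_` prefix here are illustrative; amago's exact on-disk layout may differ.

```python
import os
import tempfile
import numpy as np

# toy flat trajectory dict, as it might look after flattening observations
traj = {
    "rews": np.random.rand(8, 1).astype(np.float32),
    "_OBS_KEY_position": np.random.rand(8, 2).astype(np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traj.npz")
    np.savez_compressed(path, **traj)        # the 'npz-compressed' route
    with np.load(path) as f:                 # NpzFile loads arrays lazily
        loaded = {k: f[k] for k in f.files}
    assert np.allclose(loaded["rews"], traj["rews"])
```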
- property total_return: float#
Calculate the total return of this trajectory.
- Returns:
Sum of all rewards in the trajectory.
- Return type:
float
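Since total_return is simply the sum of every reward collected along the trajectory, it reduces to one line on the (Length, 1) reward array; the toy values below are made up for illustration.

```python
import numpy as np

# per-timestep rewards with the (Length, 1) shape used on this page
rews = np.array([[0.5], [1.0], [-0.25]], dtype=np.float32)
total_return = float(rews.sum())  # 0.5 + 1.0 - 0.25 = 1.25
```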