amago.nets.tstep_encoders#
Map trajectory data to a sequence of timestep embeddings.
Classes

| CNNTstepEncoder | A simple CNN-based TstepEncoder. |
| FFTstepEncoder | A simple MLP-based TstepEncoder. |
| TstepEncoder | Abstract base class for Timestep Encoders. |
- class CNNTstepEncoder(obs_space, rl2_space, cnn_type=<class 'amago.nets.cnn.NatureishCNN'>, channels_first=False, img_features=256, rl2_features=12, d_output=256, out_norm='layer', activation='leaky_relu', hide_rl2s=False, drqv2_aug=False, aug_pct_of_batch=0.75, obs_key='observation')[source]#
Bases: TstepEncoder
A simple CNN-based TstepEncoder.
Tip
This class is @gin.configurable. Default values of kwargs can be overridden using gin.
Useful for pixel-based environments. Currently only supports the case where observations are a single image without additional state arrays.
- Parameters:
  - obs_space (Space) – Environment observation space.
  - rl2_space (Space) – A gym space declaring the shape of previous action and reward features. This is created by the AMAGOEnv wrapper.
  - cnn_type (Type[CNN]) – The type of nets.cnn.CNN to use. Defaults to nets.cnn.NatureishCNN (the small DQN CNN).
  - channels_first (bool) – Whether the image is in channels-first format. Defaults to False.
  - img_features (int) – Linearly map the output of the CNN to this many features. Defaults to 256.
  - rl2_features (int) – Linearly map the previous action and reward to this many features. Defaults to 12.
  - d_output (int) – The output dimension of a layer that fuses the img_features and rl2_features. Defaults to 256.
  - out_norm (str) – The normalization layer to use. See nets.ff.Normalization for options. Defaults to "layer".
  - activation (str) – The activation function to use. See nets.utils.activation_switch for options. Defaults to "leaky_relu".
  - hide_rl2s (bool) – Whether to ignore the previous action and reward features (but otherwise keep the same parameter count and layer dimensions).
  - drqv2_aug (bool) – Quick-apply the default DrQv2 image augmentation. Applies random crops to aug_pct_of_batch % of every batch during training. Currently requires square images. Defaults to False.
  - aug_pct_of_batch (float) – The percentage of every batch to apply DrQv2 augmentation to, if drqv2_aug is True. Defaults to 0.75.
  - obs_key (str) – The key in the observation space that contains the image. Defaults to "observation", which is the default created by AMAGOEnv when the original observation space is not a dict.
- property emb_dim#
The output dimension of the TstepEncoder. This is used to determine the input dimension of the TrajEncoder.
- Returns:
int, the output dimension of the TstepEncoder. If inner_forward outputs shape (Batch, Length, emb_dim), this should return emb_dim.
- inner_forward(obs, rl2s, log_dict=None)[source]#
Override to implement the network forward pass.
- Parameters:
  - obs (dict[str, Tensor]) – dict of {key : torch.Tensor w/ shape (Batch, Length) + self.obs_space[key].shape}
  - rl2s (Tensor) – previous actions and rewards features, which might be ignored. Organized here for meta-RL problems.
  - log_dict (dict | None) – If provided, we are tracking extra metrics for a logging step, and should add any wandb metrics here.
- Return type:
Tensor
- Returns:
torch.Tensor w/ shape (Batch, Length, self.emb_dim)
- class FFTstepEncoder(obs_space, rl2_space, n_layers=2, d_hidden=512, d_output=256, norm='layer', activation='leaky_relu', hide_rl2s=False, normalize_inputs=True, specify_obs_keys=None)[source]#
Bases: TstepEncoder
A simple MLP-based TstepEncoder.
Tip
This class is @gin.configurable. Default values of kwargs can be overridden using gin.
Useful when observations are dicts of 1D arrays.
- Parameters:
  - obs_space (Space) – Environment observation space.
  - rl2_space (Space) – A gym space declaring the shape of previous action and reward features. This is created by the AMAGOEnv wrapper.
  - n_layers (int) – Number of layers in the MLP. Defaults to 2.
  - d_hidden (int) – Dimension of the hidden layers. Defaults to 512.
  - d_output (int) – Dimension of the output. Defaults to 256.
  - norm (str) – Normalization layer to use. See nets.ff.Normalization for options. Defaults to "layer".
  - activation (str) – Activation function to use. See nets.utils.activation_switch for options. Defaults to "leaky_relu".
  - hide_rl2s (bool) – Whether to ignore the previous action and reward features (but otherwise keep the same parameter count and layer dimensions).
  - normalize_inputs (bool) – Whether to normalize the input features. See nets.utils.InputNorm.
  - specify_obs_keys (list[str] | None) – If provided, only use these keys from the observation space. If None, every key in the observation is used. Multi-modal observations are handled by flattening and concatenating values in a consistent order (alphabetical by key). Defaults to None.
- property emb_dim#
The output dimension of the TstepEncoder. This is used to determine the input dimension of the TrajEncoder.
- Returns:
int, the output dimension of the TstepEncoder. If inner_forward outputs shape (Batch, Length, emb_dim), this should return emb_dim.
- inner_forward(obs, rl2s, log_dict=None)[source]#
Override to implement the network forward pass.
- Parameters:
  - obs (dict[str, Tensor]) – dict of {key : torch.Tensor w/ shape (Batch, Length) + self.obs_space[key].shape}
  - rl2s (Tensor) – previous actions and rewards features, which might be ignored. Organized here for meta-RL problems.
  - log_dict (dict | None) – If provided, we are tracking extra metrics for a logging step, and should add any wandb metrics here.
- Return type:
Tensor
- Returns:
torch.Tensor w/ shape (Batch, Length, self.emb_dim)
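The multi-modal handling described above (flatten each dict value per timestep, concatenate in alphabetical key order, then append the previous-action/reward features) can be sketched shape-wise with NumPy. This is an illustrative stand-in for the input-preparation step, not the library's implementation; `flatten_obs` is a hypothetical helper name:

```python
import numpy as np

def flatten_obs(obs: dict, rl2s: np.ndarray) -> np.ndarray:
    """Flatten each obs value past (Batch, Length), concat alphabetically
    by key, then append the rl2 (previous action + reward) features."""
    parts = [obs[k].reshape(*obs[k].shape[:2], -1) for k in sorted(obs)]
    parts.append(rl2s)
    return np.concatenate(parts, axis=-1)

B, L = 4, 10
obs = {
    "velocity": np.zeros((B, L, 3)),
    "position": np.zeros((B, L, 2)),
}
rl2s = np.zeros((B, L, 5))  # previous action + reward features
x = flatten_obs(obs, rl2s)
print(x.shape)  # (4, 10, 10): position (2) + velocity (3) + rl2s (5)
```

In the real encoder this concatenated vector would then pass through the MLP described by n_layers, d_hidden, and d_output.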
- class TstepEncoder(obs_space, rl2_space)[source]#
Bases: Module, ABC
Abstract base class for Timestep Encoders.
Timestep (Tstep) Encoders fuse a dict observation and tensor of extra trajectory data (previous actions & rewards) into a single embedding per timestep, creating a sequence of [Batch, Length, TstepEncoder.emb_dim] embeddings that becomes the input for the main sequence model (TrajEncoder).
Note
Should operate on each timestep of the input sequences independently. Sequence modeling should be left to the TrajEncoder. This is not enforced during training but would break at inference, as the TstepEncoder currently has no hidden state.
- Parameters:
  - obs_space (Space) – Environment observation space.
  - rl2_space (Space) – A gym space declaring the shape of previous action and reward features. This is created by the AMAGOEnv wrapper.
- abstract property emb_dim: int#
The output dimension of the TstepEncoder. This is used to determine the input dimension of the TrajEncoder.
- Returns:
int, the output dimension of the TstepEncoder. If inner_forward outputs shape (Batch, Length, emb_dim), this should return emb_dim.
- forward(obs, rl2s, log_dict=None)[source]#
Define the computation performed at every call. Should be overridden by all subclasses.
- Return type:
Tensor
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- abstract inner_forward(obs, rl2s, log_dict=None)[source]#
Override to implement the network forward pass.
- Parameters:
  - obs (dict[str, Tensor]) – dict of {key : torch.Tensor w/ shape (Batch, Length) + self.obs_space[key].shape}
  - rl2s (Tensor) – previous actions and rewards features, which might be ignored. Organized here for meta-RL problems.
  - log_dict (dict | None) – If provided, we are tracking extra metrics for a logging step, and should add any wandb metrics here.
- Return type:
Tensor
- Returns:
torch.Tensor w/ shape (Batch, Length, self.emb_dim)
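A custom subclass only has to define emb_dim and inner_forward, and must treat every timestep independently (sequence modeling belongs to the TrajEncoder). A framework-free sketch of that contract, with NumPy standing in for torch (a real subclass would inherit from TstepEncoder and use nn.Module layers; `IdentityTstepEncoder` is a hypothetical name):

```python
import numpy as np

class IdentityTstepEncoder:
    """Illustrative only: concatenates the flat obs with rl2 features,
    applied to each timestep independently (no state across Length)."""

    def __init__(self, obs_dim: int, rl2_dim: int):
        self._emb_dim = obs_dim + rl2_dim

    @property
    def emb_dim(self) -> int:
        # must equal the trailing dim of inner_forward's output
        return self._emb_dim

    def inner_forward(self, obs, rl2s, log_dict=None):
        # obs: {key: (Batch, Length, ...)}, rl2s: (Batch, Length, rl2_dim)
        return np.concatenate([obs["observation"], rl2s], axis=-1)

enc = IdentityTstepEncoder(obs_dim=8, rl2_dim=5)
out = enc.inner_forward({"observation": np.zeros((2, 7, 8))}, np.zeros((2, 7, 5)))
print(out.shape)  # (2, 7, 13) == (Batch, Length, emb_dim)
```

The (Batch, Length, emb_dim) output then becomes the input sequence for the TrajEncoder.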