TensorFlow Modules & Networks¶

Subpackages¶

DNC Memory Modules

Auxiliary Task Modules & Networks¶

`Pixel Control module`¶

class ftw.tf.networks.auxiliary.PixelControl(num_actions: int, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name: str = 'pixel_control')¶

Bases: sonnet.src.base.Module

Module that produces a pixel control output (i.e. Q-values) from a hidden state input.

This module implements the Pixel Control module from the FTW paper.

Thus, it produces an output of shape [batch_size, 20, 20, num_actions], representing a grid of 20 x 20 cells, each representing a 4 x 4 pixel area, covering a pixel area of altogether 80 x 80 pixels (= (20 cells x 4 pixels) x (20 cells x 4 pixels)).

Consequently, the output produced by this module can only be used for pixel control loss calculation if the observations passed to the pixel control loss function are of shape [sequence_length, batch_size, 80, 80, 3] (pixel control only supports RGB pixel observations).

Recommended usage is to have 84 x 84 x 3 RGB pixel observations as input to this module and to crop these observations to the central 80 x 80 pixel area for loss calculation, as done by the FTW and UNREAL agents.
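The crop-and-grid arithmetic described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `central_crop` and `cell_means` are hypothetical and not part of `ftw`, the frame is a single-channel nested list rather than an RGB tensor, and averaging per cell is one common way to reduce a crop to a 20 x 20 grid (as in UNREAL-style pixel control), not necessarily what this library does.

```python
def central_crop(frame, size):
    """Return the central size x size region of a square 2D frame (list of rows)."""
    border = (len(frame) - size) // 2
    return [row[border:border + size] for row in frame[border:border + size]]


def cell_means(frame, num_cells):
    """Average a square 2D frame over a num_cells x num_cells grid of equal cells."""
    cell = len(frame) // num_cells  # pixels per cell side
    grid = []
    for i in range(num_cells):
        row = []
        for j in range(num_cells):
            block = [frame[y][x]
                     for y in range(i * cell, (i + 1) * cell)
                     for x in range(j * cell, (j + 1) * cell)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


# An 84 x 84 frame whose pixel value equals its row index.
frame = [[float(y)] * 84 for y in range(84)]
crop = central_crop(frame, 80)   # drops a 2-pixel border on each side
grid = cell_means(crop, 20)      # 80 / 20 = 4 pixels per cell side
```

With an 84 x 84 input, the crop removes (84 - 80) / 2 = 2 pixels per side, and each of the 20 x 20 cells then covers 80 / 20 = 4 pixels per side.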

__init__(num_actions: int, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name: str = 'pixel_control')¶

Initializes the PixelControl module.

Args:

num_actions: number of actions in discrete action space.
activation: activation function to be used (after linear and deconvolutional layer).
name: name for the module.

`RNN Pixel Control network`¶

class ftw.tf.networks.auxiliary.RNNPixelControlNetwork(embed: sonnet.src.base.Module, core: sonnet.src.recurrent.RNNCore, num_actions: int, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name: str = 'rnn_pixel_control_network')¶

Bases: sonnet.src.recurrent.RNNCore

Module that produces a Pixel control output (i.e. Q-values) from a pixel observations input.

This module implements the Pixel control module from the FTW paper and wraps it together with a (possibly shared) visual embedding module (= embed) and a (possibly shared) recurrent core (= core). Thus, it produces an output of shape [batch_size, 20, 20, num_actions], representing a grid of 20 x 20 cells, each representing a 4 x 4 pixel area, covering a pixel area of altogether 80 x 80 pixels (= (20 cells x 4 pixels) x (20 cells x 4 pixels)). Consequently, the output produced by this module can only be used for Pixel control loss calculation if the observations passed to the Pixel control loss function are of shape [sequence_length, batch_size, 80, 80, 3] (Pixel control only supports RGB Pixel observations).

__init__(embed: sonnet.src.base.Module, core: sonnet.src.recurrent.RNNCore, num_actions: int, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name: str = 'rnn_pixel_control_network')¶

Initializes the RNNPixelControlNetwork module.

Args:

embed: Visual embedding module (of type sonnet.Module) to transform observations into an embedding, e.g. FtwTorso.
core: Recurrent core (of type sonnet.RNNCore) to transform embedding into input for PixelControl module, e.g. RPTH.
num_actions: number of actions in discrete action space.
activation: activation function to be used in PixelControl module (after linear and deconvolutional layer).
name: name for the module.

initial_state(batch_size: int, **kwargs)¶: Returns the initial state of the recurrent core.

unroll(inputs, state)¶

Unrolls the module over a sequence of pixel observation inputs and produces Pixel Control Q-values.

Args:

inputs: Sequence of (batched) Pixel observations, in the form of a tf.Tensor or an observation_action_reward.OAR namedtuple.

Returns:

pixel_control_q_vals: Sequence of (batched) Pixel control Q-values with shape [T, B, 20, 20, num_actions].

`Reward Prediction module`¶

class ftw.tf.networks.auxiliary.RewardPrediction(hidden_size: int = 128, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name='reward_prediction')¶

Bases: sonnet.src.base.Module

Module that produces a reward prediction output from a hidden state tensor.

This module implements the Reward prediction module from the FTW paper. Its output is a logits tensor, representing the log-probabilities for the 3 categories to predict (zero reward, negative reward, positive reward).

__init__(hidden_size: int = 128, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name='reward_prediction')¶

Initializes the RewardPrediction module.

Args:

hidden_size: size of hidden linear layer.
activation: activation function to be used (between linear and logits layer).
name: name for the module.
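The three-category output described above can be illustrated with a small, framework-free sketch. It assumes that class indices follow the order in which the categories are listed (zero, negative, positive) and that a softmax cross-entropy is applied to the logits; both are assumptions, and the helper names are hypothetical rather than part of `ftw`.

```python
import math

# Class order follows the docstring: zero, negative, positive (index order is an assumption).
ZERO, NEGATIVE, POSITIVE = 0, 1, 2


def reward_class(reward):
    """Map a scalar reward to one of the three target categories."""
    if reward > 0:
        return POSITIVE
    if reward < 0:
        return NEGATIVE
    return ZERO


def softmax(logits):
    """Turn unnormalized logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])
```

For example, `cross_entropy([0.0, 0.0, 0.0], reward_class(-1.0))` evaluates the loss of a uniform prediction against the negative-reward class.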

`Reward Prediction network`¶

class ftw.tf.networks.auxiliary.RewardPredictionNetwork(embed: sonnet.src.base.Module, hidden_size: int = 128, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name='reward_prediction_network')¶

Bases: sonnet.src.base.Module

Module that produces a reward prediction output from an observations input.

This module implements the Reward prediction module from the FTW paper and wraps it together with a (possibly shared) embedding module (= embed). Thus, its output is a logits tensor, representing the log-probabilities for the 3 categories to predict (zero reward, negative reward, positive reward).

__init__(embed: sonnet.src.base.Module, hidden_size: int = 128, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, name='reward_prediction_network')¶

Initializes the RewardPredictionNetwork module.

Args:

embed: Embedding module (of type sonnet.Module) to transform observations into an embedding, e.g. FtwTorso.
hidden_size: size of hidden linear layer.
activation: activation function to be used in RewardPrediction module (between linear and logits layer).
name: name for the module.

Distributional modules¶

`Multivariate Normal Diagonal Distribution Head`¶

class ftw.tf.networks.distributional.MultivariateNormalDiagHead(num_dimensions: int, init_scale: float = 0.3, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, use_tfd_independent: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = <tensorflow.python.keras.initializers.initializers_v2.VarianceScaling object>, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = <tensorflow.python.keras.initializers.initializers_v2.Zeros object>)¶

Bases: sonnet.src.base.Module

Module that produces a multivariate normal distribution using tfd.Independent or tfd.MultivariateNormalDiag.

__init__(num_dimensions: int, init_scale: float = 0.3, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, use_tfd_independent: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = <tensorflow.python.keras.initializers.initializers_v2.VarianceScaling object>, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = <tensorflow.python.keras.initializers.initializers_v2.Zeros object>)¶

Initialization.

Args:

num_dimensions: Number of dimensions of MVN distribution.
init_scale: Initial standard deviation.
min_scale: Minimum standard deviation.
tanh_mean: Whether to transform the mean (via tanh) before passing it to the distribution.
fixed_scale: Whether to use a fixed variance.
use_tfd_independent: Whether to use tfd.Independent or tfd.MultivariateNormalDiag class.
w_init: Initialization for linear layer weights.
b_init: Initialization for linear layer biases.
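How init_scale, min_scale and tanh_mean might interact can be sketched without TensorFlow. The parameterisation below (softplus rescaled so that a raw output of 0 gives init_scale, then offset by min_scale) resembles the head of the same name in dm-acme; whether this module uses exactly this mapping is an assumption, and the function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def scale_from_raw(raw, init_scale=0.3, min_scale=1e-6):
    """Map an unconstrained layer output to a positive standard deviation.

    Assumed parameterisation: a raw output of 0 yields init_scale + min_scale.
    """
    return softplus(raw) * init_scale / softplus(0.0) + min_scale


def mean_from_raw(raw, tanh_mean=False):
    """Optionally squash the mean into the range of tanh."""
    return math.tanh(raw) if tanh_mean else raw
```

Under this assumption the standard deviation never drops below min_scale, however negative the raw layer output becomes.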

`Multivariate Normal Diagonal Distribution Loc Scale Head`¶

class ftw.tf.networks.distributional.MultivariateNormalDiagLocScaleHead(num_dimensions: int, init_scale: float = 0.3, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None)¶

Bases: sonnet.src.base.Module

Module that produces mean and scale of a multivariate normal distribution.

__init__(num_dimensions: int, init_scale: float = 0.3, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None)¶

Initialization.

Args:

num_dimensions: Number of dimensions of MVN distribution.
init_scale: Initial standard deviation.
min_scale: Minimum standard deviation.
tanh_mean: Whether to transform the mean (via tanh) before passing it to the distribution.
fixed_scale: Whether to use a fixed variance.
w_init: Initialization for linear layer weights.
b_init: Initialization for linear layer biases.

Embedding Modules¶

`Observation Action Reward Embedding Module`¶

class ftw.tf.networks.embedding.OAREmbedding(torso: acme.tf.networks.base.Module, num_actions: Union[int, Sequence[int]], internal_rewards: Optional[ftw.tf.internal_reward.ftw_internal_reward.InternalRewards] = None)¶

Bases: sonnet.src.base.Module

Module for embedding (observation, action, reward) inputs together.

This module is based on dm-acme's OAREmbedding module, but was enhanced to further support:

multi-discrete/decomposed action spaces (such as the one from the FTW paper)
internal rewards (as used by the FTW agent).

If a multi-discrete/decomposed action space is used, the action will be embedded as a concatenation of one-hot encodings (one encoding per action group in the multi-discrete/decomposed action space).

If internal rewards are used, the embedding of the reward will be computed:

in case of scalar original reward and scalar internal reward: as the product between both
in case of original rewards vector and scalar internal reward: as the product between both
in case of original rewards vector and internal rewards vector: as the dot product between both.

Scalar original reward and internal rewards vector is not a supported use-case.
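The action encoding and the three supported reward combinations can be sketched with plain Python lists. This is an illustrative stand-in, not the module's code: `one_hot`, `embed_action` and `combine_rewards` are hypothetical names, and raising ValueError for the unsupported case is an assumption about how such an input might be rejected.

```python
def one_hot(index, depth):
    """One-hot encode index as a list of floats of length depth."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def embed_action(action, num_actions):
    """One-hot for a discrete action, concatenated one-hots for a multi-discrete one."""
    if isinstance(num_actions, int):
        return one_hot(action, num_actions)
    encoded = []
    for a, n in zip(action, num_actions):
        encoded.extend(one_hot(a, n))
    return encoded


def combine_rewards(reward, internal):
    """Combine original and internal rewards following the three supported cases."""
    reward_is_vec = isinstance(reward, (list, tuple))
    internal_is_vec = isinstance(internal, (list, tuple))
    if not reward_is_vec and internal_is_vec:
        raise ValueError("scalar reward with internal rewards vector is unsupported")
    if reward_is_vec and internal_is_vec:
        return sum(r * w for r, w in zip(reward, internal))  # dot product
    if reward_is_vec:
        return [r * internal for r in reward]  # vector scaled by scalar
    return reward * internal  # scalar product
```

For an action space with groups of size 3 and 2, `embed_action((1, 0), (3, 2))` yields a vector of length 5 with one active entry per group.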

__init__(torso: acme.tf.networks.base.Module, num_actions: Union[int, Sequence[int]], internal_rewards: Optional[ftw.tf.internal_reward.ftw_internal_reward.InternalRewards] = None)¶

Initializes the OAREmbedding module.

Args:

torso: Module transforming observations into an embedding.
num_actions: Number of actions in action space. Supports discrete action space (if int is supplied), or multi-discrete/decomposed action space (if sequence of ints is supplied, one for each action group).
internal_rewards: InternalRewards module (as used in the FTW paper). Optional. If None, no internal rewards calculation will be done.

Raises:

ValueError: If shapes and/or types of constructor arguments do not match expected shapes and types.

For The Win (FTW) Network¶

Network for FTW agent.

class ftw.tf.networks.ftw_network.FtwNetwork(embed: acme.tf.networks.base.Module, core: sonnet.src.recurrent.RNNCore, num_actions: int, head_hidden_size: int = 256, name='simple_ftw_network')¶

Bases: sonnet.src.recurrent.RNNCore

initial_state(batch_size: int, **kwargs)¶

Constructs an initial state for this core.

Args:

batch_size: An int or an integral scalar tensor representing batch size.
**kwargs: Optional keyword arguments.

Returns:

Arbitrarily nested initial state for this core.

select_action(batched_observation, state)¶

unroll(inputs: acme.wrappers.observation_action_reward.OAR, states: sonnet.src.recurrent.LSTMState) → Tuple[Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, Tuple[tensorflow.python.framework.ops.Tensor, Sequence[Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]]]], sonnet.src.recurrent.LSTMState]¶: Efficient unroll that applies embeddings, MLP, & convnet in one pass.

Policy Value Head Module¶

`Policy Value Head Module`¶

class ftw.tf.networks.policy_value.PolicyValueHead(num_actions: int, hidden_size: int = 256, activation=<function relu>)¶

Bases: sonnet.src.base.Module

A network with linear layers, for policy and value respectively.

__init__(num_actions: int, hidden_size: int = 256, activation=<function relu>)¶

Initializes the PolicyValueHead module.

Args:

num_actions: Number of actions in discrete action space.
hidden_size: Size of hidden layers (between input and output layers).
activation: Activation function to be used by this module (between hidden and output layers).

Raises:

ValueError: If shapes and/or types of constructor arguments do not match expected shapes and types.

Recurrent Core Modules¶

`DNC Wrapper Module`¶

class ftw.tf.networks.recurrence.DNCWrapper(lstm: sonnet.src.recurrent.LSTM, memory: ftw.tf.networks.dnc.access.MemoryAccess, output_layer: Union[sonnet.src.base.Module, Callable[[tensorflow.python.framework.ops.Tensor], Union[tensorflow.python.framework.ops.Tensor, tensorflow_probability.python.distributions.distribution.Distribution]], None] = None, clip_value=None, name: str = 'dnc_wrapper')¶

Bases: sonnet.src.recurrent.RNNCore

DNC Memory Wrapper module wrapping an LSTM controller, a DNC MemoryAccess module and an output layer together.

This module implements the DNC memory introduced by Deepmind by connecting an LSTM controller, a DNC MemoryAccess module and an output layer, which are all provided to the module via constructor arguments. In contrast to the original TensorFlow version 1.x implementation of the DNC module at https://github.com/deepmind/dnc, this module lets the user supply controller, memory and output modules as constructor arguments, instead of accepting parameters for the creation of these in the constructor arguments and then building the modules during construction (as is done in the original implementation).

__init__(lstm: sonnet.src.recurrent.LSTM, memory: ftw.tf.networks.dnc.access.MemoryAccess, output_layer: Union[sonnet.src.base.Module, Callable[[tensorflow.python.framework.ops.Tensor], Union[tensorflow.python.framework.ops.Tensor, tensorflow_probability.python.distributions.distribution.Distribution]], None] = None, clip_value=None, name: str = 'dnc_wrapper')¶

Initializes the DNCWrapper module.

The clip_value Args info was taken from the original TensorFlow version 1.x implementation of the DNC module at https://github.com/deepmind/dnc.

Args:

lstm: LSTM module of type sonnet.LSTM.
memory: DNC MemoryAccess module.
output_layer: Output layer that outputs either a tf.Tensor or a tfp.Distribution.
clip_value: Clips controller and core output values to between [-clip_value, clip_value] if specified.
name: Name for the module.
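The effect of clip_value can be pictured with a short stand-alone sketch. It operates on a plain list instead of a tensor, and `clip_if_specified` is a hypothetical helper, not a function of this package.

```python
def clip_if_specified(values, clip_value=None):
    """Clamp each value to [-clip_value, clip_value]; pass through when clip_value is None."""
    if clip_value is None:
        return list(values)
    return [max(-clip_value, min(clip_value, v)) for v in values]
```

With `clip_value=20.0`, an output of `[-30.0, 0.5, 25.0]` would become `[-20.0, 0.5, 20.0]`, while `clip_value=None` leaves the values unchanged.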

initial_state(batch_size: int, **unused_kwargs)¶

Returns the initial DNCState namedtuple, containing all elements of the recurrent state.

Elements of the DNCState recurrent state are:

controller_state: state of the controller module (LSTM)
access_state: state of the DNC MemoryAccess module
access_output: last output of the DNC MemoryAccess module.

Returns:

A DNCState namedtuple.

`Variational Unit Module`¶

class ftw.tf.networks.recurrence.VariationalUnit(hidden_size: int, num_dimensions: int, shared_memory: Optional[ftw.tf.networks.dnc.access.MemoryAccess] = None, dnc_clip_value=None, use_dnc_linear_projection: bool = True, init_scale: float = 0.1, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, name='variational_unit')¶

Bases: sonnet.src.recurrent.RNNCore

Variational Unit module as introduced by the FTW paper.

Can be used with a shared DNC MemoryAccess module, if supplied via constructor arguments.

See also the FTW paper for more information, especially Figure S11 of the supplementary material.

__init__(hidden_size: int, num_dimensions: int, shared_memory: Optional[ftw.tf.networks.dnc.access.MemoryAccess] = None, dnc_clip_value=None, use_dnc_linear_projection: bool = True, init_scale: float = 0.1, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, name='variational_unit')¶

Initialization.

Args:

hidden_size: Hidden size of LSTM.
num_dimensions: Number of dimensions of MVN distribution.
shared_memory: (Possibly shared) DNC MemoryAccess module. Optional. If None, no memory is used.
dnc_clip_value: Only used when shared_memory is not None. Clip value used by DNC module for clipping the (LSTM) controller output and state, as well as the linear output.
use_dnc_linear_projection: Only used when shared_memory is not None. Whether the DNC module outputs the concatenated LSTM and memory outputs or a linear projection thereof (with the same hidden size as the LSTM). In the original DNC, this linear projection is used. Defaults to True.
init_scale: Initial standard deviation.
min_scale: Minimum standard deviation.
tanh_mean: Whether to transform the mean (via tanh) before passing it to the distribution.
fixed_scale: Whether to use a fixed variance.
w_init: Initialization for linear layer weights.
b_init: Initialization for linear layer biases.
name: Name for the module.

initial_state(batch_size: int, **unused_kwargs)¶

Returns the initial recurrent state.

Recurrent state is an LSTMState namedtuple if LSTM core is used, or DNCState namedtuple if DNC core is used.

Returns:

Initial state namedtuple (LSTMState or DNCState, depending on which core is used) of recurrent core.

`Periodic Variational Unit Module`¶

class ftw.tf.networks.recurrence.PeriodicVariationalUnit(period: Union[int, tensorflow.python.ops.variables.Variable], hidden_size: int, num_dimensions: int, shared_memory: Optional[ftw.tf.networks.dnc.access.MemoryAccess] = None, dnc_clip_value=None, use_dnc_linear_projection: bool = True, init_scale: float = 0.1, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, name='periodic_variational_unit')¶

Bases: ftw.tf.networks.recurrence.VariationalUnit

Periodic Variational Unit module as introduced by the FTW paper.

This module implements a Variational Unit that only updates its hidden state every period steps (i.e. if step % period == 0), as used by the FTW agent.

See also the FTW paper for more information, especially Figure S11 of the supplementary material.

__init__(period: Union[int, tensorflow.python.ops.variables.Variable], hidden_size: int, num_dimensions: int, shared_memory: Optional[ftw.tf.networks.dnc.access.MemoryAccess] = None, dnc_clip_value=None, use_dnc_linear_projection: bool = True, init_scale: float = 0.1, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, name='periodic_variational_unit')¶

Initialization.

Args:

period: Periodically update the recurrent core every period steps.

This module keeps a step counter in its state (which resets to 0 when initial_state() is called). The recurrent core of this module only updates its hidden state when step % period == 0.

For all other arguments, please see the docstrings for

VariationalUnit (in ftw.tf.networks.recurrence) and
MultivariateNormalDiagLocScaleHead (in ftw.tf.networks.distributional).
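The periodic update rule described for period can be simulated with a toy loop. The sketch replaces the recurrent update with a simple assignment and assumes the step counter is checked before it is incremented; both the helper name `run_periodic` and that ordering are assumptions, not taken from the module.

```python
def run_periodic(inputs, period):
    """Toy periodic core: the 'hidden state' is refreshed only when step % period == 0."""
    step = 0          # counter starts at 0, as after initial_state()
    hidden = None
    trace = []
    for x in inputs:
        if step % period == 0:
            hidden = x   # stand-in for the real recurrent update
        trace.append(hidden)
        step += 1
    return trace
```

With `period=3`, the state is refreshed on steps 0, 3, 6 and held constant in between.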

initial_state(batch_size: int, **unused_kwargs)¶

Returns the initial recurrent state.

Recurrent state is a PeriodicRNNState namedtuple containing recurrent core_state (core_state), previous output (output) and step counter (step), where output is a LocScaleDistributionParameters namedtuple containing the mean and scale (stddev) of a Multivariate Normal Diagonal distribution.

Returns:

Initial recurrent state as a PeriodicRNNState namedtuple.

`RPTH (Recurrent Processing with Temporal Hierarchy) Module`¶

class ftw.tf.networks.recurrence.RPTH(period: Union[int, Sequence[int], tensorflow.python.ops.variables.Variable, Sequence[tensorflow.python.ops.variables.Variable]], hidden_size: int = 256, num_dimensions: int = 256, dnc_clip_value=None, use_dnc_linear_projection: bool = True, init_scale: float = 0.1, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, use_tfd_independent: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, shared_memory: Optional[ftw.tf.networks.dnc.access.MemoryAccess] = None, strict_period_order: bool = True, scale_gradients_fast_to_slow: Union[float, Sequence[float], tensorflow.python.ops.variables.Variable, Sequence[tensorflow.python.ops.variables.Variable]] = 1.0, name: str = 'rpth')¶

Bases: sonnet.src.recurrent.RNNCore

Recurrent processing with temporal hierarchy module, as introduced by the FTW paper.

This module consists of 2 or more Variational Units, where one Variational Unit updates its hidden state every step and is responsible for the posterior distribution, and the other Variational Unit updates its hidden state only if step % period == 0 and is responsible for the prior distribution. Optionally, a DNC MemoryAccess module can be supplied as a constructor argument, which will be shared by all cores, i.e. all cores write to and read from the same memory, and memory weights are shared among all cores.

Warning: Please note that while support for more than 2 cores is implemented, it is not tested yet and is thus highly discouraged. Please proceed with care if you wish to use this feature.

__init__(period: Union[int, Sequence[int], tensorflow.python.ops.variables.Variable, Sequence[tensorflow.python.ops.variables.Variable]], hidden_size: int = 256, num_dimensions: int = 256, dnc_clip_value=None, use_dnc_linear_projection: bool = True, init_scale: float = 0.1, min_scale: float = 1e-06, tanh_mean: bool = False, fixed_scale: bool = False, use_tfd_independent: bool = False, w_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, b_init: Union[sonnet.src.initializers.Initializer, tensorflow.python.keras.initializers.initializers_v2.Initializer, None] = None, shared_memory: Optional[ftw.tf.networks.dnc.access.MemoryAccess] = None, strict_period_order: bool = True, scale_gradients_fast_to_slow: Union[float, Sequence[float], tensorflow.python.ops.variables.Variable, Sequence[tensorflow.python.ops.variables.Variable]] = 1.0, name: str = 'rpth')¶

Initializes the RPTH module.

Args:

period: Periodically update the recurrent core(s) every period steps. If period is a scalar int value, only one slow core will be used. If period is a sequence of scalar int values, multiple slow cores, each with the given period, will be used. Note that when supplying a sequence of scalar int values that is not in descending order, it will be sorted automatically, unless strict_period_order=False.
strict_period_order: See period for further information. Defaults to True, i.e. periods will automatically be sorted in descending order, if they were not supplied in this order.

For all other arguments, please see the docstrings for

VariationalUnit (in ftw.tf.networks.recurrence) and
MultivariateNormalDiagLocScaleHead (in ftw.tf.networks.distributional).
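The interplay of period and strict_period_order described above can be sketched as follows. `normalize_periods` is a hypothetical helper written only from the argument descriptions; it handles plain ints and sequences of ints, not tf.Variable inputs.

```python
def normalize_periods(period, strict_period_order=True):
    """Return the slow-core periods as a list, sorted descending when strict."""
    periods = [period] if isinstance(period, int) else list(period)
    if strict_period_order:
        periods = sorted(periods, reverse=True)
    return periods
```

A scalar such as `10` gives a single slow core, while `[5, 20, 10]` is reordered to `[20, 10, 5]` unless `strict_period_order=False` is passed.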

initial_state(batch_size: int, **unused_kwargs)¶

Constructs an initial state for this core.

Args:

batch_size: An int or an integral scalar tensor representing batch size.
**kwargs: Optional keyword arguments.

Returns:

Arbitrarily nested initial state for this core.

`Named Tuples for Recurrent Outputs and States`¶

class ftw.tf.networks.recurrence.LocScaleDistributionParameters(loc, scale)¶

Bases: tuple

loc¶: Alias for field number 0

scale¶: Alias for field number 1

class ftw.tf.networks.recurrence.PeriodicRNNState(core_state, output, step)¶

Bases: tuple

core_state¶: Alias for field number 0

output¶: Alias for field number 1

step¶: Alias for field number 2

class ftw.tf.networks.recurrence.RPTHState(z, core_state, step)¶

Bases: tuple

core_state¶: Alias for field number 1

step¶: Alias for field number 2

z¶: Alias for field number 0

class ftw.tf.networks.recurrence.RPTHOutput(z, distribution_params)¶

Bases: tuple

distribution_params¶: Alias for field number 1

z¶: Alias for field number 0
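The "Alias for field number N" entries reflect ordinary namedtuple behaviour, which the following stand-alone sketch reproduces. The two types are re-declared locally with the documented field order purely for illustration; they are stand-ins, not imports from `ftw.tf.networks.recurrence`, and the values are made up.

```python
from collections import namedtuple

# Field order mirrors the signatures documented above.
LocScaleDistributionParameters = namedtuple(
    "LocScaleDistributionParameters", ["loc", "scale"])
RPTHOutput = namedtuple("RPTHOutput", ["z", "distribution_params"])

params = LocScaleDistributionParameters(loc=[0.0, 0.0], scale=[1.0, 1.0])
out = RPTHOutput(z=[0.1, -0.2], distribution_params=params)

z, dist = out  # positional unpacking follows field order
same_loc = out[1][0] is out.distribution_params.loc  # index and name reach the same object
```

Accessing fields by name keeps calling code readable, while index access stays available for code that treats the state as a plain tuple.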

Vision (Visual Embedding) Modules¶

class ftw.tf.networks.vision.FtwTorso(conv_filters: Sequence[Tuple[int, int, int]] = ((32, 8, 4), (64, 4, 2)), residual_filters: Sequence[Tuple[int, int, int]] = ((64, 3, 1), (64, 3, 1)), hidden_size: int = 256, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, activate_last: bool = False, name: str = 'ftw_torso')¶

Bases: sonnet.src.base.Module

Visual embedding module as used in the FTW paper.

See also the FTW paper for more information, especially Figure S11 of the supplementary material.

__init__(conv_filters: Sequence[Tuple[int, int, int]] = ((32, 8, 4), (64, 4, 2)), residual_filters: Sequence[Tuple[int, int, int]] = ((64, 3, 1), (64, 3, 1)), hidden_size: int = 256, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, activate_last: bool = False, name: str = 'ftw_torso')¶

Initializes the FtwTorso module.

Args:

conv_filters: Sequence of int triples (num_channels, kernel_size, stride) indicating the number of channels, the kernel size (also called filter size) and stride for each (non-residual) convolutional layer in the sequence.
residual_filters: Sequence of int triples (num_channels, kernel_size, stride) indicating the number of channels, the kernel size (also called filter size) and stride for each residual convolutional layer in the sequence.
hidden_size: Size of the final output layer.
activation: Activation function to be used between layers.
activate_last: Whether or not to pass the output of the final layer through the activation function given by activation.
name: Name for the module.

Raises:

ValueError: If shapes and/or types of constructor arguments do not match expected shapes and types.
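To see what the (num_channels, kernel_size, stride) triples imply for spatial resolution, the standard convolution size formulas can be applied to the default conv_filters. The padding mode used by FtwTorso is not stated here, so the sketch covers both "SAME" and "VALID" as assumptions; `conv_output_sizes` is a hypothetical helper.

```python
import math


def conv_output_sizes(input_size, filters, padding="SAME"):
    """Spatial size after each (num_channels, kernel_size, stride) layer."""
    sizes = []
    size = input_size
    for _, kernel, stride in filters:
        if padding == "SAME":
            size = math.ceil(size / stride)
        elif padding == "VALID":
            size = (size - kernel) // stride + 1
        else:
            raise ValueError(f"unknown padding: {padding}")
        sizes.append(size)
    return sizes


# Default value of conv_filters from the signature above.
DEFAULT_CONV_FILTERS = ((32, 8, 4), (64, 4, 2))
```

For an 84 x 84 input, `conv_output_sizes(84, DEFAULT_CONV_FILTERS)` gives the per-layer side lengths under "SAME" padding; passing `padding="VALID"` gives the alternative.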
