Options set for reinforcement learning agent representations (critics and actors)
Use an rlRepresentationOptions object to specify an options set for critics (rlValueRepresentation, rlQValueRepresentation) and actors (rlDeterministicActorRepresentation, rlStochasticActorRepresentation).
repOpts = rlRepresentationOptions creates a default option set to use as a last argument when creating a reinforcement learning actor or critic. You can modify the object properties using dot notation.

repOpts = rlRepresentationOptions(Name,Value) creates an options set with the specified properties using one or more name-value pair arguments.
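For context, the following is a minimal sketch of passing an options set as the last argument to a representation constructor. The observation specification, network layers, and the input layer name 'state' are illustrative assumptions, not prescribed by this page.

% Sketch only: the observation spec, network, and layer names are assumptions.
obsInfo = rlNumericSpec([4 1]);                 % assumed 4-D observation
net = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(32,'Name','fc')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','value')];
repOpts = rlRepresentationOptions('LearnRate',1e-3);
% The options set is the last argument to the constructor.
critic = rlValueRepresentation(net,obsInfo,'Observation',{'state'},repOpts);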
LearnRate — Learning rate for the representation
0.01 (default) | positive scalar

Learning rate for the representation, specified as the comma-separated pair consisting of 'LearnRate' and a positive scalar. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training might reach a suboptimal result or diverge.

Example: 'LearnRate',0.025
Optimizer — Optimizer for representation
"adam" (default) | "sgdm" | "rmsprop"

Optimizer for training the network of the representation, specified as the comma-separated pair consisting of 'Optimizer' and one of the following strings:

"adam" — Use the Adam optimizer. You can specify the decay rates of the gradient and squared gradient moving averages using the GradientDecayFactor and SquaredGradientDecayFactor fields of the OptimizerParameters option.

"sgdm" — Use the stochastic gradient descent with momentum (SGDM) optimizer. You can specify the momentum value using the Momentum field of the OptimizerParameters option.

"rmsprop" — Use the RMSProp optimizer. You can specify the decay rate of the squared gradient moving average using the SquaredGradientDecayFactor field of the OptimizerParameters option.

For more information about these optimizers, see Stochastic Gradient Descent in the Algorithms section of trainingOptions in Deep Learning Toolbox™.

Example: 'Optimizer',"sgdm"
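For example, one plausible way to pair this option with the OptimizerParameters fields described below:

% Select the SGDM optimizer, then tune its Momentum field.
repOpts = rlRepresentationOptions('Optimizer',"sgdm");
repOpts.OptimizerParameters.Momentum = 0.95;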
OptimizerParameters — Applicable parameters for optimizer
OptimizerParameters object

Applicable parameters for the optimizer, specified as the comma-separated pair consisting of 'OptimizerParameters' and an OptimizerParameters object with the following parameters.
Parameter | Description
---|---
Momentum | Contribution of previous step, specified as a scalar from 0 to 1. A value of 0 means no contribution from the previous step. A value of 1 means maximal contribution. This parameter applies only when Optimizer is "sgdm".
Epsilon | Denominator offset, specified as a positive scalar. The optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero. This parameter applies only when Optimizer is "adam" or "rmsprop".
GradientDecayFactor | Decay rate of gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Optimizer is "adam".
SquaredGradientDecayFactor | Decay rate of squared gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Optimizer is "adam" or "rmsprop".
When a particular property of OptimizerParameters is not applicable to the optimizer type specified in the Optimizer option, that property is set to "Not applicable".

To change the default values, create an rlRepresentationOptions set and use dot notation to access and change the properties of OptimizerParameters.

repOpts = rlRepresentationOptions;
repOpts.OptimizerParameters.GradientDecayFactor = 0.95;
GradientThreshold — Threshold value for gradient
Inf (default) | positive scalar

Threshold value for the representation gradient, specified as the comma-separated pair consisting of 'GradientThreshold' and Inf or a positive scalar. If the gradient exceeds this value, the gradient is clipped as specified by the GradientThresholdMethod option; a combined sketch follows that option's description. Clipping the gradient limits how much the network parameters change in a training iteration.

Example: 'GradientThreshold',1
GradientThresholdMethod — Gradient threshold method
"l2norm" (default) | "global-l2norm" | "absolute-value"

Gradient threshold method used to clip gradient values that exceed the gradient threshold, specified as the comma-separated pair consisting of 'GradientThresholdMethod' and one of the following strings:

"l2norm" — If the L2 norm of the gradient of a learnable parameter is larger than GradientThreshold, then scale the gradient so that the L2 norm equals GradientThreshold.

"global-l2norm" — If the global L2 norm, L, is larger than GradientThreshold, then scale all gradients by a factor of GradientThreshold/L. The global L2 norm considers all learnable parameters.

"absolute-value" — If the absolute value of an individual partial derivative in the gradient of a learnable parameter is larger than GradientThreshold, then scale the partial derivative to have magnitude equal to GradientThreshold and retain the sign of the partial derivative.

For more information, see Gradient Clipping in the Algorithms section of trainingOptions in Deep Learning Toolbox.

Example: 'GradientThresholdMethod',"absolute-value"
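As referenced above, a short sketch combining the threshold value with a threshold method:

% Clip each entry of a learnable parameter's gradient at magnitude 1.
repOpts = rlRepresentationOptions('GradientThreshold',1, ...
    'GradientThresholdMethod',"absolute-value");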
L2RegularizationFactor — Factor for L2 regularization
0.0001 (default) | nonnegative scalar

Factor for L2 regularization (weight decay), specified as the comma-separated pair consisting of 'L2RegularizationFactor' and a nonnegative scalar. For more information, see L2 Regularization in the Algorithms section of trainingOptions in Deep Learning Toolbox.

To avoid overfitting when using a representation with many parameters, consider increasing the L2RegularizationFactor option.

Example: 'L2RegularizationFactor',0.0005
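For instance, a minimal sketch of raising the factor above its default for a large network:

% Increase weight decay for a representation with many parameters.
repOpts = rlRepresentationOptions;
repOpts.L2RegularizationFactor = 1e-3;   % default is 1e-4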
UseDevice — Computation device for training
"cpu" (default) | "gpu"

Computation device for training an agent that uses the representation, specified as the comma-separated pair consisting of 'UseDevice' and either "cpu" or "gpu".

The "gpu" option requires Parallel Computing Toolbox™. To use a GPU for training a network, you must also have a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher.

Note: Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations on a CPU.

Example: 'UseDevice',"gpu"
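One defensive pattern, assuming the canUseGPU function from Parallel Computing Toolbox is available:

% Fall back to the CPU when no supported GPU is present.
repOpts = rlRepresentationOptions;
if canUseGPU
    repOpts.UseDevice = "gpu";
end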
rlValueRepresentation | Value function critic representation for reinforcement learning agents |
rlQValueRepresentation | Q-Value function critic representation for reinforcement learning agents |
rlDeterministicActorRepresentation | Deterministic actor representation for reinforcement learning agents |
rlStochasticActorRepresentation | Stochastic actor representation for reinforcement learning agents |
Create an options set for creating a critic or actor representation for a reinforcement learning agent. Set the learning rate for the representation to 0.05, and set the gradient threshold to 1. You can set the options using Name,Value pairs when you create the options set. Any options that you do not explicitly set have their default values.
repOpts = rlRepresentationOptions('LearnRate',5e-2, ...
    'GradientThreshold',1)

repOpts = 
  rlRepresentationOptions with properties:

                  LearnRate: 0.0500
          GradientThreshold: 1
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  UseDevice: "cpu"
                  Optimizer: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]
Alternatively, create a default options set and use dot notation to change some of the values.
repOpts = rlRepresentationOptions;
repOpts.LearnRate = 5e-2;
repOpts.GradientThreshold = 1

repOpts = 
  rlRepresentationOptions with properties:

                  LearnRate: 0.0500
          GradientThreshold: 1
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  UseDevice: "cpu"
                  Optimizer: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]
If you want to change the properties of the OptimizerParameters option, use dot notation to access them.
repOpts.OptimizerParameters.Epsilon = 1e-7;
repOpts.OptimizerParameters

ans = 
  OptimizerParameters with properties:

                      Momentum: "Not applicable"
                       Epsilon: 1.0000e-07
           GradientDecayFactor: 0.9000
    SquaredGradientDecayFactor: 0.9990