Options for DQN agent
Use an rlDQNAgentOptions object to specify options for deep Q-network (DQN) agents. To create a DQN agent, use rlDQNAgent.
For more information, see Deep Q-Network Agents.
For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.
opt = rlDQNAgentOptions creates an options object for use as an argument when creating a DQN agent using all default settings. You can modify the object properties using dot notation.
opt = rlDQNAgentOptions(Name,Value) sets option properties using name-value pairs. For example, rlDQNAgentOptions('DiscountFactor',0.95) creates an option set with a discount factor of 0.95. You can specify multiple name-value pairs. Enclose each property name in quotes.
UseDoubleDQN — Flag for using double DQN
true (default) | false
Flag for using double DQN for value function target updates, specified as a logical value. For most applications, set UseDoubleDQN to true. For more information, see Deep Q-Network Agents.
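For example, the following minimal sketch (assuming all other settings stay at their defaults) turns off double DQN so the agent uses standard DQN target updates.
% Create default options and disable double DQN
opt = rlDQNAgentOptions;
opt.UseDoubleDQN = false;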
EpsilonGreedyExploration — Options for epsilon-greedy exploration
EpsilonGreedyExploration object
Options for epsilon-greedy exploration, specified as an EpsilonGreedyExploration object with the following properties.
Property | Description | Default Value |
---|---|---|
Epsilon | Probability threshold to either randomly select an action or select the action that maximizes the state-action value function. A larger value of Epsilon means that the agent randomly explores the action space at a higher rate. | 1 |
EpsilonMin | Minimum value of Epsilon | 0.01 |
EpsilonDecay | Decay rate | 0.0050 |
At the end of each training time step, if Epsilon is greater than EpsilonMin, then it is updated using the following formula.
Epsilon = Epsilon*(1-EpsilonDecay)
To specify exploration options, use dot notation after creating the rlDQNAgentOptions object. For example, set the epsilon value to 0.9.
opt = rlDQNAgentOptions;
opt.EpsilonGreedyExploration.Epsilon = 0.9;
If your agent converges on local optima too quickly, promote agent exploration by increasing Epsilon.
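As an illustration of the decay schedule only (not part of the agent API), the following sketch applies the update formula for a fixed number of training steps using the default exploration values; the step count is arbitrary.
% Illustrative only: reproduce the epsilon decay schedule outside the agent
epsilon = 1;            % Epsilon (default)
epsilonMin = 0.01;      % EpsilonMin (default)
epsilonDecay = 0.0050;  % EpsilonDecay (default)
numSteps = 1000;        % arbitrary number of training time steps
epsilonHistory = zeros(1,numSteps);
for k = 1:numSteps
    if epsilon > epsilonMin
        epsilon = epsilon*(1 - epsilonDecay);
    end
    epsilonHistory(k) = epsilon;
end
plot(epsilonHistory)    % exploration probability decreases toward EpsilonMin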
SequenceLength — Maximum batch-training trajectory length when using RNN
1 (default) | positive integer
Maximum batch-training trajectory length when using a recurrent neural network for the critic, specified as a positive integer. This value must be greater than 1 when using a recurrent neural network for the critic and 1 otherwise.
TargetSmoothFactor — Smoothing factor for target critic updates
1e-3 (default) | positive scalar less than or equal to 1
Smoothing factor for target critic updates, specified as a positive scalar less than or equal to 1. For more information, see Target Update Methods.
TargetUpdateFrequency — Number of steps between target critic updates
1 (default) | positive integer
Number of steps between target critic updates, specified as a positive integer. For more information, see Target Update Methods.
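For example, the following minimal sketch configures a purely periodic target update every 4 steps by also setting TargetSmoothFactor to 1 (see the target update method table in the version history below).
% Periodic target updates: copy the critic to the target critic every 4 steps
opt = rlDQNAgentOptions;
opt.TargetUpdateFrequency = 4;
opt.TargetSmoothFactor = 1;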
ResetExperienceBufferBeforeTraining — Flag for clearing the experience buffer
true (default) | false
Flag for clearing the experience buffer before training, specified as a logical value.
SaveExperienceBufferWithAgent — Flag for saving the experience buffer
false (default) | true
Flag for saving the experience buffer data when saving the agent, specified as a logical value. This option applies both when saving candidate agents during training and when saving agents using the save function.
For some agents, such as those with a large experience buffer and image-based observations, the memory required for saving the experience buffer is large. In such cases, to avoid saving the experience buffer data, set SaveExperienceBufferWithAgent to false.
If you plan to further train your saved agent, you can start training with the previous experience buffer as a starting point. In this case, set SaveExperienceBufferWithAgent to true.
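For example, the following minimal sketch keeps the buffer contents when the agent is saved; the agent variable and file name are placeholders for an agent you have already created and trained with these options.
% Keep the experience buffer when saving the agent to a MAT-file
opt = rlDQNAgentOptions;
opt.SaveExperienceBufferWithAgent = true;
% ... create and train an agent using opt, then save it:
% save('savedAgent.mat','agent')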
MiniBatchSize — Size of random experience mini-batch
64 (default) | positive integer
Size of random experience mini-batch, specified as a positive integer. During each training episode, the agent randomly samples experiences from the experience buffer when computing gradients for updating the critic properties. Large mini-batches reduce the variance when computing gradients but increase the computational effort.
When using a recurrent neural network for the critic, MiniBatchSize is the number of experience trajectories in a batch, where each trajectory has length equal to SequenceLength.
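For example, the following minimal sketch (assuming you plan to use a recurrent critic) samples mini-batches of 32 trajectories, each 20 steps long.
% With a recurrent critic: 32 trajectories of 20 experiences per gradient update
opt = rlDQNAgentOptions;
opt.MiniBatchSize = 32;
opt.SequenceLength = 20;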
NumStepsToLookAhead — Number of steps ahead
1 (default) | positive integer
Number of steps to look ahead during training, specified as a positive integer.
N-step Q learning is not supported when using a recurrent neural network for the critic. In this case, NumStepsToLookAhead must be 1.
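For example, the following minimal sketch enables 3-step Q learning; this is valid only when the critic is not a recurrent neural network.
% Use 3-step returns when computing Q-learning targets (non-recurrent critic only)
opt = rlDQNAgentOptions;
opt.NumStepsToLookAhead = 3;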
ExperienceBufferLength — Experience buffer size
10000 (default) | positive integer
Experience buffer size, specified as a positive integer. During training, the agent updates the critic using a mini-batch of experiences randomly sampled from the buffer.
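For example, the following minimal sketch enlarges the buffer to hold one million experiences; the value itself is arbitrary and depends on your application and available memory.
% Store up to 1e6 experiences for random mini-batch sampling
opt = rlDQNAgentOptions;
opt.ExperienceBufferLength = 1e6;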
SampleTime — Sample time of agent
1 (default) | positive scalar
Sample time of agent, specified as a positive scalar.
Within a Simulink environment, the agent is executed every SampleTime seconds of simulation time.
Within a MATLAB environment, the agent is executed every time the environment advances. However, SampleTime is the time interval between consecutive elements in the output experience returned by sim or train.
DiscountFactor — Discount factor
0.99 (default) | positive scalar less than or equal to 1
Discount factor applied to future rewards during training, specified as a positive scalar less than or equal to 1.
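For example, the following minimal sketch lowers the discount factor so the agent weights near-term rewards more heavily; the value 0.9 is illustrative.
% Emphasize short-term rewards by lowering the discount factor
opt = rlDQNAgentOptions;
opt.DiscountFactor = 0.9;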
rlDQNAgent | Deep Q-network reinforcement learning agent |
This example shows how to create a DQN agent options object.
Create an rlDQNAgentOptions
object that specifies the agent mini-batch size.
opt = rlDQNAgentOptions('MiniBatchSize',48)
opt = 
  rlDQNAgentOptions with properties:

                           UseDoubleDQN: 1
               EpsilonGreedyExploration: [1x1 rl.option.EpsilonGreedyExploration]
                         SequenceLength: 1
                     TargetSmoothFactor: 1.0000e-03
                  TargetUpdateFrequency: 1
    ResetExperienceBufferBeforeTraining: 1
          SaveExperienceBufferWithAgent: 0
                          MiniBatchSize: 48
                    NumStepsToLookAhead: 1
                 ExperienceBufferLength: 10000
                             SampleTime: 1
                         DiscountFactor: 0.9900
You can modify options using dot notation. For example, set the agent sample time to 0.5
.
opt.SampleTime = 0.5;
Behavior changed in R2020a
Target update method settings for DQN agents have changed. The following changes require updates to your code:
The TargetUpdateMethod option has been removed. Now, DQN agents determine the target update method based on the TargetUpdateFrequency and TargetSmoothFactor option values.
The default value of TargetUpdateFrequency has changed from 4 to 1.
To use one of the following target update methods, set the TargetUpdateFrequency and TargetSmoothFactor properties as indicated.
Update Method | TargetUpdateFrequency | TargetSmoothFactor |
---|---|---|
Smoothing | 1 | Less than 1 |
Periodic | Greater than 1 | 1 |
Periodic smoothing (new method in R2020a) | Greater than 1 | Less than 1 |
The default target update configuration, which is a smoothing update with a TargetSmoothFactor value of 0.001, remains the same.
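The following minimal sketch configures the periodic smoothing method introduced in R2020a, using an update frequency greater than 1 together with a smoothing factor less than 1; the specific values are illustrative.
% Periodic smoothing: smooth the target critic every 5 steps with factor 0.01
opt = rlDQNAgentOptions;
opt.TargetUpdateFrequency = 5;
opt.TargetSmoothFactor = 0.01;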
This table shows some typical uses of rlDQNAgentOptions and how to update your code to use the new option configuration.
Not Recommended | Recommended |
---|---|
opt = rlDQNAgentOptions('TargetUpdateMethod',"smoothing"); | opt = rlDQNAgentOptions; |
opt = rlDQNAgentOptions('TargetUpdateMethod',"periodic"); | opt = rlDQNAgentOptions; opt.TargetUpdateFrequency = 4; opt.TargetSmoothFactor = 1; |
opt = rlDQNAgentOptions; opt.TargetUpdateMethod = "periodic"; opt.TargetUpdateFrequency = 5; | opt = rlDQNAgentOptions; opt.TargetUpdateFrequency = 5; opt.TargetSmoothFactor = 1; |