Options for training reinforcement learning agents
Use an rlTrainingOptions object to specify training options for an agent. To train an agent, use train. For more information on training and simulating agents, see Train Reinforcement Learning Agents.
trainOpts = rlTrainingOptions returns the default options for training a reinforcement learning agent. Use training options to specify parameters for the training session, such as the maximum number of episodes to train, criteria for stopping training, criteria for saving agents, and criteria for using parallel computing. After configuring the options, use trainOpts as an input argument for train.
opt = rlTrainingOptions(Name,Value) creates a training options set with the specified properties using one or more name-value pair arguments.
MaxEpisodes — Maximum number of episodes to train the agent
Maximum number of episodes to train the agent, specified as the comma-separated pair consisting of 'MaxEpisodes' and a positive integer. Regardless of other criteria for termination, training terminates after this many episodes.
Example: 'MaxEpisodes',1000
MaxStepsPerEpisode — Maximum number of steps to run per episode
Maximum number of steps to run per episode, specified as the comma-separated pair consisting of 'MaxStepsPerEpisode' and a positive integer. In general, you define episode termination conditions in the environment. This value is the maximum number of steps to run in the episode if those termination conditions are not met.
Example: 'MaxStepsPerEpisode',1000
ScoreAveragingWindowLength — Window length for averaging
5 (default) | positive integer
Window length for averaging scores, rewards, and numbers of steps, specified as the comma-separated pair consisting of 'ScoreAveragingWindowLength' and a positive integer. For options expressed in terms of averages, this is the number of episodes included in the average. For instance, suppose that StopTrainingCriteria is "AverageReward" and StopTrainingValue is 500. Training terminates when the reward averaged over the number of episodes specified by this parameter is 500 or greater.
Example: 'ScoreAveragingWindowLength',10
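For example, the following sketch combines this window with the stopping options described later on this page; the window of 10 episodes and the threshold of 480 are illustrative values, not defaults.
% Stop training when the reward, averaged over the 10 most recent
% episodes, reaches 480 or more (illustrative values).
trainOpts = rlTrainingOptions(...
    'ScoreAveragingWindowLength',10,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',480);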
StopTrainingCriteria — Training termination condition
"AverageSteps" (default) | "AverageReward" | "EpisodeCount" | ...
Training termination condition, specified as the comma-separated pair consisting of 'StopTrainingCriteria' and one of the following strings:
"AverageSteps"
— Stop training when the running average
number of steps per episode equals or exceeds the critical value specified by the
option StopTrainingValue
. The average is computed using the
window 'ScoreAveragingWindowLength'
.
"AverageReward"
— Stop training when the running average
reward equals or exceeds the critical value.
"EpisodeReward"
— Stop training when the reward in the
current episode equals or exceeds the critical value.
"GlobalStepCount"
— Stop training when the total number of
steps in all episodes (the total number of times the agent is invoked) equals or
exceeds the critical value.
"EpisodeCount"
— Stop training when the number of training
episodes equals or exceeds the critical value.
Example: 'StopTrainingCriteria',"AverageReward"
StopTrainingValue — Critical value of training termination condition
500 (default) | scalar
Critical value of training termination condition, specified as the comma-separated pair consisting of 'StopTrainingValue' and a scalar. Training terminates when the termination condition specified by the StopTrainingCriteria option equals or exceeds this value. For instance, if StopTrainingCriteria is "AverageReward" and StopTrainingValue is 100, then training terminates when the average reward over the number of episodes specified in 'ScoreAveragingWindowLength' equals or exceeds 100.
Example: 'StopTrainingValue',100
SaveAgentCriteria — Condition for saving agent during training
"none" (default) | "EpisodeReward" | "AverageReward" | "EpisodeCount" | ...
Condition for saving agent during training, specified as the comma-separated pair consisting of 'SaveAgentCriteria' and one of the following strings:
"none"
— Do not save any agents during training.
"EpisodeReward"
— Save agent when the reward in the current
episode equals or exceeds the critical value.
"AverageSteps"
— Save agent when the running average number
of steps per episode equals or exceeds the critical value specified by the option
StopTrainingValue
. The average is computed using the window
'ScoreAveragingWindowLength'
.
"AverageReward"
— Save agent when the running average reward
over all episodes equals or exceeds the critical value.
"GlobalStepCount"
— Save agent when the total number of steps
in all episodes (the total number of times the agent is invoked) equals or exceeds
the critical value.
"EpisodeCount"
— Save agent when the number of training
episodes equals or exceeds the critical value.
Set this option to store candidate agents that perform well according to the
criteria you specify. When you set this option to a value other than
"none"
, the software sets the SaveAgentValue
option to 500. You can change that value to specify the condition for saving the agent.
For instance, suppose you want to store for further testing any agent that yields an
episode reward that equals or exceeds 100. To do so, set
SaveAgentCriteria
to "EpisodeReward"
and set the
SaveAgentValue
option to 100. When an episode reward equals or
exceeds 100, train
saves the corresponding agent in a MAT-file in
the folder specified by the SaveAgentDirectory
option. The MAT-file
is called AgentK.mat
where K
is the number of the
corresponding episode. The agent is stored within that MAT-file as
saved_agent
.
Example: 'SaveAgentCriteria',"EpisodeReward"
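For instance, the following sketch saves any agent whose episode reward reaches 100 and then reloads one of the saved candidates; the reward threshold, folder name, and episode number 231 are illustrative.
% Save any agent whose episode reward reaches 100 (illustrative threshold).
trainOpts = rlTrainingOptions(...
    'SaveAgentCriteria',"EpisodeReward",...
    'SaveAgentValue',100,...
    'SaveAgentDirectory',"savedAgents");

% After training, reload a saved candidate, for example the one from
% episode 231. The agent is stored in the MAT-file as saved_agent.
data = load(fullfile("savedAgents","Agent231.mat"));
candidateAgent = data.saved_agent;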
SaveAgentValue — Critical value of condition for saving agent
"none" (default) | 500 | scalar
Critical value of condition for saving agent, specified as the comma-separated pair consisting of 'SaveAgentValue' and "none" or a numeric scalar.
When you specify a condition for saving candidate agents using SaveAgentCriteria, the software sets this value to 500. Change the value to specify the condition for saving the agent. See the SaveAgentCriteria option for more details.
Example: 'SaveAgentValue',100
SaveAgentDirectory — Folder for saved agents
"savedAgents" (default) | string | character vector
Folder for saved agents, specified as the comma-separated pair consisting of 'SaveAgentDirectory' and a string or character vector. The folder name can contain a full or relative path. When an episode occurs that satisfies the condition specified by the SaveAgentCriteria and SaveAgentValue options, the software saves the agent in a MAT-file in this folder. If the folder does not exist, train creates it. When SaveAgentCriteria is "none", this option is ignored and train does not create a folder.
Example: 'SaveAgentDirectory', pwd + "\run1\Agents"
UseParallel — Flag for using parallel training
false (default) | true
Flag for using parallel training, specified as the comma-separated pair consisting of 'UseParallel' and either true or false. Setting this option to true configures training to use parallel computing. To specify options for parallel training, use the ParallelizationOptions property.
Using parallel computing requires Parallel Computing Toolbox™ software.
For more information about training using parallel computing, see Train Reinforcement Learning Agents.
Example: 'UseParallel',true
ParallelizationOptions — Options to control parallel training
ParallelTraining object
Parallelization options to control parallel training, specified as the comma-separated pair consisting of 'ParallelizationOptions' and a ParallelTraining object. For more information about training using parallel computing, see Train Reinforcement Learning Agents.
The ParallelTraining object has the following properties, which you can modify using dot notation after creating the rlTrainingOptions object. A combined sketch follows the property descriptions below.
Mode — Parallel computing mode
"sync" (default) | "async"
Parallel computing mode, specified as one of the following:
"sync"
— Use parpool
to run
synchronous training on the available workers. In this case, workers pause
execution until all workers are finished. The host updates the actor and
critic parameters based on the results from all the workers and sends the
updated parameters to all workers.
"async"
— Use parpool
to run
asynchronous training on the available workers. In this case, workers send
their data back to the host as soon as they finish and receive updated
parameters from the host. The workers then continue with their task.
DataToSendFromWorkers — Type of data that workers send to the host
"experiences" (default) | "gradients"
Type of data that workers send to the host, specified as one of the following strings:
"experiences"
— Send experience data (observation,
action, reward, next observation, is done) to the host. For agents with
gradients, the host computes gradients from the experiences.
"gradients"
— Compute and send gradients to the host.
The host applies gradients to update networks parameters.
AC and PG agents accept only DataToSendFromWorkers =
"gradients"
. DQN and DDPG agents accept only
DataToSendFromWorkers = "experiences"
.
StepsUntilDataIsSent — When workers send data to host
–1 (default) | positive integer
When workers send data to the host and receive updated parameters, specified as –1 or a positive integer. This number indicates how many steps to compute during the episode before sending data to the host. When this option is –1, the worker waits until the end of the episode and then sends all step data to the host. Otherwise, the worker waits the specified number of steps before sending data.
AC agents do not accept StepsUntilDataIsSent = -1. For A3C training, set StepsUntilDataIsSent equal to the NumStepsToLookAhead AC agent option.
PG agents accept only StepsUntilDataIsSent = -1.
WorkerRandomSeeds — Randomizer initialization for workers
–1 (default) | –2 | vector
Randomizer initialization for workers, specified as one of the following:
–1
— Assign a unique random seed to each worker. The
value of the seed is the worker ID.
–2
— Do not assign a random seed to the workers.
Vector — Manually specify the random seed for each worker. The number of elements in the vector must match the number of workers.
TransferBaseWorkspaceVariables — Send model and workspace variables to parallel workers
"on" (default) | "off"
Send model and workspace variables to parallel workers, specified as "on" or "off". When the option is "on", the host sends variables used in models and defined in the base MATLAB® workspace to the workers.
AttachedFiles — Additional files to attach to the parallel pool
[] (default) | string | string array
Additional files to attach to the parallel pool, specified as a string or string array.
SetupFcn — Function to run before training starts
[] (default) | function handle
Function to run before training starts, specified as a handle to a function having no input arguments. This function is run once per worker before training begins. Write this function to perform any processing that you need prior to training.
CleanupFcn — Function to run after training ends
[] (default) | function handle
Function to run after training ends, specified as a handle to a function having no input arguments. You can write this function to clean up the workspace or perform other processing after training terminates.
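As a combined sketch of the parallel training options above, the following configures asynchronous (A3C-style) training for an AC agent. The step count of 32 is illustrative and should match the NumStepsToLookAhead option of the corresponding AC agent; the other values are also examples, not defaults.
% Sketch of an asynchronous parallel training configuration for an AC agent.
trainOpts = rlTrainingOptions('UseParallel',true);
trainOpts.ParallelizationOptions.Mode = "async";
trainOpts.ParallelizationOptions.DataToSendFromWorkers = "gradients"; % AC agents send gradients
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = 32;           % match the agent NumStepsToLookAhead option
trainOpts.ParallelizationOptions.WorkerRandomSeeds = -1;              % unique seed per worker
trainOpts.ParallelizationOptions.TransferBaseWorkspaceVariables = "on";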
Verbose — Display training progress on the command line
false (0) (default) | true (1)
Display training progress on the command line, specified as the logical values false (0) or true (1). Set to true to write information from each training episode to the MATLAB command line during training.
StopOnError — Stop training when error occurs
"on" (default) | "off"
Stop training when an error occurs during an episode, specified as "on" or "off". When this option is "off", errors are captured and returned in the SimulationInfo output of train, and training continues to the next episode.
Plots — Display training progress with the Episode Manager
"training-progress" (default) | "none"
Display training progress with the Episode Manager, specified as "training-progress" or "none". By default, calling train opens the Reinforcement Learning Episode Manager, which graphically and numerically displays information about the training progress, such as the reward for each episode, average reward, number of episodes, and total number of steps. (For more information, see train.) To turn off this display, set this option to "none".
train — Train a reinforcement learning agent within a specified environment
Create an options set for training a reinforcement learning agent. Set the maximum number of episodes and the maximum steps per episode to 1000. Configure the options to stop training when the average reward equals or exceeds 480, and turn on both the command-line display and the Reinforcement Learning Episode Manager for displaying training results. You can set the options using Name,Value pairs when you create the options set. Any options that you do not explicitly set have their default values.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',1000,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',480,...
    'Verbose',true,...
    'Plots',"training-progress")
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "none"
                SaveAgentValue: "none"
                   UseParallel: 0
        ParallelizationOptions: [1×1 rl.option.ParallelTraining]
            SaveAgentDirectory: "savedAgents"
                   StopOnError: "on"
                       Verbose: 1
                         Plots: "training-progress"
Alternatively, create a default options set and use dot notation to change some of the values.
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = 1000;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 480;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "none"
                SaveAgentValue: "none"
                   UseParallel: 0
        ParallelizationOptions: [1×1 rl.option.ParallelTraining]
            SaveAgentDirectory: "savedAgents"
                   StopOnError: "on"
                       Verbose: 1
                         Plots: "training-progress"
You can now use trainOpts
as an input argument to the train
command.
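For example, assuming you have already created an agent agent and an environment env (not shown here), the call would look like this:
% Train the agent in the environment using the configured options.
% agent and env are assumed to already exist.
trainingStats = train(agent,env,trainOpts);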