Options for training reinforcement learning agents
Use an rlTrainingOptions object to specify training options for an agent. To train an agent, use train. For more information on training and simulating agents, see Train Reinforcement Learning Agents.
trainOpts = rlTrainingOptions returns the default options for training a reinforcement learning agent. Use training options to specify parameters for the training session, such as the maximum number of episodes to train, criteria for stopping training, criteria for saving agents, and criteria for using parallel computing. After configuring the options, use trainOpts as an input argument for train.
opt = rlTrainingOptions(Name,Value) creates a training options set with the specified properties using one or more name-value pair arguments.
MaxEpisodes — Maximum number of episodes to train the agent
Maximum number of episodes to train the agent, specified as the comma-separated pair consisting of 'MaxEpisodes' and a positive integer. Regardless of other criteria for termination, training terminates after this many episodes.
Example: 'MaxEpisodes',1000
MaxStepsPerEpisode — Maximum number of steps to run per episode
Maximum number of steps to run per episode, specified as the comma-separated pair consisting of 'MaxStepsPerEpisode' and a positive integer. In general, you define episode termination conditions in the environment. This value is the maximum number of steps to run in the episode if those termination conditions are not met.
Example: 'MaxStepsPerEpisode',1000
ScoreAveragingWindowLength — Window length for averaging
5 (default) | positive integer
Window length for averaging scores, rewards, and numbers of steps, specified as the comma-separated pair consisting of 'ScoreAveragingWindowLength' and a positive integer. For options expressed in terms of averages, this is the number of episodes included in the average. For instance, suppose that StopTrainingCriteria is "AverageReward" and StopTrainingValue is 500. Training terminates when the reward averaged over the number of episodes specified by this parameter is 500 or greater.
Example: 'ScoreAveragingWindowLength',10
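For example, the following sketch combines this window with the stopping options described later on this page; the window of 10 episodes and the threshold of 480 are illustrative values, not defaults.
% Stop training when the reward, averaged over the 10 most recent
% episodes, reaches 480 or more (illustrative values).
trainOpts = rlTrainingOptions(...
    'ScoreAveragingWindowLength',10,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',480);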
StopTrainingCriteria — Training termination condition
"AverageSteps" (default) | "AverageReward" | "EpisodeCount" | ...
Training termination condition, specified as the comma-separated pair consisting of 'StopTrainingCriteria' and one of the following strings:
"AverageSteps"
— Stop training when the running average
number of steps per episode equals or exceeds the critical value specified by the
option StopTrainingValue
. The average is computed using the
window 'ScoreAveragingWindowLength'
.
"AverageReward"
— Stop training when the running average
reward equals or exceeds the critical value.
"EpisodeReward"
— Stop training when the reward in the
current episode equals or exceeds the critical value.
"GlobalStepCount"
— Stop training when the total number of
steps in all episodes (the total number of times the agent is invoked) equals or
exceeds the critical value.
"EpisodeCount"
— Stop training when the number of training
episodes equals or exceeds the critical value.
Example: 'StopTrainingCriteria',"AverageReward"
StopTrainingValue — Critical value of training termination condition
500 (default) | scalar
Critical value of training termination condition, specified as the comma-separated pair consisting of 'StopTrainingValue' and a scalar. Training terminates when the termination condition specified by the StopTrainingCriteria option equals or exceeds this value. For instance, if StopTrainingCriteria is "AverageReward" and StopTrainingValue is 100, then training terminates when the average reward over the number of episodes specified in 'ScoreAveragingWindowLength' equals or exceeds 100.
Example: 'StopTrainingValue',100
SaveAgentCriteria — Condition for saving agent during training
"none" (default) | "EpisodeReward" | "AverageReward" | "EpisodeCount" | ...
Condition for saving agent during training, specified as the comma-separated pair consisting of 'SaveAgentCriteria' and one of the following strings:
"none"
— Do not save any agents during training.
"EpisodeReward"
— Save agent when the reward in the current
episode equals or exceeds the critical value.
"AverageSteps"
— Save agent when the running average number
of steps per episode equals or exceeds the critical value specified by the option
StopTrainingValue
. The average is computed using the window
'ScoreAveragingWindowLength'
.
"AverageReward"
— Save agent when the running average reward
over all episodes equals or exceeds the critical value.
"GlobalStepCount"
— Save agent when the total number of steps
in all episodes (the total number of times the agent is invoked) equals or exceeds
the critical value.
"EpisodeCount"
— Save agent when the number of training
episodes equals or exceeds the critical value.
Set this option to store candidate agents that perform well according to the
criteria you specify. When you set this option to a value other than
"none"
, the software sets the SaveAgentValue
option to 500. You can change that value to specify the condition for saving the agent.
For instance, suppose you want to store for further testing any agent that yields an
episode reward that equals or exceeds 100. To do so, set
SaveAgentCriteria
to "EpisodeReward"
and set the
SaveAgentValue
option to 100. When an episode reward equals or
exceeds 100, train
saves the corresponding agent in a MAT-file in
the folder specified by the SaveAgentDirectory
option. The MAT-file
is called AgentK.mat
where K
is the number of the
corresponding episode. The agent is stored within that MAT-file as
saved_agent
.
Example: 'SaveAgentCriteria',"EpisodeReward"
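For instance, the following sketch saves any agent whose episode reward reaches 100 and then reloads one of the saved candidates; the reward threshold, folder name, and episode number 231 are illustrative.
% Save any agent whose episode reward reaches 100 (illustrative threshold).
trainOpts = rlTrainingOptions(...
    'SaveAgentCriteria',"EpisodeReward",...
    'SaveAgentValue',100,...
    'SaveAgentDirectory',"savedAgents");

% After training, reload a saved candidate, for example the one from
% episode 231. The agent is stored in the MAT-file as saved_agent.
data = load(fullfile("savedAgents","Agent231.mat"));
candidateAgent = data.saved_agent;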
SaveAgentValue — Critical value of condition for saving agent
"none" (default) | 500 | scalar
Critical value of condition for saving agent, specified as the comma-separated pair consisting of 'SaveAgentValue' and "none" or a numeric scalar.
When you specify a condition for saving candidate agents using SaveAgentCriteria, the software sets this value to 500. Change the value to specify the condition for saving the agent. See the SaveAgentCriteria option for more details.
Example: 'SaveAgentValue',100
SaveAgentDirectory — Folder for saved agents
"savedAgents" (default) | string | character vector
Folder for saved agents, specified as the comma-separated pair consisting of 'SaveAgentDirectory' and a string or character vector. The folder name can contain a full or relative path. When an episode occurs that satisfies the condition specified by the SaveAgentCriteria and SaveAgentValue options, the software saves the agent in a MAT-file in this folder. If the folder does not exist, train creates it. When SaveAgentCriteria is "none", this option is ignored and train does not create a folder.
Example: 'SaveAgentDirectory', pwd + "\run1\Agents"
UseParallel — Flag for using parallel training
false (default) | true
Flag for using parallel training, specified as the comma-separated pair consisting of 'UseParallel' and either true or false. Setting this option to true configures training to use parallel computing. To specify options for parallel training, use the ParallelizationOptions property.
Using parallel computing requires Parallel Computing Toolbox™ software.
For more information about training using parallel computing, see Train Reinforcement Learning Agents.
Example: 'UseParallel',true
ParallelizationOptions — Options to control parallel training
ParallelTraining object
Parallelization options to control parallel training, specified as the comma-separated pair consisting of 'ParallelizationOptions' and a ParallelTraining object. For more information about training using parallel computing, see Train Reinforcement Learning Agents.
The ParallelTraining object has the following properties, which you can modify using dot notation after creating the rlTrainingOptions object. A combined sketch follows the property descriptions below.
Mode — Parallel computing mode
"sync" (default) | "async"
Parallel computing mode, specified as one of the following:
"sync"
— Use parpool
to run
synchronous training on the available workers. In this case, workers pause
execution until all workers are finished. The host updates the actor and
critic parameters based on the results from all the workers and sends the
updated parameters to all workers.
"async"
— Use parpool
to run
asynchronous training on the available workers. In this case, workers send
their data back to the host as soon as they finish and receive updated
parameters from the host. The workers then continue with their task.
DataToSendFromWorkers — Type of data that workers send to the host
"experiences" (default) | "gradients"
Type of data that workers send to the host, specified as one of the following strings:
"experiences"
— Send experience data (observation,
action, reward, next observation, is done) to the host. For agents with
gradients, the host computes gradients from the experiences.
"gradients"
— Compute and send gradients to the host.
The host applies gradients to update networks parameters.
AC and PG agents accept only DataToSendFromWorkers =
"gradients"
. DQN and DDPG agents accept only
DataToSendFromWorkers = "experiences"
.
StepsUntilDataIsSent — When workers send data to host
–1 (default) | positive integer
When workers send data to the host and receive updated parameters, specified as –1 or a positive integer. This number indicates how many steps to compute during the episode before sending data to the host. When this option is –1, the worker waits until the end of the episode and then sends all step data to the host. Otherwise, the worker waits the specified number of steps before sending data.
AC agents do not accept StepsUntilDataIsSent = -1. For A3C training, set StepsUntilDataIsSent equal to the NumStepsToLookAhead AC agent option.
PG agents accept only StepsUntilDataIsSent = -1.
WorkerRandomSeeds — Randomizer initialization for workers
–1 (default) | –2 | vector
Randomizer initialization for workers, specified as one of the following:
–1
— Assign a unique random seed to each worker. The
value of the seed is the worker ID.
–2
— Do not assign a random seed to the workers.
Vector — Manually specify the random seed for each worker. The number of elements in the vector must match the number of workers.
TransferBaseWorkspaceVariables — Send model and workspace variables to parallel workers
"on" (default) | "off"
Send model and workspace variables to parallel workers, specified as "on" or "off". When the option is "on", the host sends variables used in models and defined in the base MATLAB® workspace to the workers.
AttachedFiles — Additional files to attach to the parallel pool
[] (default) | string | string array
Additional files to attach to the parallel pool, specified as a string or string array.
SetupFcn — Function to run before training starts
[] (default) | function handle
Function to run before training starts, specified as a handle to a function having no input arguments. This function is run once per worker before training begins. Write this function to perform any processing that you need prior to training.
CleanupFcn — Function to run after training ends
[] (default) | function handle
Function to run after training ends, specified as a handle to a function having no input arguments. You can write this function to clean up the workspace or perform other processing after training terminates.
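As a combined sketch of the parallel training options above, the following configures asynchronous (A3C-style) training for an AC agent. The step count of 32 is illustrative and should match the NumStepsToLookAhead option of the corresponding AC agent; the other values are also examples, not defaults.
% Sketch of an asynchronous parallel training configuration for an AC agent.
trainOpts = rlTrainingOptions('UseParallel',true);
trainOpts.ParallelizationOptions.Mode = "async";
trainOpts.ParallelizationOptions.DataToSendFromWorkers = "gradients"; % AC agents send gradients
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = 32;           % match the agent NumStepsToLookAhead option
trainOpts.ParallelizationOptions.WorkerRandomSeeds = -1;              % unique seed per worker
trainOpts.ParallelizationOptions.TransferBaseWorkspaceVariables = "on";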
Verbose — Display training progress on the command line
false (0) (default) | true (1)
Display training progress on the command line, specified as the logical values false (0) or true (1). Set to true to write information from each training episode to the MATLAB command line during training.
StopOnError — Stop training when error occurs
"on" (default) | "off"
Stop training when an error occurs during an episode, specified as "on" or "off". When this option is "off", errors are captured and returned in the SimulationInfo output of train, and training continues to the next episode.
Plots — Display training progress with the Episode Manager
"training-progress" (default) | "none"
Display training progress with the Episode Manager, specified as "training-progress" or "none". By default, calling train opens the Reinforcement Learning Episode Manager, which graphically and numerically displays information about the training progress, such as the reward for each episode, average reward, number of episodes, and total number of steps. (For more information, see train.) To turn off this display, set this option to "none".
train — Train a reinforcement learning agent within a specified environment
Create an options set for training a reinforcement learning agent. Set the maximum number of episodes and the maximum steps per episode to 1000. Configure the options to stop training when the average reward equals or exceeds 480, and turn on both the command-line display and the Reinforcement Learning Episode Manager for displaying training results. You can set the options using Name,Value pairs when you create the options set. Any options that you do not explicitly set have their default values.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',1000,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',480,...
    'Verbose',true,...
    'Plots',"training-progress")
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "none"
                SaveAgentValue: "none"
                   UseParallel: 0
        ParallelizationOptions: [1×1 rl.option.ParallelTraining]
            SaveAgentDirectory: "savedAgents"
                   StopOnError: "on"
                       Verbose: 1
                         Plots: "training-progress"
Alternatively, create a default options set and use dot notation to change some of the values.
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = 1000;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 480;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "none"
                SaveAgentValue: "none"
                   UseParallel: 0
        ParallelizationOptions: [1×1 rl.option.ParallelTraining]
            SaveAgentDirectory: "savedAgents"
                   StopOnError: "on"
                       Verbose: 1
                         Plots: "training-progress"
You can now use trainOpts
as an input argument to the train
command.
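For example, assuming you have already created an agent agent and an environment env (not shown here), the call would look like this:
% Train the agent in the environment using the configured options.
% agent and env are assumed to already exist.
trainingStats = train(agent,env,trainOpts);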