Train reinforcement learning agents within a specified environment
trainStats = train(env,agents) trains one or more reinforcement learning agents within a specified environment, using default training options. Although agents is an input argument, after each training episode, train updates the parameters of each agent specified in agents to maximize their expected long-term reward from the environment. When training terminates, agents reflects the state of each agent at the end of the final training episode.
trainStats = train(agents,env) performs the same training as the previous syntax.
trainStats = train(___,trainOpts) trains agents within env, using the training options object trainOpts. Use training options to specify training parameters such as the criteria for terminating training, when to save agents, the maximum number of episodes to train, and the maximum number of steps per episode. Use this syntax after any of the input arguments in the previous syntaxes.
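For example, the following sketch shows both calling forms. It assumes a predefined cart-pole environment and a default DQN agent created from the environment specifications; your environment, agent, and option values will differ.

    env = rlPredefinedEnv("CartPole-Discrete");   % predefined environment
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);
    agent = rlDQNAgent(obsInfo,actInfo);          % agent with default networks

    % Train with default training options ...
    trainStats = train(agent,env);

    % ... or pass an explicit training options object.
    trainOpts = rlTrainingOptions("MaxEpisodes",500,"MaxStepsPerEpisode",200);
    trainStats = train(agent,env,trainOpts);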
train updates the agents as training progresses. To preserve the original agent parameters for later use, save the agents to a MAT-file.
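For example, before training you might do the following (the file name is only illustrative):

    save('initialAgent.mat','agent')          % preserve the untrained agent in a MAT-file
    trainStats = train(agent,env,trainOpts);  % train updates agent; the saved copy is unchanged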
By default, calling train opens the Reinforcement Learning Episode Manager, which lets you visualize the progress of the training. The Episode Manager plot shows the reward for each episode, a running average reward value, and the critic estimate Q0 (for agents that have critics). The Episode Manager also displays various episode and training statistics. To turn off the Reinforcement Learning Episode Manager, set the Plots option of trainOpts to "none".
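For example, to train without opening the Episode Manager:

    trainOpts = rlTrainingOptions("Plots","none");
    trainStats = train(agent,env,trainOpts);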
If you use a predefined environment for which there is a visualization, you can use plot(env) to visualize the environment. If you call plot(env) before training, then the visualization updates during training so that you can watch the progress of each episode. (For custom environments, you must implement your own plot method.)
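For a predefined environment that provides a visualization, the pattern is simply:

    plot(env)                                 % open the environment visualization
    trainStats = train(agent,env,trainOpts);  % the visualization updates during training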
Training terminates when the conditions specified in trainOpts are satisfied. To terminate training in progress, in the Reinforcement Learning Episode Manager, click Stop Training. Because train updates the agent at each episode, you can resume training by calling train(agent,env,trainOpts) again, without losing the trained parameters learned during the first call to train.
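For example:

    trainStats = train(agent,env,trainOpts);   % first training run
    % Inspect the results, adjust trainOpts if needed, then continue training.
    trainStats = train(agent,env,trainOpts);   % resumes from the agent's current parameters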
During training, you can save candidate agents that meet conditions you specify with trainOpts. For instance, you can save any agent whose episode reward exceeds a certain value, even if the overall condition for terminating training is not yet satisfied. train stores saved agents in a MAT-file in the folder you specify with trainOpts. Saved agents are useful, for instance, for testing candidate agents generated during a long-running training process. For details about saving criteria and saving location, see rlTrainingOptions.
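For instance, a training options object like the following saves any agent whose episode reward reaches 100 to a savedAgents folder (the threshold and folder name are only illustrative):

    trainOpts = rlTrainingOptions( ...
        "SaveAgentCriteria","EpisodeReward", ...
        "SaveAgentValue",100, ...
        "SaveAgentDirectory","savedAgents");
    trainStats = train(agent,env,trainOpts);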
In general, train performs the following iterative steps (a conceptual sketch of this loop follows the list):
Initialize agent.
For each episode:
Reset the environment.
Get the initial observation s0 from the environment.
Compute the initial action a0 = μ(s0).
Set the current action to the initial action (a←a0) and set the current observation to the initial observation (s←s0).
While the episode is not finished or terminated:
Step the environment with action a to obtain the next observation s' and the reward r.
Learn from the experience set (s,a,r,s').
Compute the next action a' = μ(s').
Update the current action with the next action (a←a') and update the current observation with the next observation (s←s').
Break if the episode termination conditions defined in the environment are met.
If the training termination condition defined by trainOpts is met, terminate training. Otherwise, begin the next episode.
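The loop below is a conceptual MATLAB-style sketch of these steps, not the toolbox implementation. The helper names (resetEnvironment, computeAction, stepEnvironment, learnFromExperience, trainingTerminationMet) are placeholders for operations that train performs internally through the agent and environment interfaces.

    % Conceptual sketch only; helper names are placeholders, not toolbox functions.
    for episode = 1:maxEpisodes
        s = resetEnvironment(env);                      % get initial observation s0
        a = computeAction(agent,s);                     % a0 = mu(s0)
        isDone = false;
        while ~isDone
            [sNext,r,isDone] = stepEnvironment(env,a);  % apply a, observe s' and reward r
            learnFromExperience(agent,s,a,r,sNext);     % update the agent from (s,a,r,s')
            a = computeAction(agent,sNext);             % a <- a' = mu(s')
            s = sNext;                                  % s <- s'
        end
        if trainingTerminationMet(trainOpts)            % stopping criteria from trainOpts
            break
        end
    end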
The specifics of how train performs these computations depend on your configuration of the agent and environment. For instance, resetting the environment at the start of each episode can include randomizing initial state values, if you configure your environment to do so.
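For example, a custom environment created with rlFunctionEnv takes a reset function handle, and that function can draw a random initial state each episode. A minimal illustrative reset function (the state dimensions and range are assumptions) might look like this:

    function [initialObs,loggedSignals] = myResetFunction()
        % Randomize the initial state at the start of each episode (illustrative).
        initialState = 0.1*(2*rand(4,1) - 1);   % uniform in [-0.1, 0.1]
        loggedSignals.State = initialState;
        initialObs = initialState;
    end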