Train a reinforcement learning agent within a specified environment
trainStats = train(agent,env,trainOpts) trains a reinforcement learning agent
within a specified environment. After each training episode, train updates the
parameters of agent to maximize the expected long-term reward of the
environment. When training terminates, the agent reflects the state of training
at termination.
Use the training options trainOpts
to specify training
parameters such as the criteria for termination of training, when to save agents,
the maximum number of episodes to train, and the maximum number of steps per
episode.
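For example, a minimal sketch of configuring these options and starting training might look like the following. The environment choice, the agent variable, and all option values are assumptions for illustration only.

% Illustrative sketch: configure training options and train an existing agent.
env = rlPredefinedEnv("CartPole-Discrete");      % example predefined environment
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",500, ...                       % maximum number of episodes to train
    "MaxStepsPerEpisode",200, ...                % maximum number of steps per episode
    "StopTrainingCriteria","AverageReward", ...  % criterion for terminating training
    "StopTrainingValue",480);                    % stop once the average reward reaches this value
trainStats = train(agent,env,trainOpts);         % agent is assumed to be created beforehand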
train
updates the agent as training progresses. To
preserve the original agent parameters for later use, save the agent to a
MAT-file.
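For example, before training you can snapshot the untrained agent with a standard MATLAB save; the file name here is an arbitrary example.

% Save a copy of the untrained agent so its original parameters can be restored later.
save("initialAgent.mat","agent");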
By default, calling train
opens the Reinforcement
Learning Episode Manager, which lets you visualize the progress of the training.
The Episode Manager plot shows the reward for each episode, a running average
reward value, and the critic estimate
Q0 (for agents that have critics).
The Episode Manager also displays various episode and training statistics. To
turn off the Reinforcement Learning Episode Manager, set the Plots option of
trainOpts to "none".
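For instance, assuming trainOpts already exists:

% Disable the Episode Manager display during training.
trainOpts.Plots = "none";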
If you use a predefined environment for which there is a visualization, you
can use plot(env)
to visualize the environment. If you call
plot(env)
before training, then the visualization updates
during training to allow you to visualize the progress of each episode. (For
custom environments, you must implement your own plot
method.)
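As a sketch, with a predefined environment such as the cart-pole example:

% Visualize a predefined environment before training so the plot updates during training.
env = rlPredefinedEnv("CartPole-Discrete");   % example predefined environment
plot(env)                                     % visualization then refreshes each episode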
Training terminates when the conditions specified in
trainOpts
are satisfied. To terminate training in
progress, in the Reinforcement Learning Episode Manager, click Stop
Training. Because train
updates the agent at
each episode, you can resume training by calling
train(agent,env,trainOpts)
again, without losing the
trained parameters learned during the first call to train.
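For example, a sketch of resuming training after a manual stop, using the same variables as above:

% Resume training from the current agent parameters; earlier learning is preserved.
trainStats = train(agent,env,trainOpts);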
During training, you can save candidate agents that meet conditions you
specify with trainOpts
. For instance, you can save any
agent whose episode reward exceeds a certain value, even if the overall
condition for terminating training is not yet satisfied.
train
stores saved agents in a MAT-file in the folder
you specify with trainOpts
. Saved agents can be useful, for
instance, to allow you to test candidate agents generated during a long-running
training process. For details about saving criteria and saving location, see
rlTrainingOptions.
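As an illustration, the following sketch configures agent saving through rlTrainingOptions; the criterion, threshold, and folder name are example values.

% Save any agent whose episode reward exceeds 100 to the "savedAgents" folder.
trainOpts = rlTrainingOptions( ...
    "SaveAgentCriteria","EpisodeReward", ...  % condition used to decide when to save
    "SaveAgentValue",100, ...                 % threshold for the saving criterion
    "SaveAgentDirectory","savedAgents");      % folder in which saved agents are stored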
In general, train performs the following iterative steps:

1. Initialize agent.
2. For each episode:
   a. Reset the environment.
   b. Get the initial observation s0 from the environment.
   c. Compute the initial action a0 = μ(s0).
   d. Set the current action to the initial action (a←a0) and set the current
      observation to the initial observation (s←s0).
   e. While the episode is not finished or terminated:
      - Step the environment with action a to obtain the next observation s'
        and the reward r.
      - Learn from the experience set (s,a,r,s').
      - Compute the next action a' = μ(s').
      - Update the current action with the next action (a←a') and update the
        current observation with the next observation (s←s').
      - Break if the episode termination conditions defined in the environment
        are met.
3. If the training termination condition defined by trainOpts is met, terminate
   training. Otherwise, begin the next episode.
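The following is a schematic MATLAB sketch of that loop, written only to illustrate the order of operations; it is not the toolbox implementation, and the learning update is shown as a placeholder comment because it is performed internally by the agent rather than through a documented call.

% Schematic illustration of the training loop (not the actual toolbox code).
for episode = 1:trainOpts.MaxEpisodes
    s = reset(env);                      % reset environment, obtain initial observation s0
    a = getAction(agent,s);              % compute initial action a0 = mu(s0)
    for stepCount = 1:trainOpts.MaxStepsPerEpisode
        [sNext,r,isDone] = step(env,a);  % step the environment with action a
        % The agent learns from the experience (s,a,r,sNext) here; this internal
        % update is shown only as a comment.
        aNext = getAction(agent,sNext);  % compute next action a' = mu(s')
        a = aNext;                       % a <- a'
        s = sNext;                       % s <- s'
        if isDone
            break                        % episode termination condition met
        end
    end
    % train then checks the training termination condition defined in trainOpts.
end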
The specifics of how train performs these computations depend on your
configuration of the agent and environment. For instance, resetting the
environment at the start of each episode can include randomizing initial state
values, if you configure your environment to do so.
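For example, with a custom function environment you could provide a reset function that randomizes the initial state; the function name, state dimensions, and the commented rlFunctionEnv usage below are assumptions for illustration.

% Hypothetical reset function that randomizes the initial observation each episode.
function [initialObs,loggedSignals] = myResetFunction()
    initialObs = 0.1*randn(4,1);        % random initial state (example dimensions)
    loggedSignals.State = initialObs;   % store the state for use by the step function
end

% Used when constructing a custom function environment (obsInfo, actInfo, and
% myStepFunction are assumed to be defined elsewhere):
% env = rlFunctionEnv(obsInfo,actInfo,@myStepFunction,@myResetFunction);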