RL Agent

Reinforcement learning agent

Library: Reinforcement Learning Toolbox

Description

Use the RL Agent block to simulate and train a reinforcement learning agent in Simulink®. You associate the block with an agent stored in the MATLAB® workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. You connect the block so that it receives an observation and a computed reward. For instance, consider the following block diagram of the rlSimplePendulumModel model.

The observation input port of the RL Agent block receives a signal that is derived from the instantaneous angle and angular velocity of the pendulum. The reward port receives a reward calculated from the same two values and the applied action. You configure the observations and reward computations that are appropriate to your system.

The block uses the agent to generate an action based on the observation and reward you provide. Connect the action output port to the appropriate input for your system. For instance, in the rlSimplePendulumModel, the action port is a torque applied to the pendulum system. For more information about this model, see Train DQN Agent to Swing Up and Balance Pendulum.

To train a reinforcement learning agent in Simulink, you generate an environment from the Simulink model. You then create and configure the agent for training against that environment. For more information, see Create Simulink Environments for Reinforcement Learning. When you call train using the environment, train simulates the model and updates the agent associated with the block.
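
For example, the following MATLAB sketch outlines this workflow. It assumes a model named myModel containing an RL Agent block at the path myModel/RL Agent; the specification sizes, the choice of a default DQN agent, and the training option values are illustrative assumptions rather than requirements of the block.

  % Define example observation and action specifications (illustrative sizes and values).
  obsInfo = rlNumericSpec([3 1]);
  actInfo = rlFiniteSetSpec([-2 0 2]);

  % Create a Simulink environment that wraps the model and the RL Agent block.
  env = rlSimulinkEnv("myModel","myModel/RL Agent",obsInfo,actInfo);

  % Create an agent in the MATLAB workspace under the name that the block's
  % Agent parameter references (the default name is agentObj).
  agentObj = rlDQNAgent(obsInfo,actInfo);

  % Train the agent. train repeatedly simulates the model and updates agentObj.
  trainOpts = rlTrainingOptions("MaxEpisodes",500,"MaxStepsPerEpisode",200);
  trainResults = train(agentObj,env,trainOpts);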

Ports

Input


observation

This port receives observation signals from the environment. Observation signals represent measurements or other instantaneous system data. If you have multiple observation signals, combine them into a single vector signal using a Mux block. To use a nonvirtual bus signal, use bus2RLSpec, as sketched below.
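
If the observation arrives as a nonvirtual bus, a minimal sketch of the conversion is shown here; the bus object name ObsBus and its element names are assumptions for illustration.

  % Define a bus object describing the observation elements (illustrative fields).
  elems(1) = Simulink.BusElement;
  elems(1).Name = 'theta';
  elems(2) = Simulink.BusElement;
  elems(2).Name = 'thetadot';
  obsBus = Simulink.Bus;
  obsBus.Elements = elems;
  assignin('base','ObsBus',obsBus);

  % Create observation specifications from the bus for use with rlSimulinkEnv.
  obsInfo = bus2RLSpec('ObsBus');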

reward

This port receives the reward signal, which you compute based on the observation data. The agent uses the reward signal during training to maximize the expectation of the long-term reward.

isdone

Use this signal to specify conditions under which to terminate a training episode. You must configure logic appropriate to your system to determine these conditions. One useful application is to terminate an episode early when it is clearly going well or poorly. For instance, you can terminate an episode if the agent reaches its goal or strays irrecoverably far from it.

Output


action

Action computed by the agent based on the observation and reward inputs. Connect this port to the inputs of your system. To use a nonvirtual bus signal, use bus2RLSpec.

Note

When agents such as rlACAgent, rlPGAgent, or rlPPOAgent use an rlStochasticActorRepresentation actor with a continuous action space, the constraints set by the action specification are not enforced by the agent. In these cases, you must enforce action space constraints within the environment.

cumulative reward

Cumulative sum of the reward signal during simulation. Observe or log this signal to track how the cumulative reward evolves over time.

Dependencies

To enable this port, select the Provide cumulative reward signal parameter.

Parameters


Agent object

Enter the name of an agent object stored in the MATLAB workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. For information about agent objects, see Reinforcement Learning Agents.

Programmatic Use

Block Parameter: Agent
Type: string, character vector
Default: "agentObj"
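
For example, assuming a model named myModel with the RL Agent block named RL Agent (the block path is an assumption for illustration):

  % Point the block at an agent object named myAgent in the MATLAB workspace.
  set_param('myModel/RL Agent','Agent','myAgent');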

Provide cumulative reward signal

Select this parameter to enable the cumulative reward block output port.

Programmatic Use

Block Parameter: ProvideCumRwd
Type: string, character vector
Values: "off", "on"
Default: "off"
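
For example, assuming the same illustrative block path as above:

  % Enable the cumulative reward output port on the block.
  set_param('myModel/RL Agent','ProvideCumRwd','on');
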
Introduced in R2019a