RL Agent

Reinforcement learning agent

Library: Reinforcement Learning Toolbox

Description

Use the RL Agent block to simulate and train a reinforcement learning agent in Simulink®. You associate the block with an agent stored in the MATLAB® workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. You connect the block so that it receives an observation and a computed reward. For instance, consider the following block diagram of the rlSimplePendulumModel model.

The observation input port of the RL Agent block receives a signal that is derived from the instantaneous angle and angular velocity of the pendulum. The reward port receives a reward calculated from the same two values and the applied action. You configure the observations and reward computations that are appropriate to your system.

The block uses the agent to generate an action based on the observation and reward you provide. Connect the action output port to the appropriate input for your system. For instance, in the rlSimplePendulumModel, the action port is a torque applied to the pendulum system. For more information about this model, see Train DQN Agent to Swing Up and Balance Pendulum.

To train a reinforcement learning agent in Simulink, you generate an environment from the Simulink model. You then create and configure the agent for training against that environment. For more information, see Create Simulink Environments for Reinforcement Learning. When you call train using the environment, train simulates the model and updates the agent associated with the block.
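
For example, the following MATLAB sketch outlines this workflow. It assumes a model named myModel containing an RL Agent block at the path myModel/RL Agent; the specification sizes, the choice of a default DQN agent, and the training option values are illustrative assumptions rather than requirements of the block.

  % Define example observation and action specifications (illustrative sizes and values).
  obsInfo = rlNumericSpec([3 1]);
  actInfo = rlFiniteSetSpec([-2 0 2]);

  % Create a Simulink environment that wraps the model and the RL Agent block.
  env = rlSimulinkEnv("myModel","myModel/RL Agent",obsInfo,actInfo);

  % Create an agent in the MATLAB workspace under the name that the block's
  % Agent parameter references (the default name is agentObj).
  agentObj = rlDQNAgent(obsInfo,actInfo);

  % Train the agent. train repeatedly simulates the model and updates agentObj.
  trainOpts = rlTrainingOptions("MaxEpisodes",500,"MaxStepsPerEpisode",200);
  trainResults = train(agentObj,env,trainOpts);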

Ports

Input


observation

This port receives observation signals from the environment. Observation signals represent measurements or other instantaneous system data. If you have multiple observation signals, combine them into a single vector signal using a Mux block. To use a nonvirtual bus signal, use bus2RLSpec, as sketched below.
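
If the observation arrives as a nonvirtual bus, a minimal sketch of the conversion is shown here; the bus object name ObsBus and its element names are assumptions for illustration.

  % Define a bus object describing the observation elements (illustrative fields).
  elems(1) = Simulink.BusElement;
  elems(1).Name = 'theta';
  elems(2) = Simulink.BusElement;
  elems(2).Name = 'thetadot';
  obsBus = Simulink.Bus;
  obsBus.Elements = elems;
  assignin('base','ObsBus',obsBus);

  % Create observation specifications from the bus for use with rlSimulinkEnv.
  obsInfo = bus2RLSpec('ObsBus');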

reward

This port receives the reward signal, which you compute based on the observation data. The agent uses the reward signal during training to maximize the expectation of the long-term reward.

isdone

Use this signal to specify conditions under which to terminate a training episode. You must configure logic appropriate to your system to determine these conditions. One useful application is to terminate an episode early when it is clearly going well or poorly. For instance, you can terminate an episode if the agent reaches its goal or strays irrecoverably far from it.

Output


action

Action computed by the agent based on the observation and reward inputs. Connect this port to the inputs of your system. To use a nonvirtual bus signal, use bus2RLSpec.

Note

When agents such as rlACAgent, rlPGAgent, or rlPPOAgent use an rlStochasticActorRepresentation actor with a continuous action space, the constraints set by the action specification are not enforced by the agent. In these cases, you must enforce action space constraints within the environment.

cumulative reward

Cumulative sum of the reward signal during simulation. Observe or log this signal to track how the cumulative reward evolves over time.

Dependencies

To enable this port, select the Provide cumulative reward signal parameter.

Parameters


Agent object

Enter the name of an agent object stored in the MATLAB workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. For information about agent objects, see Reinforcement Learning Agents.

Programmatic Use

Block Parameter: Agent
Type: string, character vector
Default: "agentObj"
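
For example, assuming a model named myModel with the RL Agent block named RL Agent (the block path is an assumption for illustration):

  % Point the block at an agent object named myAgent in the MATLAB workspace.
  set_param('myModel/RL Agent','Agent','myAgent');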

Provide cumulative reward signal

Select this parameter to enable the cumulative reward block output port.

Programmatic Use

Block Parameter: ProvideCumRwd
Type: string, character vector
Values: "off", "on"
Default: "off"
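
For example, assuming the same illustrative block path as above:

  % Enable the cumulative reward output port on the block.
  set_param('myModel/RL Agent','ProvideCumRwd','on');
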
Introduced in R2019a