rlTD3Agent

Twin-delayed deep deterministic policy gradient reinforcement learning agent

Description

The twin-delayed deep deterministic policy gradient (TD3) algorithm is an actor-critic, model-free, online, off-policy reinforcement learning method that computes an optimal policy maximizing the long-term reward.

Use rlTD3Agent to create one of the following types of agents.

  • Twin-delayed deep deterministic policy gradient (TD3) agent with two Q-value functions. This agent prevents overestimation of the value function by learning two Q-value functions and using the minimum of their estimates for policy updates, as sketched after this list.

  • Delayed deep deterministic policy gradient (delayed DDPG) agent with a single Q-value function. This agent is a DDPG agent with target policy smoothing and delayed policy and target updates.
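
The minimum mentioned in the first bullet enters the critic targets as a clipped double-Q estimate. The following is an illustrative sketch only, not toolbox code; Qtarget1, Qtarget2, actorTarget, eps, and gamma are hypothetical names for the target critics, the target actor, the clipped smoothing noise, and the discount factor.

% Sketch of the TD3 target computation (illustrative, not toolbox code)
aNext = actorTarget(sNext) + eps;            % smoothed target action
y = r + gamma*min(Qtarget1(sNext,aNext), ...
                  Qtarget2(sNext,aNext));    % use the smaller Q estimate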

For more information, see Twin-Delayed Deep Deterministic Policy Gradient Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description

agent = rlTD3Agent(actor,critics,agentOptions) creates an agent with the specified actor and critic representations and sets the AgentOptions property. To create a:

  • TD3 agent, specify a two-element row vector of critic representations.

  • Delayed DDPG agent, specify a single critic representation.
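
For example, assuming you have already constructed an actor, two critics critic1 and critic2, and an options object agentOptions (the full example below shows how), the two cases look as follows:

agent = rlTD3Agent(actor,[critic1 critic2],agentOptions);    % TD3 agent
delayedDDPGAgent = rlTD3Agent(actor,critic1,agentOptions);   % delayed DDPG agent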

Input Arguments

actor - Actor network representation

Actor network representation, specified as an rlDeterministicActorRepresentation object. For more information on creating actor representations, see Create Policy and Value Function Representations.

critics - Critic network representations

Critic network representations, specified as one of the following:

  • rlQValueRepresentation object - Create a delayed DDPG agent with a single Q-value function. This agent is a DDPG agent with target policy smoothing and delayed policy and target updates.

  • Two-element row vector of rlQValueRepresentation objects - Create a TD3 agent with two critic value functions. The two critics must be unique rlQValueRepresentation objects with the same observation and action specifications. The representations can have different structures, or the same structure with different initial parameters.

For more information on creating critic representations, see Create Policy and Value Function Representations.

Properties

AgentOptions - Agent options

Agent options, specified as an rlTD3AgentOptions object.

ExperienceBuffer - Experience buffer

Experience buffer, specified as an ExperienceBuffer object. During training, the agent stores each of its experiences (S,A,R,S') in a buffer. Here:

  • S is the current observation of the environment.

  • A is the action taken by the agent.

  • R is the reward for taking action A.

  • S' is the next observation after taking action A.

For more information on how the agent samples experience from the buffer during training, see Twin-Delayed Deep Deterministic Policy Gradient Agents.
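
You can configure the buffer capacity and the number of experiences sampled per gradient update through the corresponding agent options, for example:

agentOptions = rlTD3AgentOptions;
agentOptions.ExperienceBufferLength = 1e6;   % store up to one million experiences
agentOptions.MiniBatchSize = 64;             % experiences sampled per update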

Object Functions

train - Train a reinforcement learning agent within a specified environment
sim - Simulate a trained reinforcement learning agent within a specified environment
getActor - Get actor representation from reinforcement learning agent
setActor - Set actor representation of reinforcement learning agent
getCritic - Get critic representation from reinforcement learning agent
setCritic - Set critic representation of reinforcement learning agent
generatePolicyFunction - Create function that evaluates trained policy of reinforcement learning agent
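
For example, you can extract the critics from an existing agent, modify them, and reassign them; a minimal sketch, assuming agent was created as in the example below:

critics = getCritic(agent);         % one or two critic representations
agent = setCritic(agent,critics);   % reassign the (possibly modified) critics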

Examples

Create the environment and obtain its observation and action specifications.

env = rlPredefinedEnv("DoubleIntegrator-Continuous");
obsInfo = getObservationInfo(env);
numObs = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
numAct = numel(actInfo);

Create two Q-value critic representations. First, create a critic deep neural network structure.

statePath1 = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')
    ];
actionPath1 = [
    imageInputLayer([numAct 1 1],'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')
    ];
commonPath1 = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu1')
    fullyConnectedLayer(1,'Name','CriticOutput')
    ];

criticNet = layerGraph(statePath1);
criticNet = addLayers(criticNet,actionPath1);
criticNet = addLayers(criticNet,commonPath1);
criticNet = connectLayers(criticNet,'CriticStateFC2','add/in1');   % state path into the addition layer
criticNet = connectLayers(criticNet,'CriticActionFC1','add/in2');  % action path into the addition layer

Create the critic representations. Use the same network structure for both critics. The TD3 agent initializes the two networks using different default parameters.

criticOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,... 
                                        'GradientThreshold',1,'L2RegularizationFactor',2e-4);
critic1 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
critic2 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);

Create an actor deep neural network.

actorNet = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(300,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numAct,'Name','ActorFC3')                       
    tanhLayer('Name','ActorTanh1')
    ];

Create a deterministic actor representation.

actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,...
                                       'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor  = rlDeterministicActorRepresentation(actorNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'ActorTanh1'},actorOptions);

Specify agent options.

agentOptions = rlTD3AgentOptions;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 5e-3;
agentOptions.TargetPolicySmoothModel.Variance = 0.2;
agentOptions.TargetPolicySmoothModel.LowerLimit = -0.5;
agentOptions.TargetPolicySmoothModel.UpperLimit = 0.5;

Create the TD3 agent using the actor, the critics, and the agent options.

agent = rlTD3Agent(actor,[critic1 critic2],agentOptions);
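
As a quick check, you can query the agent for an action given a random observation; numObs comes from the environment specifications above:

getAction(agent,{rand(numObs,1)})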

You can also create an rlTD3Agent object with a single critic. In this case, the object represents a DDPG agent with target policy smoothing and delayed policy and target updates.

delayedDDPGAgent = rlTD3Agent(actor,critic1,agentOptions);
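
Either agent can then be trained against the environment with train. The following setup is only an illustrative sketch; the option values are placeholders, not recommendations:

trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',500,...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',-66);
% trainStats = train(agent,env,trainOpts);   % uncomment to run training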

Introduced in R2020a