rlDDPGAgent

Deep deterministic policy gradient reinforcement learning agent

Description

The deep deterministic policy gradient (DDPG) algorithm is an actor-critic, model-free, online, off-policy reinforcement learning method which computes an optimal policy that maximizes the long-term reward.

For more information, see Deep Deterministic Policy Gradient Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description


agent = rlDDPGAgent(actor,critic,agentOptions) creates a DDPG agent with the specified actor and critic networks and sets the AgentOptions property.
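For instance, assuming you have already built suitable actor and critic representations (such as those constructed in the example below), a minimal creation call looks like the following sketch. The sample time value is illustrative, not a recommendation.

% a sketch: create a DDPG agent from existing actor and critic representations
opts = rlDDPGAgentOptions('SampleTime',0.05);  % illustrative sample time
agent = rlDDPGAgent(actor,critic,opts);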

Input Arguments


actor - Actor network representation, specified as an rlDeterministicActorRepresentation object. For more information on creating actor representations, see Create Policy and Value Function Representations.

critic - Critic network representation, specified as an rlQValueRepresentation object. For more information on creating critic representations, see Create Policy and Value Function Representations.

Properties


AgentOptions - Agent options, specified as an rlDDPGAgentOptions object.
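You can also adjust the options on an existing agent through this property, as in this sketch (assuming agent has already been created and the property is settable in your release):

% a sketch: modify the discount factor of an existing agent
agent.AgentOptions.DiscountFactor = 0.95;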

ExperienceBuffer - Experience buffer, specified as an ExperienceBuffer object. During training, the agent stores each of its experiences (S,A,R,S') in a buffer. Here:

  • S is the current observation of the environment.

  • A is the action taken by the agent.

  • R is the reward for taking action A.

  • S' is the next observation after taking action A.

For more information on how the agent samples experience from the buffer during training, see Deep Deterministic Policy Gradient Agents.
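The buffer capacity and the size of the mini-batches sampled from it are set through the agent options, as in this sketch (the values are illustrative):

% a sketch: size the experience buffer and the sampled mini-batches
opts = rlDDPGAgentOptions('ExperienceBufferLength',1e6,'MiniBatchSize',64);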

Object Functions

train - Train a reinforcement learning agent within a specified environment
sim - Simulate a trained reinforcement learning agent within a specified environment
getActor - Get actor representation from reinforcement learning agent
setActor - Set actor representation of reinforcement learning agent
getCritic - Get critic representation from reinforcement learning agent
setCritic - Set critic representation of reinforcement learning agent
generatePolicyFunction - Create function that evaluates trained policy of reinforcement learning agent
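For instance, the following sketch extracts and reassigns the actor representation of an existing agent, which is useful if you want to inspect or modify its parameters:

% a sketch, assuming agent already exists
actor = getActor(agent);        % extract the current actor representation
agent = setActor(agent,actor);  % reassign it, e.g. after changing parameters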

Examples


Create an environment with a continuous action space, and obtain its observation and action specifications.

% load predefined environment
env = rlPredefinedEnv("DoubleIntegrator-Continuous");

% get observation and action specification info
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a critic representation.

% create a network to be used as underlying critic approximator
statePath = imageInputLayer([obsInfo.Dimension(1) 1 1], 'Normalization', 'none', 'Name', 'state');
actionPath = imageInputLayer([numel(actInfo) 1 1], 'Normalization', 'none', 'Name', 'action');
commonPath = [concatenationLayer(1,2,'Name','concat')
              quadraticLayer('Name','quadratic')
              fullyConnectedLayer(1,'Name','StateValue','BiasLearnRateFactor',0,'Bias',0)];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'state','concat/in1');
criticNetwork = connectLayers(criticNetwork,'action','concat/in2');

% set some options for the critic
criticOpts = rlRepresentationOptions('LearnRate',5e-3,'GradientThreshold',1);

% create the critic based on the network approximator
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},criticOpts);

Create an actor representation.

% create a network to be used as underlying actor approximator
actorNetwork = [
    imageInputLayer([obsInfo.Dimension(1) 1 1], 'Normalization', 'none', 'Name', 'state')
    fullyConnectedLayer(numel(actInfo), 'Name', 'action', 'BiasLearnRateFactor', 0, 'Bias', 0)];

% set some options for the actor
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);

% create the actor based on the network approximator
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},actorOpts);

Specify agent options, and create a DDPG agent using the actor, critic, and agent options.

agentOpts = rlDDPGAgentOptions(...
    'SampleTime',env.Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',32);
agent = rlDDPGAgent(actor,critic,agentOpts);

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{rand(2,1)})
ans = single
    -0.4719

You can now test and train the agent against the environment.
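For example, the following sketch trains the agent and then simulates the trained policy. The episode and step limits are illustrative, not tuned for this environment.

% a sketch: train the agent, then simulate the trained policy
trainOpts = rlTrainingOptions('MaxEpisodes',1000,'MaxStepsPerEpisode',200);
trainStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOpts);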

Introduced in R2019a