rlPPOAgent

Proximal policy optimization reinforcement learning agent

Description

Proximal policy optimization (PPO) is a model-free, online, on-policy, policy gradient reinforcement learning method. The algorithm alternates between sampling data through interaction with the environment and optimizing a clipped surrogate objective function using stochastic gradient descent.
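As general PPO background (not specific to this page), the clipped surrogate objective being maximized has the form

L(θ) = E[ min( r(θ)A, clip(r(θ), 1-ε, 1+ε)A ) ],  with r(θ) = π_θ(a|s) / π_θold(a|s),

where A is the advantage estimate and ε is the clip factor (see the ClipFactor agent option).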

For more information on PPO agents, see Proximal Policy Optimization Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description


agent = rlPPOAgent(actor,critic,agentOptions) creates a proximal policy optimization (PPO) agent with the specified actor and critic representations and sets the AgentOptions property.
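For example, assuming actor and critic representations like those constructed in the Examples section below, a minimal creation call is a sketch of the following form (option values are illustrative):

% illustrative option value; see rlPPOAgentOptions for all available options
agentOpts = rlPPOAgentOptions('DiscountFactor',0.99);
agent = rlPPOAgent(actor,critic,agentOpts);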

Input Arguments


actor - Actor network representation for the policy, specified as an rlStochasticActorRepresentation object. For more information on creating actor representations, see Create Policy and Value Function Representations.

Your actor representation can use a recurrent neural network as its function approximator. In this case, your critic must also use a recurrent neural network. For an example, see Create PPO Agent with Recurrent Neural Networks.

critic - Critic network representation for estimating the discounted long-term reward, specified as an rlValueRepresentation object. For more information on creating critic representations, see Create Policy and Value Function Representations.

Your critic representation can use a recurrent neural network as its function approximator. In this case, your actor must also use a recurrent neural network. For an example, see Create PPO Agent with Recurrent Neural Networks.

Properties


AgentOptions - Agent options, specified as an rlPPOAgentOptions object.
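A minimal sketch, assuming the AgentOptions property remains writable after the agent is created:

opts = rlPPOAgentOptions('ClipFactor',0.2);   % example option value
agent = rlPPOAgent(actor,critic,opts);
agent.AgentOptions.ExperienceHorizon = 512;   % assumes dot-notation access to the options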

Object Functions

train - Train a reinforcement learning agent within a specified environment
sim - Simulate a trained reinforcement learning agent within a specified environment
getActor - Get actor representation from reinforcement learning agent
setActor - Set actor representation of reinforcement learning agent
getCritic - Get critic representation from reinforcement learning agent
setCritic - Set critic representation of reinforcement learning agent
generatePolicyFunction - Create function that evaluates trained policy of reinforcement learning agent
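For instance, once an agent exists, its representations can be extracted and updated as sketched below (variable names are illustrative):

actorRep = getActor(agent);          % extract the current actor representation
criticRep = getCritic(agent);        % extract the current critic representation
agent = setActor(agent,actorRep);    % reinsert a (possibly modified) actor
generatePolicyFunction(agent);       % generate a standalone policy evaluation function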

Examples


Create PPO Agent

Create an environment interface, and obtain its observation and action specifications.

env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a critic representation.

% create the network to be used as approximator in the critic
criticNetwork = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(1,'Name','CriticFC')];

% set some options for the critic
criticOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);

% create the critic
critic = rlValueRepresentation(criticNetwork,obsInfo,'Observation',{'state'},criticOpts);

Create an actor representation.

% create the network to be used as approximator in the actor
actorNetwork = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(2,'Name','action')];

% set some options for the actor
actorOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);

% create the actor
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},actorOpts);

Specify agent options, and create a PPO agent using the actor, critic, and agent options.

agentOpts = rlPPOAgentOptions(...
    'ExperienceHorizon',1024, ...
    'DiscountFactor',0.95);
agent = rlPPOAgent(actor,critic,agentOpts)
agent = 
  rlPPOAgent with properties:

    AgentOptions: [1x1 rl.option.rlPPOAgentOptions]

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{rand(4,1)})
ans = -10

You can now test and train the agent against the environment.
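For example, a training and simulation sketch for this environment (the training option values are assumptions, not tuned settings):

trainOpts = rlTrainingOptions(...
    'MaxEpisodes',500, ...
    'MaxStepsPerEpisode',500, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',480);
trainStats = train(agent,env,trainOpts);      % train the agent in the cart-pole environment

simOpts = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOpts);          % simulate the trained agent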

Create PPO Agent with Recurrent Neural Networks

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a recurrent deep neural network for the critic. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include an lstmLayer as one of the other network layers.

criticNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8, 'Name', 'fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(1,'Name','output')];

Create a value function representation object for the critic.

criticOptions = rlRepresentationOptions('LearnRate',1e-2,'GradientThreshold',1);
critic = rlValueRepresentation(criticNetwork,obsInfo,...
    'Observation','state', criticOptions);

Similarly, define a recurrent neural network for the actor.

actorNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(numDiscreteAct,'Name','output')
    softmaxLayer('Name','actionProb')];

Create a stochastic actor representation for the network.

actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation','state', actorOptions);

Create a PPO agent using the actor and critic representations.

agentOptions = rlPPOAgentOptions(...
    'AdvantageEstimateMethod', 'finite-horizon', ...
    'ClipFactor', 0.1);
agent = rlPPOAgent(actor,critic,agentOptions);
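As in the previous example, you can check the agent by querying an action for a random observation (a sketch; the observation dimensions are assumed to match numObs):

getAction(agent,{rand(numObs,1)})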

Introduced in R2019b