rlPPOAgent

Proximal policy optimization reinforcement learning agent

Description

Proximal policy optimization (PPO) is a model-free, online, on-policy, policy gradient reinforcement learning method. The algorithm alternates between sampling data through interaction with the environment and optimizing a clipped surrogate objective function using stochastic gradient descent.
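As general PPO background (not specific to this page), the clipped surrogate objective being maximized has the form

L(θ) = E[ min( r(θ)A, clip(r(θ), 1-ε, 1+ε)A ) ],  with r(θ) = π_θ(a|s) / π_θold(a|s),

where A is the advantage estimate and ε is the clip factor (see the ClipFactor agent option).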

For more information on PPO agents, see Proximal Policy Optimization Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description


agent = rlPPOAgent(actor,critic,agentOptions) creates a proximal policy optimization (PPO) agent with the specified actor and critic representations and sets the AgentOptions property.
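For example, assuming actor and critic representations like those constructed in the Examples section below, a minimal creation call is a sketch of the following form (option values are illustrative):

% illustrative option value; see rlPPOAgentOptions for all available options
agentOpts = rlPPOAgentOptions('DiscountFactor',0.99);
agent = rlPPOAgent(actor,critic,agentOpts);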

Input Arguments


actor - Actor network representation for the policy, specified as an rlStochasticActorRepresentation object. For more information on creating actor representations, see Create Policy and Value Function Representations.

Your actor representation can use a recurrent neural network as its function approximator. In this case, your critic must also use a recurrent neural network. For an example, see Create PPO Agent with Recurrent Neural Networks.

critic - Critic network representation for estimating the discounted long-term reward, specified as an rlValueRepresentation object. For more information on creating critic representations, see Create Policy and Value Function Representations.

Your critic representation can use a recurrent neural network as its function approximator. In this case, your actor must also use a recurrent neural network. For an example, see Create PPO Agent with Recurrent Neural Networks.

Properties


AgentOptions - Agent options, specified as an rlPPOAgentOptions object.
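A minimal sketch, assuming the AgentOptions property remains writable after the agent is created:

opts = rlPPOAgentOptions('ClipFactor',0.2);   % example option value
agent = rlPPOAgent(actor,critic,opts);
agent.AgentOptions.ExperienceHorizon = 512;   % assumes dot-notation access to the options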

Object Functions

train - Train a reinforcement learning agent within a specified environment
sim - Simulate a trained reinforcement learning agent within a specified environment
getActor - Get actor representation from reinforcement learning agent
setActor - Set actor representation of reinforcement learning agent
getCritic - Get critic representation from reinforcement learning agent
setCritic - Set critic representation of reinforcement learning agent
generatePolicyFunction - Create function that evaluates trained policy of reinforcement learning agent
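For instance, once an agent exists, its representations can be extracted and updated as sketched below (variable names are illustrative):

actorRep = getActor(agent);          % extract the current actor representation
criticRep = getCritic(agent);        % extract the current critic representation
agent = setActor(agent,actorRep);    % reinsert a (possibly modified) actor
generatePolicyFunction(agent);       % generate a standalone policy evaluation function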

Examples


Create PPO Agent

Create an environment interface, and obtain its observation and action specifications.

env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a critic representation.

% create the network to be used as approximator in the critic
criticNetwork = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(1,'Name','CriticFC')];

% set some options for the critic
criticOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);

% create the critic
critic = rlValueRepresentation(criticNetwork,obsInfo,'Observation',{'state'},criticOpts);

Create an actor representation.

% create the network to be used as approximator in the actor
actorNetwork = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(2,'Name','action')];

% set some options for the actor
actorOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);

% create the actor
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},actorOpts);

Specify agent options, and create a PPO agent using the actor, critic, and agent options.

agentOpts = rlPPOAgentOptions(...
    'ExperienceHorizon',1024, ...
    'DiscountFactor',0.95);
agent = rlPPOAgent(actor,critic,agentOpts)
agent = 
  rlPPOAgent with properties:

    AgentOptions: [1x1 rl.option.rlPPOAgentOptions]

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{rand(4,1)})
ans = -10

You can now test and train the agent against the environment.
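For example, a training and simulation sketch for this environment (the training option values are assumptions, not tuned settings):

trainOpts = rlTrainingOptions(...
    'MaxEpisodes',500, ...
    'MaxStepsPerEpisode',500, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',480);
trainStats = train(agent,env,trainOpts);      % train the agent in the cart-pole environment

simOpts = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOpts);          % simulate the trained agent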

Create PPO Agent with Recurrent Neural Networks

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a recurrent deep neural network for the critic. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include an lstmLayer as one of the other network layers.

criticNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8, 'Name', 'fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(1,'Name','output')];

Create a value function representation object for the critic.

criticOptions = rlRepresentationOptions('LearnRate',1e-2,'GradientThreshold',1);
critic = rlValueRepresentation(criticNetwork,obsInfo,...
    'Observation','state', criticOptions);

Similarly, define a recurrent neural network for the actor.

actorNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(numDiscreteAct,'Name','output')
    softmaxLayer('Name','actionProb')];

Create a stochastic actor representation for the network.

actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation','state', actorOptions);

Create a PPO agent using the actor and critic representations.

agentOptions = rlPPOAgentOptions(...
    'AdvantageEstimateMethod', 'finite-horizon', ...
    'ClipFactor', 0.1);
agent = rlPPOAgent(actor,critic,agentOptions);
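As in the previous example, you can check the agent by querying an action for a random observation (a sketch; the observation dimensions are assumed to match numObs):

getAction(agent,{rand(numObs,1)})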

Introduced in R2019b