Proximal policy optimization reinforcement learning agent
Proximal policy optimization (PPO) is a model-free, online, on-policy, policy gradient reinforcement learning method. The algorithm alternates between sampling data through interaction with the environment and optimizing a clipped surrogate objective function using stochastic gradient descent. The action space can be either discrete or continuous.
For more information on PPO agents, see Proximal Policy Optimization Agents. For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.
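For reference, the clipped surrogate objective mentioned above is usually written as follows in the standard PPO formulation. The symbols here are assumptions of this sketch rather than names taken from this page: r_t is the probability ratio between the new and old policies, \hat{A}_t is an advantage estimate, and \epsilon is the clip factor. The toolbox implementation may add further terms, such as an entropy loss.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
\right]
```

Clipping the ratio to the interval [1-\epsilon, 1+\epsilon] removes the incentive to move the new policy far from the old one in a single update.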
agent = rlPPOAgent(observationInfo,actionInfo) creates a proximal policy optimization (PPO) agent for an environment with the given observation and action specifications, using default initialization options. The actor and critic representations in the agent use default deep neural networks built from the observation specification observationInfo and the action specification actionInfo.
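A minimal sketch of this syntax, assuming Reinforcement Learning Toolbox is installed. The specifications below (a four-element continuous observation and three discrete actions) are hypothetical and chosen only for illustration.

```matlab
% Hypothetical specifications, not taken from any particular environment.
obsInfo = rlNumericSpec([4 1]);        % 4-by-1 continuous observation
actInfo = rlFiniteSetSpec([-1 0 1]);   % three discrete actions

% Create a PPO agent with default actor and critic networks.
agent = rlPPOAgent(obsInfo,actInfo);

% Query the agent with a random observation of the correct size.
action = getAction(agent,{rand(obsInfo.Dimension)});
```

In practice, the specifications would typically be obtained from an existing environment with getObservationInfo and getActionInfo rather than built by hand.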
agent = rlPPOAgent(observationInfo,actionInfo,initOpts) creates a PPO agent for an environment with the given observation and action specifications. The agent uses default networks configured using options specified in the initOpts object. Actor-critic agents do not support recurrent neural networks. For more information on the initialization options, see rlAgentInitializationOptions.
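As an illustration of the initialization options, the sketch below requests smaller hidden layers for the default networks. The specifications and the value 128 are hypothetical. NumHiddenUnit is a property of rlAgentInitializationOptions.

```matlab
% Hypothetical specifications for illustration.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Ask for 128 units in each hidden fully connected layer of the default networks.
initOpts = rlAgentInitializationOptions('NumHiddenUnit',128);

% Create the agent using these initialization options.
agent = rlPPOAgent(obsInfo,actInfo,initOpts);
```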
agent = rlPPOAgent(___,agentOptions) creates a PPO agent and sets the AgentOptions property to the agentOptions input argument. Use this syntax after any of the input arguments in the previous syntaxes.
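A sketch of passing agent options, again with hypothetical specifications. The option values are illustrative assumptions, not recommended settings. ExperienceHorizon, ClipFactor, and DiscountFactor are properties of rlPPOAgentOptions.

```matlab
% Hypothetical specifications for illustration.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Illustrative option values (assumptions of this sketch).
agentOptions = rlPPOAgentOptions( ...
    'ExperienceHorizon',256, ...  % steps collected before each learning update
    'ClipFactor',0.2, ...         % clip factor of the surrogate objective
    'DiscountFactor',0.99);       % discount applied to future rewards

% Create the agent; the options are stored in agent.AgentOptions.
agent = rlPPOAgent(obsInfo,actInfo,agentOptions);
```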
train | Train reinforcement learning agents within a specified environment |
sim | Simulate trained reinforcement learning agents within specified environment |
getAction | Obtain action from agent or actor representation given environment observations |
getActor | Get actor representation from reinforcement learning agent |
setActor | Set actor representation of reinforcement learning agent |
getCritic | Get critic representation from reinforcement learning agent |
setCritic | Set critic representation of reinforcement learning agent |
generatePolicyFunction | Create function that evaluates trained policy of reinforcement learning agent |
For continuous action spaces, this agent does not enforce the constraints set by the action specification. In this case, you must enforce action space constraints within the environment.
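One possible way to enforce such constraints is to saturate the action at the start of the environment's step logic. The sketch below uses only base MATLAB. The limits and the action vector are hypothetical values; in a real environment the limits would typically come from the action specification.

```matlab
% Hypothetical action limits and an unconstrained action from the agent.
lowerLimit = -2;
upperLimit = 2;
action = [-3.2; 0.4; 5.1];

% Saturate each element to [lowerLimit, upperLimit] before applying it
% to the environment dynamics.
saturatedAction = min(max(action,lowerLimit),upperLimit);
% saturatedAction is [-2; 0.4; 2]
```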
Deep Network Designer | rlAgentInitializationOptions | rlPPOAgentOptions | rlStochasticActorRepresentation | rlValueRepresentation