Obtain action from agent or actor representation given environment observations

agentAction = getAction(agent,obs) returns the action derived from the policy of a reinforcement learning agent given environment observations.

actorAction = getAction(actorRep,obs) returns the action derived from the policy representation actorRep given environment observations obs.

[actorAction,nextState] = getAction(actorRep,obs) also returns the updated state of the actor representation when the actor uses a recurrent neural network as a function approximator.
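The typical usage pattern is to call getAction inside a simulation loop, passing the current observation wrapped in a cell array. The following is a minimal sketch (not one of the documented examples); it assumes env and agent objects like those created in the first example below, and that reset and step are available on the environment object.

% Minimal interaction-loop sketch (assumes env and agent as in the first example below)
obs = reset(env);                               % initial observation (numeric array)
for k = 1:100
    action = getAction(agent,{obs});            % query the agent policy for one action
    [obs,reward,isDone] = step(env,action);     % apply the action and advance the environment
    if isDone
        break
    end
end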
Create an environment interface and obtain its observation and action specifications. For this example, load the predefined environment used for the discrete cart-pole system.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
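As a quick illustrative check (not part of the original example), you can inspect the returned specification objects. For this predefined environment, obsInfo is a numeric specification describing a four-element observation vector, and actInfo is a finite-set specification listing the allowed force values.

obsInfo.Dimension   % dimensions of the observation channel
actInfo.Elements    % finite set of allowed discrete actions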
Create a deep neural network to use as the critic approximator.
statePath = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(24,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(1,'Normalization','none','Name','action')
    fullyConnectedLayer(24,'Name','CriticActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','output')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
Create a representation for the critic.
criticOpts = rlRepresentationOptions('LearnRate',0.01,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},criticOpts);
Specify agent options, and create a DQN agent using the critic representation and the agent options.
agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',false, ...
    'TargetUpdateMethod',"periodic", ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',256);
agent = rlDQNAgent(critic,agentOpts);
Obtain a discrete action from the agent for a single observation. For this example, use a random observation array.
act = getAction(agent,{rand(4,1)})
act = 10
You can also obtain actions for a batch of observations. For example, obtain actions for a batch of 10 observations.
actBatch = getAction(agent,{rand(4,1,10)});
size(actBatch)
ans = 1×2
1 10
actBatch contains one action for each observation in the batch, with each action being one of the possible discrete actions.
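As an optional sanity check (not part of the original example), you can verify that every action in the batch belongs to the finite action set defined by actInfo.

all(ismember(actBatch(:),actInfo.Elements))   % returns logical 1 (true) if every action is valid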
Create observation and action information. You can also obtain these specifications from an environment.
obsinfo = rlNumericSpec([4 1]);
actinfo = rlNumericSpec([2 1]);
numObs = obsinfo.Dimension(1);
numAct = actinfo.Dimension(1);
Create a deep neural network for the actor.
net = [featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(10,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(20,'Name','CriticStateFC2')
    fullyConnectedLayer(numAct,'Name','action')
    tanhLayer('Name','tanh1')];
Create a deterministic actor representation for the network.
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(net,obsinfo,actinfo,...
    'Observation',{'state'},'Action',{'tanh1'},actorOptions);
Obtain an action from this actor for a random batch of 10 observations.
act = getAction(actor,{rand(4,1,10)})
act = 1x1 cell array
{2x1x10 single}
act contains the two computed actions for all 10 observations in the batch.
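If you need the actions as a plain numeric array (an illustrative follow-up, not part of the original example), extract the contents of the cell and remove the singleton dimension.

actArray = act{1};              % 2-by-1-by-10 single array of actions
actMatrix = squeeze(actArray)   % 2-by-10 matrix, one column of two action values per observation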
agent — Reinforcement learning agent
rlQAgent | rlSARSAAgent | rlDQNAgent | rlPGAgent | rlDDPGAgent | rlTD3Agent | rlACAgent | rlPPOAgent

Reinforcement learning agent, specified as one of the following objects:
rlQAgent
rlSARSAAgent
rlDQNAgent
rlPGAgent
rlDDPGAgent
rlTD3Agent
rlACAgent
rlPPOAgent
actorRep — Actor representation
rlDeterministicActorRepresentation object | rlStochasticActorRepresentation object

Actor representation, specified as either an rlDeterministicActorRepresentation or rlStochasticActorRepresentation object.
obs — Environment observations

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

MO corresponds to the dimensions of the associated observation input channel.

LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If agent or actorRep has multiple observation input channels, then LB must be the same for all elements of obs.

LS specifies the sequence length for a recurrent neural network. If agent or actorRep does not use a recurrent neural network, then LS = 1. If agent or actorRep has multiple observation input channels, then LS must be the same for all elements of obs.
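For illustration (a sketch assuming a single observation channel with a [4 1] specification and an agent such as the DQN agent from the first example), the observation cell array can be assembled as follows.

obsSingle = {rand(4,1)};               % single observation: LB = 1, LS = 1
obsBatch  = {rand(4,1,32)};            % batch of 32 independent observations: LB = 32
actBatch  = getAction(agent,obsBatch); % one action per observation in the batch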
agentAction — Action value from agent

Action value from agent, returned as an array with dimensions MA-by-LB-by-LS, where:

MA corresponds to the dimensions of the associated action specification.

LB is the batch size.

LS is the sequence length for recurrent neural networks. If the actor and critic in agent do not use recurrent neural networks, then LS = 1.
Note

When agents such as rlACAgent, rlPGAgent, or rlPPOAgent use an rlStochasticActorRepresentation actor with a continuous action space, the constraints set by the action specification are not enforced by the agent. In these cases, you must enforce action space constraints within the environment.
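One common way to enforce such constraints is to saturate the incoming action inside a custom environment step function. The line below is a sketch under the assumption that actInfo is a continuous rlNumericSpec with finite LowerLimit and UpperLimit properties.

% Sketch: clip each action element to the limits defined by the action specification
action = min(max(action,actInfo.LowerLimit),actInfo.UpperLimit);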
actorAction — Action value from actor representation

Action value from actor representation, returned as a single-element cell array that contains an array of dimensions MA-by-LB-by-LS, where:

MA corresponds to the dimensions of the action specification.

LB is the batch size.

LS is the sequence length for a recurrent neural network. If actorRep does not use a recurrent neural network, then LS = 1.
Note

rlStochasticActorRepresentation actors with continuous action spaces do not enforce constraints set by the action specification. In these cases, you must enforce action space constraints within the environment.
nextState — Actor representation updated state

Actor representation updated state, returned as a cell array. If actorRep does not use a recurrent neural network, then nextState is an empty cell array.

You can set the state of the representation to nextState using the setState function. For example:

actorRep = setState(actorRep,nextState);
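For example, a sketch of stepping a recurrent actor through a sequence one observation at a time might look like the following. Here obsSequence is a hypothetical 4-by-T numeric array of observations, and actorRep is assumed to use a recurrent network.

% Sketch: propagate the recurrent state manually across time steps
for t = 1:size(obsSequence,2)
    obs = {obsSequence(:,t)};                        % one 4-by-1 observation in a cell array
    [action,nextState] = getAction(actorRep,obs);    % action plus updated hidden state
    actorRep = setState(actorRep,nextState);         % carry the state to the next time step
end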