getAction

Obtain action from agent or actor representation given environment observations

Description

Agent

agentAction = getAction(agent,obs) returns the action derived from the policy of a reinforcement learning agent, given environment observations obs.

Actor Representation

actorAction = getAction(actorRep,obs) returns the action derived from policy representation actorRep given environment observations obs.

[actorAction,nextState] = getAction(actorRep,obs) also returns the updated state of the actor representation when the actor uses a recurrent neural network as its function approximator.

Examples

Create an environment interface and obtain its observation and action specifications. For this example, load the predefined environment for the discrete cart-pole system.

env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a deep neural network for the critic.

statePath = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(24,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(1,'Normalization','none','Name','action')
    fullyConnectedLayer(24,'Name','CriticActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','output')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');

Create a representation for the critic.

criticOpts = rlRepresentationOptions('LearnRate',0.01,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},criticOpts);

Specify agent options, and create a DQN agent using the critic representation and the agent options.

agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',false, ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',256);
agent = rlDQNAgent(critic,agentOpts);

Obtain a discrete action from the agent for a single observation. For this example, use a random observation array.

act = getAction(agent,{rand(4,1)})
act = 10

You can also obtain actions for a batch of observations. For example, obtain actions for a batch of 10 observations.

actBatch = getAction(agent,{rand(4,1,10)});
size(actBatch)
ans = 1×2

     1    10

actBatch contains one action for each observation in the batch, with each action being one of the possible discrete actions.
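
To confirm that each returned action is valid, you can compare the batch against the elements of the discrete action specification. The following is a short sketch, not part of the shipped example, that uses the actInfo specification created earlier.

% Every action in the batch is one of the valid discrete actions.
all(ismember(actBatch,actInfo.Elements))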

Create observation and action specifications. Alternatively, you can obtain these specifications from an environment.

obsinfo = rlNumericSpec([4 1]);
actinfo = rlNumericSpec([2 1]);
numObs = obsinfo.Dimension(1);
numAct = actinfo.Dimension(1);

Create a deep neural network for the actor.

net = [featureInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(10,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(20,'Name','fc2')
    fullyConnectedLayer(numAct,'Name','action')
    tanhLayer('Name','tanh1')];

Create a deterministic actor representation for the network.

actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(net,obsinfo,actinfo,...
    'Observation',{'state'},'Action',{'tanh1'});

Obtain an action from this actor for a random batch of 10 observations.

act = getAction(actor,{rand(4,1,10)})
act = 1x1 cell array
    {2x1x10 single}

The single cell in act contains the two action values computed for each of the 10 observations in the batch.
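
The [actorAction,nextState] = getAction(actorRep,obs) syntax applies when the actor uses a recurrent neural network. The example above does not use one, so the following is a minimal sketch, not part of the shipped example; it assumes that your release supports recurrent networks for this representation type, and the layer sizes are arbitrary.

% Build a recurrent actor network (sequence input followed by an LSTM layer).
rnnNet = [sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(10,'Name','fc1')
    reluLayer('Name','relu1')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(numAct,'Name','fcAct')
    tanhLayer('Name','tanhAct')];
rnnActor = rlDeterministicActorRepresentation(rnnNet,obsinfo,actinfo,...
    'Observation',{'state'},'Action',{'tanhAct'},actorOptions);

% Request both the action and the updated representation state.
[rnnAct,rnnState] = getAction(rnnActor,{rand(numObs,1)});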

Input Arguments

Reinforcement learning agent, specified as an agent object, such as an rlDQNAgent, rlDDPGAgent, rlACAgent, rlPGAgent, or rlPPOAgent object.

Actor representation, specified as either an rlDeterministicActorRepresentation or rlStochasticActorRepresentation object.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If the agent or actor representation has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If the agent or actor representation does not use a recurrent neural network, then LS = 1. If the agent or actor representation has multiple observation input channels, then LS must be the same for all elements of obs.

The returned action has the same LB and LS as obs.
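
For example, the following sketch assembles obs for a hypothetical actor representation with two observation input channels, a batch of five observations, and sequence length 1.

% Hypothetical channels: a 4-by-1 channel and a 3-by-1 channel (LB = 5, LS = 1).
obs = {rand(4,1,5), ...   % channel 1
       rand(3,1,5)};      % channel 2
% act = getAction(actorRep,obs);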

Output Arguments

Action value from agent, returned as an array with dimensions MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action specification.

  • LB is the batch size.

  • LS is the sequence length for recurrent neural networks. If the actor and critic in agent do not use recurrent neural networks, then LS = 1.

Note

When agents such as rlACAgent, rlPGAgent, or rlPPOAgent use an rlStochasticActorRepresentation actor with a continuous action space, the constraints set by the action specification are not enforced by the agent. In these cases, you must enforce action space constraints within the environment.
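
For example, a minimal sketch of such a constraint, assuming a continuous action specification actInfo with finite LowerLimit and UpperLimit values, applied inside the environment step function:

% Saturate the incoming action to the limits defined in the action specification.
act = max(min(act,actInfo.UpperLimit),actInfo.LowerLimit);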

Action value from actor representation, returned as a single-element cell array that contains an array of dimensions MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the action specification.

  • LB is the batch size.

  • LS is the sequence length for a recurrent neural network. If actorRep does not use a recurrent neural network, then LS = 1.

Note

rlStochasticActorRepresentation actors with continuous action spaces do not enforce constraints set by the action specification. In these cases, you must enforce action space constraints within the environment.

Updated state of the actor representation, returned as a cell array. If actorRep does not use a recurrent neural network, then nextState is an empty cell array.

You can set the state of the representation to nextState using the setState function. For example:

actorRep = setState(actorRep,nextState);
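
For a recurrent actor, you can carry the returned state forward across successive calls. The following is a minimal sketch, assuming actorRep uses a recurrent network and takes 4-by-1 observations.

for t = 1:5
    [act,nextState] = getAction(actorRep,{rand(4,1)});  % action and updated state
    actorRep = setState(actorRep,nextState);            % carry the state to the next call
end
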
Introduced in R2020a