getValue

Obtain estimated value from a value function or Q-value function representation given environment observations

Description

value = getValue(valueRep,obs) returns the estimated value function for the state value function representation valueRep given environment observations obs.

value = getValue(qValueRep,obs) returns the estimated state-action value functions for the multi-output Q-value function representation qValueRep given environment observations obs. In this case, qValueRep has as many outputs as there are possible discrete actions, and getValue returns the state-action value function for each of those actions.

value = getValue(qValueRep,obs,act) returns the estimated state-action value function for the single-output Q-value function representation qValueRep given environment observations obs and actions act. In this case, getValue returns the state-action value function for the given observation and action inputs.

[value,state] = getValue(___) returns the state of the representation. Use this syntax when valueRep or qValueRep is a recurrent neural network.

Examples

Obtain a Value Function Estimate

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a deep neural network for the critic.

criticNetwork = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','output')];

Create a value function representation object for the critic.

criticOptions = rlRepresentationOptions('LearnRate',1e-2,'GradientThreshold',1);
critic = rlValueRepresentation(criticNetwork,obsInfo,...
    'Observation','state',criticOptions);

Obtain a value function estimate for a single random observation. Use an observation array with the same dimensions as the observation specification.

val = getValue(critic,{rand(4,1)})
val = single
    -0.0899

You can also obtain value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 20 observations.

batchVal = getValue(critic,{rand(4,1,20)});
size(batchVal)
ans = 1×2

     1    20

batchVal contains one value function estimate for each observation in the batch.

Obtain Multi-Output Q-Value Function Estimates

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a deep neural network for a multi-output Q-value function representation.

criticNetwork = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(50, 'Name', 'CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(20,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(numDiscreteAct,'Name','output')];

Create a representation for your critic using this deep neural network.

criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation','state',criticOptions);

Obtain value function estimates for each possible discrete action using a random observation.

val = getValue(critic,{rand(4,1)})
val = 2x1 single column vector

    0.0139
   -0.1851

val contains two value function estimates, one for each possible discrete action.
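
To choose a greedy action from these estimates, you can take the index of the largest value and look up the corresponding element of the action specification. This is a usage sketch; actIdx and greedyAction are illustrative variable names, and actInfo is the action specification created earlier.

[~,actIdx] = max(val);                     % index of the highest estimated Q-value
greedyAction = actInfo.Elements(actIdx);   % corresponding discrete action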

You can also obtain value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 10 observations.

batchVal = getValue(critic,{rand(4,1,10)});
size(batchVal)
ans = 1×2

     2    10

batchVal contains two value function estimates for each observation in the batch.

Obtain a Single-Output Q-Value Function Estimate

Create observation specifications for two observation input channels.

obsinfo = [rlNumericSpec([8 3]), rlNumericSpec([4 1])];

Create an action specification.

actinfo = rlNumericSpec([2 1]);

Create a deep neural network for the critic. This network has three input channels (two for observations and one for actions).

observationPath1 = [
    imageInputLayer([8 3 1],'Normalization','none','Name','state1')
    fullyConnectedLayer(10, 'Name', 'fc1')
    additionLayer(3,'Name','add')
    reluLayer('Name','relu1')
    fullyConnectedLayer(10,'Name','fc4')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(1,'Name','fc5')];
observationPath2 = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state2')
    fullyConnectedLayer(10, 'Name','fc2')];
actionPath = [
    imageInputLayer([2 1 1],'Normalization','none','Name','action');
    fullyConnectedLayer(10, 'Name', 'fc3')];
net = layerGraph(observationPath1);
net = addLayers(net,observationPath2);
net = addLayers(net,actionPath);
net = connectLayers(net,'fc2','add/in2');
net = connectLayers(net,'fc3','add/in3');

Create the critic representation with this network.

c = rlQValueRepresentation(net,obsinfo,actinfo,...
    'Observation',{'state1','state2'},'Action',{'action'});

Create a random observation set with a batch size of 64 for each channel.

batchobs_ch1 = rand(8,3,64);
batchobs_ch2 = rand(4,1,64);

Create a random action set with a batch size of 64.

batchact = rand(2,1,64,1);

Obtain the state-action value function estimate for the batch of observations and actions.

qvalue = getValue(c,{batchobs_ch1,batchobs_ch2},{batchact});
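
Based on the output dimensions described in Output Arguments (N = 1 for a single-output representation and a batch size LB of 64 here), qvalue should be a 1-by-64 array. You can check this with size:

size(qvalue)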

Input Arguments

Value function representation, specified as an rlValueRepresentation object.

Q-value function representation, specified as an rlQValueRepresentation object.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If valueRep or qValueRep has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If valueRep or qValueRep does not use a recurrent neural network, then LS = 1. If valueRep or qValueRep has multiple observation input channels, then LS must be the same for all elements of obs.

LB and LS must be the same for both act and obs.
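
For example, for a representation with a single observation channel whose dimensions are [4 1], you could build obs as follows. This is a sketch; the batch and sequence sizes are illustrative.

obs = {rand(4,1,32)};         % batch of LB = 32 observations
obsSeq = {rand(4,1,32,10)};   % batch of 32 sequences of length LS = 10 (recurrent network only)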

Action, specified as a single-element cell array that contains an array of action values.

The dimensions of this array are MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action specification.

  • LB is the batch size. To specify a single action, set LB = 1. To specify a batch of actions, specify LB > 1.

  • LS specifies the sequence length for a recurrent neural network. If valueRep or qValueRep does not use a recurrent neural network, then LS = 1.

LB and LS must be the same for both act and obs.
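
Continuing the sketch above for a [2 1] action specification, a matching batch of actions could look like this:

act = {rand(2,1,32)};         % same batch size LB = 32 as obs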

Output Arguments

Estimated value function, returned as an array with dimensions N-by-LB-by-LS, where:

  • N is the number of outputs of the critic network.

    • For a state value representation (valueRep), N = 1.

    • For a single-output state-action value representation (qValueRep), N = 1.

    • For a multi-output state-action value representation (qValueRep), N is the number of discrete actions.

  • LB is the batch size.

  • LS is the sequence length for a recurrent neural network.

Representation state for a recurrent neural network, returned as a cell array. If valueRep or qValueRep does not use a recurrent neural network, then state is an empty cell array.

You can set the state of the representation to state using the setState function. For example:

valueRep = setState(valueRep,state);
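
As a minimal sketch (the network architecture, layer names, and sizes below are illustrative and not part of this page), a recurrent value representation returns a nonempty state that you can feed back with setState:

obsInfo = rlNumericSpec([4 1]);
recurrentNet = [
    sequenceInputLayer(4,'Normalization','none','Name','state')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(1,'Name','value')];
valueRep = rlValueRepresentation(recurrentNet,obsInfo,'Observation','state');
% Batch of 5 observation sequences, each 10 steps long (LB = 5, LS = 10)
[value,state] = getValue(valueRep,{rand(4,1,5,10)});
valueRep = setState(valueRep,state);
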
Introduced in R2020a