getValue

Obtain estimated value from a value function or Q-value function representation given environment observations

Description

value = getValue(valueRep,obs) returns the estimated value function for the state value function representation valueRep given environment observations obs.

value = getValue(qValueRep,obs) returns the estimated state-action value functions for the multi-output Q-value function representation qValueRep given environment observations obs. In this case, qValueRep has as many outputs as there are possible discrete actions, and getValue returns the state-action value function for each of those actions.

value = getValue(qValueRep,obs,act) returns the estimated state-action value function for the single-output Q-value function representation qValueRep given environment observations obs and actions act. In this case, getValue returns the state-action value function for the given observation and action inputs.

[value,state] = getValue(___) returns the state of the representation. Use this syntax when valueRep or qValueRep is a recurrent neural network.

Examples

Obtain a Value Function Estimate

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a deep neural network for the critic.

criticNetwork = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','output')];

Create a value function representation object for the critic.

criticOptions = rlRepresentationOptions('LearnRate',1e-2,'GradientThreshold',1);
critic = rlValueRepresentation(criticNetwork,obsInfo,...
    'Observation','state',criticOptions);

Obtain a value function estimate for a single random observation. Use an observation array with the same dimensions as the observation specification.

val = getValue(critic,{rand(4,1)})
val = single
    -0.0899

You can also obtain value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 20 observations.

batchVal = getValue(critic,{rand(4,1,20)});
size(batchVal)
ans = 1×2

     1    20

batchVal contains one value function estimate for each observation in the batch.

Obtain Multi-Output Q-Value Function Estimates

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a deep neural network for a multi-output Q-value function representation.

criticNetwork = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(50, 'Name', 'CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(20,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(numDiscreteAct,'Name','output')];

Create a representation for your critic using this deep neural network.

criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation','state',criticOptions);

Obtain value function estimates for each possible discrete action using a random observation.

val = getValue(critic,{rand(4,1)})
val = 2x1 single column vector

    0.0139
   -0.1851

val contains two value function estimates, one for each possible discrete action.
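
To choose a greedy action from these estimates, you can take the index of the largest value and look up the corresponding element of the action specification. This is a usage sketch; actIdx and greedyAction are illustrative variable names, and actInfo is the action specification created earlier.

[~,actIdx] = max(val);                     % index of the highest estimated Q-value
greedyAction = actInfo.Elements(actIdx);   % corresponding discrete action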

You can also obtain value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 10 observations.

batchVal = getValue(critic,{rand(4,1,10)});
size(batchVal)
ans = 1×2

     2    10

batchVal contains two value function estimates for each observation in the batch.

Obtain a Single-Output Q-Value Function Estimate

Create observation specifications for two observation input channels.

obsinfo = [rlNumericSpec([8 3]), rlNumericSpec([4 1])];

Create an action specification.

actinfo = rlNumericSpec([2 1]);

Create a deep neural network for the critic. This network has three input channels (two for observations and one for actions).

observationPath1 = [
    imageInputLayer([8 3 1],'Normalization','none','Name','state1')
    fullyConnectedLayer(10, 'Name', 'fc1')
    additionLayer(3,'Name','add')
    reluLayer('Name','relu1')
    fullyConnectedLayer(10,'Name','fc4')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(1,'Name','fc5')];
observationPath2 = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state2')
    fullyConnectedLayer(10, 'Name','fc2')];
actionPath = [
    imageInputLayer([2 1 1],'Normalization','none','Name','action');
    fullyConnectedLayer(10, 'Name', 'fc3')];
net = layerGraph(observationPath1);
net = addLayers(net,observationPath2);
net = addLayers(net,actionPath);
net = connectLayers(net,'fc2','add/in2');
net = connectLayers(net,'fc3','add/in3');

Create the critic representation with this network.

c = rlQValueRepresentation(net,obsinfo,actinfo,...
    'Observation',{'state1','state2'},'Action',{'action'});

Create a random observation set with a batch size of 64 for each channel.

batchobs_ch1 = rand(8,3,64);
batchobs_ch2 = rand(4,1,64);

Create a random action set with a batch size of 64.

batchact = rand(2,1,64,1);

Obtain the state-action value function estimate for the batch of observations and actions.

qvalue = getValue(c,{batchobs_ch1,batchobs_ch2},{batchact});
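
Based on the output dimensions described in Output Arguments (N = 1 for a single-output representation and a batch size LB of 64 here), qvalue should be a 1-by-64 array. You can check this with size:

size(qvalue)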

Input Arguments

Value function representation, specified as an rlValueRepresentation object.

Q-value function representation, specified as an rlQValueRepresentation object.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If valueRep or qValueRep has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If valueRep or qValueRep does not use a recurrent neural network, then LS = 1. If valueRep or qValueRep has multiple observation input channels, then LS must be the same for all elements of obs.

LB and LS must be the same for both act and obs.
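
For example, for a representation with a single observation channel whose dimensions are [4 1], you could build obs as follows. This is a sketch; the batch and sequence sizes are illustrative.

obs = {rand(4,1,32)};         % batch of LB = 32 observations
obsSeq = {rand(4,1,32,10)};   % batch of 32 sequences of length LS = 10 (recurrent network only)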

Action, specified as a single-element cell array that contains an array of action values.

The dimensions of this array are MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action specification.

  • LB is the batch size. To specify a single action, set LB = 1. To specify a batch of actions, specify LB > 1.

  • LS specifies the sequence length for a recurrent neural network. If valueRep or qValueRep does not use a recurrent neural network, then LS = 1.

LB and LS must be the same for both act and obs.
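
Continuing the sketch above for a [2 1] action specification, a matching batch of actions could look like this:

act = {rand(2,1,32)};         % same batch size LB = 32 as obs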

Output Arguments

Estimated value function, returned as an array with dimensions N-by-LB-by-LS, where:

  • N is the number of outputs of the critic network.

    • For a state value representation (valueRep), N = 1.

    • For a single-output state-action value representation (qValueRep), N = 1.

    • For a multi-output state-action value representation (qValueRep), N is the number of discrete actions.

  • LB is the batch size.

  • LS is the sequence length for a recurrent neural network.

Representation state for a recurrent neural network, returned as a cell array. If valueRep or qValueRep does not use a recurrent neural network, then state is an empty cell array.

You can set the state of the representation to state using the setState function. For example:

valueRep = setState(valueRep,state);
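
As a minimal sketch (the network architecture, layer names, and sizes below are illustrative and not part of this page), a recurrent value representation returns a nonempty state that you can feed back with setState:

obsInfo = rlNumericSpec([4 1]);
recurrentNet = [
    sequenceInputLayer(4,'Normalization','none','Name','state')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(1,'Name','value')];
valueRep = rlValueRepresentation(recurrentNet,obsInfo,'Observation','state');
% Batch of 5 observation sequences, each 10 steps long (LB = 5, LS = 10)
[value,state] = getValue(valueRep,{rand(4,1,5,10)});
valueRep = setState(valueRep,state);
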
Introduced in R2020a