getMaxQValue

Obtain maximum state-value function estimate for Q-value function representation with discrete action space

Syntax

[maxQ,maxActionIndex] = getMaxQValue(qValueRep,obs)

[maxQ,maxActionIndex,state] = getMaxQValue(___)

Description

[maxQ,maxActionIndex] = getMaxQValue(qValueRep,obs) returns the maximum estimated state value for the Q-value function representation qValueRep, given the environment observations obs. getMaxQValue determines the discrete action for which the Q-value estimate is greatest and returns that Q value (maxQ) and the corresponding action index (maxActionIndex).
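Written as equations for a single observation s, with \hat{Q} the estimate produced by qValueRep and a_1 through a_N the discrete actions, the two outputs are as follows. This is a sketch of the semantics described above. It assumes that maxActionIndex counts positions in the action specification's element list, and it does not specify how ties between equal Q values are broken.

$$
\text{maxQ} = \hat{V}(s) = \max_{i \in \{1,\dots,N\}} \hat{Q}(s, a_i), \qquad
\text{maxActionIndex} = \operatorname*{arg\,max}_{i \in \{1,\dots,N\}} \hat{Q}(s, a_i)
$$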

[maxQ,maxActionIndex,state] = getMaxQValue(___) also returns the state of the representation. Use this syntax when qValueRep uses a recurrent neural network.

Examples


Obtain Maximum Q-Value Function Estimates


Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a deep neural network for a multi-output Q-value function representation.

criticNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(50,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(20,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(numDiscreteAct,'Name','output')];

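Create a representation for your critic using this deep neural network. Because the network returns one Q value per discrete action, specify only the name of the observation input layer.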

criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation','state',criticOptions);

Obtain value function estimates for each possible discrete action using a random observation.

obs = rand(4,1);
val = getValue(critic,{obs})

val = 2x1 single column vector

    0.0139
   -0.1851

val contains two value function estimates, one for each possible discrete action.

You can obtain the maximum Q-value function estimate across all the discrete actions.

[maxVal,maxIndex] = getMaxQValue(critic,{obs})

maxVal = single
    0.0139

maxIndex = 1

maxVal corresponds to the maximum entry in val.
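The relationship between the two calls can be checked with base MATLAB alone. The following sketch hard-codes the two estimates displayed above, so it runs without Reinforcement Learning Toolbox. It assumes, for illustration only, that the discrete action set is [-10 10]; in a real session, read the action elements from actInfo.Elements instead.

```matlab
% Estimates displayed by getValue above, hard-coded so the sketch
% runs without Reinforcement Learning Toolbox
val = single([0.0139; -0.1851]);

% Assumed discrete action set for illustration; in practice use actInfo.Elements
actionElements = [-10 10];

% Largest estimate and the position of the action that attains it
[maxVal, maxIndex] = max(val);

% Map the index back to the corresponding action value
greedyAction = actionElements(maxIndex);
```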

You can also obtain maximum Q-value function estimates for a batch of observations. For example, obtain the maximum Q-value estimates and the corresponding action indices for a batch of 10 observations.

[batchVal,batchIndex] = getMaxQValue(critic,{rand(4,1,10)});
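For a batch, the maximization is carried out separately for each observation, that is, along the action dimension. The base MATLAB sketch below uses made-up values and assumes the per-action estimates are arranged as a numDiscreteAct-by-L_B matrix with actions along the first dimension. It illustrates why batchVal and batchIndex have one column per observation rather than one row per action.

```matlab
% Made-up Q-value estimates: 2 discrete actions (rows) by 4 observations (columns)
qBatch = [0.3 -0.2 0.1  0.5;
          0.1  0.4 0.2 -0.6];

% Maximize over the action dimension (dimension 1) separately for each column
[batchVal, batchIndex] = max(qBatch, [], 1);
% batchVal is 1-by-4 and batchIndex holds the winning row for each column
```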

Input Arguments


`qValueRep` — Q-value representation
`rlQValueRepresentation` object

Q-value representation, specified as an rlQValueRepresentation object.

`obs` — Environment observations
cell array

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are M_O-by-L_B-by-L_S, where:

M_O corresponds to the dimensions of the associated observation input channel.
L_B is the batch size. To specify a single observation, set L_B = 1. To specify a batch of observations, specify L_B > 1. If qValueRep has multiple observation input channels, then L_B must be the same for all elements of obs.
L_S specifies the sequence length for a recurrent neural network. If qValueRep does not use a recurrent neural network, then L_S = 1. If qValueRep has multiple observation input channels, then L_S must be the same for all elements of obs.
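The dimension rules above can be illustrated by building obs cell arrays in base MATLAB. This sketch assumes one observation channel whose M_O is 4-by-1, as in the cart-pole example, and adds a second, hypothetical 2-by-1 channel only to show that every element of obs shares the same L_B and L_S.

```matlab
batchSize = 10;   % L_B
seqLength = 5;    % L_S, only meaningful for a recurrent network

% Single observation: M_O = 4-by-1, L_B = 1, L_S = 1
obsSingle = {rand(4,1)};

% Batch of observations for a non-recurrent network: 4-by-1-by-L_B
obsBatch = {rand(4,1,batchSize)};

% Batch of sequences for a recurrent network: 4-by-1-by-L_B-by-L_S
obsSeq = {rand(4,1,batchSize,seqLength)};

% Two input channels (the 2-by-1 channel is hypothetical): each cell
% element has its own M_O, but L_B and L_S are identical across elements
obsTwoChannels = {rand(4,1,batchSize,seqLength), rand(2,1,batchSize,seqLength)};
```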

Output Arguments


`maxQ` — Maximum Q-value estimate
array

Maximum Q-value estimate across all possible discrete actions, returned as a 1-by-L_B-by-L_S array, where:

L_B is the batch size.
L_S specifies the sequence length for a recurrent neural network. If qValueRep does not use a recurrent neural network, then L_S = 1.

`maxActionIndex` — Action index
array

Action index corresponding to the maximum Q value, returned as a 1-by-L_B-by-L_S array, where:

L_B is the batch size.
L_S specifies the sequence length for a recurrent neural network. If qValueRep does not use a recurrent neural network, then L_S = 1.
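Because maxActionIndex holds positions rather than action values, a natural follow-up is to convert it into actions. In base MATLAB, indexing a vector with a non-vector index array returns an array with the shape of the index, so the 1-by-L_B-by-L_S layout is preserved. The index values and the action set [-10 10] below are made up for illustration.

```matlab
% Assumed discrete action set; in practice use actInfo.Elements
actionElements = [-10 10];

% Made-up action indices shaped 1-by-L_B-by-L_S with L_B = 3 and L_S = 2
maxActionIndex = cat(3, [1 2 2], [2 1 1]);

% Indexing with a 3-D index array returns a 1-by-3-by-2 array of action values
greedyActions = actionElements(maxActionIndex);
```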

`state` — Representation state
cell array

Representation state, returned as a cell array. If qValueRep does not use a recurrent neural network, then state is an empty cell array.

You can set the state of the representation to state using the setState function. For example:

qValueRep = setState(qValueRep,state);

Reinforcement Learning Toolbox Documentation

Support

Documentation

getMaxQValue

Syntax

Description

Examples

Obtain Maximum Q-Value Function Estimates

Input Arguments

qValueRep — Q-value representation rlQValueRepresentation object

obs — Environment observations cell array

Output Arguments

maxQ — Maximum Q-value estimate array

maxActionIndex — Action index array

state — Representation state cell array

See Also

Topics

Reinforcement Learning Toolbox Documentation

Support

`qValueRep` — Q-value representation
`rlQValueRepresentation` object

`obs` — Environment observations
cell array

`maxQ` — Maximum Q-value estimate
array

`maxActionIndex` — Action index
array

`state` — Representation state
cell array