generatePolicyFunction

Create function that evaluates trained policy of reinforcement learning agent

Syntax

generatePolicyFunction(agent)

generatePolicyFunction(agent,Name,Value)

Description

generatePolicyFunction(agent) creates a function that evaluates the learned policy of the specified agent using the default function, policy, and data file names. After generating the policy evaluation function, you can:

Generate code for the function using MATLAB® Coder™ or GPU Coder™. For more information, see Deploy Trained Reinforcement Learning Policies.
Simulate the trained agent in Simulink® using a MATLAB Function (Simulink) block.

generatePolicyFunction(agent,Name,Value) specifies the function, policy, and data file names using one or more name-value pair arguments.

Examples

Create Policy Evaluation Function for PG Agent

This example shows how to create a policy evaluation function for a PG agent.

First, create and train a reinforcement learning agent. For this example, load the PG agent trained in Train PG Agent to Balance Cart-Pole System:

load('MATLABCartpolePG.mat','agent')

Then, create a policy evaluation function for this agent using default names:

generatePolicyFunction(agent);

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor.

View the generated function.

type evaluatePolicy.m

function action1 = evaluatePolicy(observation1)
%#codegen

% Reinforcement Learning Toolbox
% Generated on: 20-Aug-2020 17:00:53

actionSet = [-10 10];
% Select action from sampled probabilities
probabilities = localEvaluate(observation1);
% Normalize the probabilities
p = probabilities(:)'/sum(probabilities);
% Determine which action to take
edges = min([0 cumsum(p)],1);
edges(end) = 1;
[~,actionIndex] = histc(rand(1,1),edges); %#ok<HISTC>
action1 = actionSet(actionIndex);
end
%% Local Functions
function probabilities = localEvaluate(observation1)
persistent policy
if isempty(policy)
	policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
observation1 = observation1(:);
probabilities = predict(policy,observation1);
end

For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.
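The sampling step can be tried in isolation. The following is a minimal sketch, not the generated code itself: it assumes a hypothetical probability vector in place of the actor network output, and it uses discretize rather than histc to map random draws to bins. It draws many actions and measures how often the second action is selected.

```matlab
% Sketch of the action-sampling step, outside the generated function.
% The probability vector is a hypothetical stand-in for the actor output.
actionSet = [-10 10];
probabilities = [0.3 0.7];

% Build cumulative bin edges on [0,1], as the generated function does.
p = probabilities(:)'/sum(probabilities);
edges = min([0 cumsum(p)],1);
edges(end) = 1;

% Draw uniform samples and map each one to the bin it falls in.
rng(0);                                   % fixed seed for repeatability
numSamples = 10000;
actionIndex = discretize(rand(1,numSamples),edges);
sampledActions = actionSet(actionIndex);

% Fraction of draws that selected the action 10; expected to be near 0.7.
fractionPositive = mean(sampledActions == 10);
```

Because each bin width equals the probability of the corresponding action, a uniform random draw lands in a bin with exactly that probability.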

Because the actor network for this PG agent has a single input layer and a single output layer, you can generate code for this network using the Deep Learning Toolbox™ code generation functionality. For more information, see Deploy Trained Reinforcement Learning Policies.
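As a rough sketch of what code generation for this function might look like with MATLAB Coder, the following commands build a MEX function. Several details are assumptions rather than facts from this page: the observation is taken to be a 4-by-1 double vector, the MKL-DNN deep learning library is chosen as the target, and the required support package is assumed to be installed.

```matlab
% Sketch: generate a MEX function for the PG policy with MATLAB Coder.
% Assumes evaluatePolicy.m and agentData.mat are in the current folder,
% and that the observation is a 4-by-1 double (an assumption).
cfg = coder.config('mex');
cfg.TargetLang = 'C++';   % deep learning code generation targets C++
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');

codegen -config cfg evaluatePolicy -args {ones(4,1)} -report

% Call the generated MEX function with a sample observation.
action = evaluatePolicy_mex(zeros(4,1));
```

The returned action should be one of the two values in actionSet. Because the policy is stochastic, repeated calls with the same observation can return different actions.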

Create Policy Evaluation Function for Q-Learning Agent

This example shows how to create a policy evaluation function for a Q-learning agent.

For this example, load the Q-learning agent trained in Train Reinforcement Learning Agent in Basic Grid World:

load('basicGWQAgent.mat','qAgent')

Create a policy evaluation function for this agent and specify the name of the agent data file.

generatePolicyFunction(qAgent,'MATFileName',"policyFile.mat")

This command creates the evaluatePolicy.m file, which contains the policy function, and the policyFile.mat file, which contains the trained Q table value function.

View the generated function.

type evaluatePolicy.m

function action1 = evaluatePolicy(observation1)
%#codegen

% Reinforcement Learning Toolbox
% Generated on: 20-Aug-2020 17:00:54

actionSet = [1;2;3;4];
numActions = numel(actionSet);
q = zeros(1,numActions);
for i = 1:numActions
	q(i) = localEvaluate(observation1,actionSet(i));
end
[~,actionIndex] = max(q);
action1 = actionSet(actionIndex);
end
%% Local Functions
function q = localEvaluate(observation1,action)
persistent policy
if isempty(policy)
	s = coder.load('policyFile.mat','policy');
	policy = s.policy;
end
actionSet = [1;2;3;4];
observationSet = [1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25];
actionIndex = rl.codegen.getElementIndex(actionSet,action);
observationIndex = rl.codegen.getElementIndex(observationSet,observation1);
q = policy(observationIndex,actionIndex);
end

For a given observation, the policy function looks up the value function for each potential action using the Q table. Then, the policy function selects the action for which the value function is greatest.
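The greedy lookup can be illustrated without the generated files. This minimal sketch assumes a hypothetical 25-by-4 Q table whose rows correspond to observations and whose columns correspond to actions. The values are made up for illustration.

```matlab
% Sketch of greedy action selection from a Q table.
% The table values are hypothetical; rows are observations, columns are actions.
actionSet = [1;2;3;4];
qTable = zeros(25,numel(actionSet));
qTable(7,:) = [0.1 0.4 1.5 -0.2];

% Select the action with the largest Q value for observation 7.
observation = 7;
[~,actionIndex] = max(qTable(observation,:));
action = actionSet(actionIndex);

% For a row with tied values, max returns the first maximum.
[~,tiedIndex] = max(qTable(1,:));
tiedAction = actionSet(tiedIndex);
```

Here the third column holds the largest value in row 7, so the third action is selected. For the all-zero row, the tie resolves to the first action in actionSet.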

You can generate code for this policy function using MATLAB® Coder™. For more information, see Deploy Trained Reinforcement Learning Policies.
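One way to check generated code for this policy is to build a MEX function and compare it with the MATLAB function over all states. The sketch below assumes that evaluatePolicy.m and policyFile.mat are in the current folder and that the observation is a scalar double. Both are assumptions about this particular agent.

```matlab
% Sketch: build a MEX function for the Q-learning policy and compare it
% with the MATLAB function. Assumes a scalar double observation.
codegen -config:mex evaluatePolicy -args {1} -report

% The policy is deterministic, so both versions should agree for every state.
numStates = 25;
matches = false(1,numStates);
for observation = 1:numStates
    matches(observation) = ...
        evaluatePolicy(observation) == evaluatePolicy_mex(observation);
end
```

If any element of matches is false, the generated code and the MATLAB function disagree for that observation.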

Input Arguments

`agent` — Trained reinforcement learning agent
reinforcement learning agent object

Trained reinforcement learning agent, specified as one of the following:

rlQAgent object
rlSARSAAgent object
rlDQNAgent object (multi-output critic only; see the note below)
rlDDPGAgent object
rlTD3Agent object
rlACAgent object
rlPGAgent object that estimates a baseline value function using a critic

Since Deep Learning Toolbox™ code generation and prediction functionality do not support deep neural networks with more than one input layer, generatePolicyFunction does not support the following agent configurations.

DQN agent with single-output deep neural network critic representations
Any agent with deep neural network actor or critic representations with multiple observation input layers

Note

DQN agents with a multi-output deep neural network representation are supported by generatePolicyFunction, provided that the network has only one input layer for the observations.

To train your agent, use the train function.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'FunctionName',"computeAction"

`'FunctionName'` — Name of the generated function
`'evaluatePolicy'` (default) | string | character vector

Name of the generated function, specified as the name-value pair consisting of 'FunctionName' and a string or character vector.

`'PolicyName'` — Name of the policy variable within the generated function
`'policy'` (default) | string | character vector

Name of the policy variable within the generated function, specified as the name-value pair consisting of 'PolicyName' and a string or character vector.

`'MATFileName'` — Name of agent data file
`'agentData'` (default) | string | character vector

Name of the agent data file, specified as the name-value pair consisting of 'MATFileName' and a string or character vector.
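The three name-value pairs can be combined in a single call. In this sketch the names computeAction, actorNet, and cartpoleActor.mat are hypothetical choices, and the agent is the PG agent loaded in the first example.

```matlab
% Sketch: override all three default names.
% computeAction, actorNet, and cartpoleActor.mat are hypothetical names.
load('MATLABCartpolePG.mat','agent')
generatePolicyFunction(agent, ...
    'FunctionName',"computeAction", ...
    'PolicyName',"actorNet", ...
    'MATFileName',"cartpoleActor.mat");

% Check that the function file and data file were created.
functionCreated = isfile('computeAction.m');
dataCreated = isfile('cartpoleActor.mat');
```

The generated function is then called as computeAction(observation), and it loads the variable actorNet from cartpoleActor.mat.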
