Create Markov decision process model
Create an MDP model with eight states and two possible actions.
MDP = createMDP(8,["up";"down"]);
Specify the state transitions and their associated rewards.
% State 1 Transition and Reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;

% State 2 Transition and Reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;

% State 3 Transition and Reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;

% State 4 Transition and Reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;

% State 5 Transition and Reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;

% State 6 Transition and Reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;

% State 7 Transition and Reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;

% State 8 Transition and Reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;
Specify the terminal states of the model.
MDP.TerminalStates = ["s7";"s8"];
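As a possible next step (not part of the original example), you can wrap the finished model in a reinforcement learning environment. This is a brief sketch that assumes the Reinforcement Learning Toolbox function rlMDPEnv, which accepts the GenericMDP object created above.

% Create a reinforcement learning environment from the MDP model
% (assumes Reinforcement Learning Toolbox is installed).
env = rlMDPEnv(MDP);

% Inspect the discrete observation and action specifications of the environment.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);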
states — Model states
Model states, specified as one of the following:

Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.

String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.
actions — Model actions
Model actions, specified as one of the following:

Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.

String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
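For example, both of the following calls create an MDP with eight states and two actions; they differ only in whether default or explicit names are used (an illustrative sketch, not from the original page; the state names are made up for the example).

% Eight states and two actions with default names ("s1"..."s8", "a1","a2").
MDP1 = createMDP(8,2);

% Same size model, but with explicit state and action names.
stateNames = ["start";"s2";"s3";"s4";"s5";"s6";"win";"lose"];
MDP2 = createMDP(stateNames,["up";"down"]);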
MDP — MDP model
GenericMDP object

MDP model, returned as a GenericMDP object with the following properties.
CurrentState — Name of the current state
Name of the current state, specified as a string.
States — State names
State names, specified as a string vector with length equal to the number of states.
Actions — Action names
Action names, specified as a string vector with length equal to the number of actions.
T — State transition matrix
State transition matrix, specified as a 3-D array that determines the possible movements of the agent in the environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:

T(s,s',a) = probability(s'|s,a)
The transition probabilities out of a nonterminal state s for a given action must sum to one. Therefore, you must specify all stochastic transitions out of a given state at the same time.
For example, to indicate that in state 1, following action 4, there is an equal probability of moving to state 2 or state 3, use the following:
MDP.T(1,[2 3],4) = [0.5 0.5];
You can also specify that, following an action, there is some probability of remaining in the same state. For example:
MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
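One way to check the sum-to-one requirement after filling in T is to sum the transition probabilities for a given state and action (a small sketch, not part of the original page):

% Transition probabilities out of state 1 under action 1 must sum to 1.
p = sum(MDP.T(1,:,1));   % returns 1 for the assignment above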
R — Reward transition matrix
Reward transition matrix, specified as a 3-D array that determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:

r = R(s,s',a)
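For example, to assign different rewards to the two stochastic transitions specified earlier for state 1 and action 4, you could use the following (an illustrative sketch, not from the original page; the reward values are made up):

% Rewards for moving from state 1 to states 2 and 3 under action 4.
MDP.R(1,[2 3],4) = [10 -1];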
TerminalStates — Terminal state names
Terminal state names in the grid world, specified as a string vector of state names.