Create Markov decision process model
Create an MDP model with eight states and two possible actions.
MDP = createMDP(8,["up";"down"]);
Specify the state transitions and their associated rewards.
% State 1 Transition and Reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;

% State 2 Transition and Reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;

% State 3 Transition and Reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;

% State 4 Transition and Reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;

% State 5 Transition and Reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;

% State 6 Transition and Reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;

% State 7 Transition and Reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;

% State 8 Transition and Reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;
Specify the terminal states of the model.
MDP.TerminalStates = ["s7";"s8"];
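As a possible next step (not part of the original example), you can wrap the finished model in a reinforcement learning environment. This is a brief sketch that assumes the Reinforcement Learning Toolbox function rlMDPEnv, which accepts the GenericMDP object created above.

% Create a reinforcement learning environment from the MDP model
% (assumes Reinforcement Learning Toolbox is installed).
env = rlMDPEnv(MDP);

% Inspect the discrete observation and action specifications of the environment.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);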
states — Model states
Model states, specified as one of the following:

Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.

String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.
actions — Model actions
Model actions, specified as one of the following:

Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.

String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
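For example, both of the following calls create an MDP with eight states and two actions; they differ only in whether default or explicit names are used (an illustrative sketch, not from the original page; the state names are made up for the example).

% Eight states and two actions with default names ("s1"..."s8", "a1","a2").
MDP1 = createMDP(8,2);

% Same size model, but with explicit state and action names.
stateNames = ["start";"s2";"s3";"s4";"s5";"s6";"win";"lose"];
MDP2 = createMDP(stateNames,["up";"down"]);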
MDP — MDP model
GenericMDP object

MDP model, returned as a GenericMDP object with the following properties.
CurrentState — Name of the current state
Name of the current state, specified as a string.
States — State names
State names, specified as a string vector with length equal to the number of states.
Actions — Action names
Action names, specified as a string vector with length equal to the number of actions.
T — State transition matrix
State transition matrix, specified as a 3-D array that determines the possible movements of the agent in the environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:

T(s,s',a) = probability(s'|s,a)
The transition probabilities out of a nonterminal state s for a given action must sum to one. Therefore, you must specify all stochastic transitions out of a given state at the same time.
For example, to indicate that in state 1, following action 4, there is an equal probability of moving to state 2 or state 3, use the following:
MDP.T(1,[2 3],4) = [0.5 0.5];
You can also specify that, following an action, there is some probability of remaining in the same state. For example:
MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
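One way to check the sum-to-one requirement after filling in T is to sum the transition probabilities for a given state and action (a small sketch, not part of the original page):

% Transition probabilities out of state 1 under action 1 must sum to 1.
p = sum(MDP.T(1,:,1));   % returns 1 for the assignment above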
R — Reward transition matrix
Reward transition matrix, specified as a 3-D array that determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:

r = R(s,s',a)
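For example, to assign different rewards to the two stochastic transitions specified earlier for state 1 and action 4, you could use the following (an illustrative sketch, not from the original page; the reward values are made up):

% Rewards for moving from state 1 to states 2 and 3 under action 4.
MDP.R(1,[2 3],4) = [10 -1];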
TerminalStates — Terminal state names
Terminal state names in the grid world, specified as a string vector of state names.