Predict labels using discriminant analysis classification model
[label,score,cost] = predict(Mdl,X) also returns:
- A matrix of classification scores (score) indicating the likelihood that a label comes from a particular class. For discriminant analysis, scores are posterior probabilities.
- A matrix of expected classification costs (cost). For each observation in X, the predicted class label corresponds to the minimum expected classification cost among all classes.
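As a sketch of the three-output syntax (assuming Fisher's iris data and a default model trained with fitcdiscr, as in the examples below):

```matlab
% Train a default linear discriminant model on Fisher's iris data.
load fisheriris
Mdl = fitcdiscr(meas,species);

% Request all three outputs for a few observations.
[label,score,cost] = predict(Mdl,meas(1:5,:));

% label is 5-by-1; score and cost are 5-by-3 (one column per class).
% Each row of score sums to 1, since scores are posterior probabilities.
```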
Mdl — Discriminant analysis classification model
ClassificationDiscriminant model object | CompactClassificationDiscriminant model object

Discriminant analysis classification model, specified as a ClassificationDiscriminant or CompactClassificationDiscriminant model object returned by fitcdiscr.
X — Predictor data to be classified

Predictor data to be classified, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable. All predictor variables in X must be numeric vectors.
For a numeric matrix, the variables that compose the columns of X must have the same order as the predictor variables that trained Mdl.
For a table:
- predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
- If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
- If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and the corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of fitcdiscr. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
Data Types: table | double | single
label — Predicted class labels

Predicted class labels, returned as a categorical or character array, logical or numeric vector, or cell array of character vectors. label has the same data type as the observed class labels that trained Mdl.
score — Predicted class posterior probabilities

Predicted class posterior probabilities, returned as a numeric matrix of size N-by-K. N is the number of observations (rows) in X, and K is the number of classes (in Mdl.ClassNames). score(i,j) is the posterior probability that observation i in X is of class j in Mdl.ClassNames.
cost — Expected classification costs

Expected classification costs, returned as a matrix of size N-by-K. N is the number of observations (rows) in X, and K is the number of classes (in Mdl.ClassNames). cost(i,j) is the cost of classifying row i of X as class j in Mdl.ClassNames.
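The relationship between the three outputs can be checked directly. This is a sketch, assuming a default model trained on Fisher's iris data with the default (0–1) misclassification cost:

```matlab
load fisheriris
Mdl = fitcdiscr(meas,species);
[label,score,cost] = predict(Mdl,meas);

% The predicted label for each row is the class with minimum expected cost,
% which with the default cost matrix is also the class with maximum posterior.
[~,idxMinCost] = min(cost,[],2);
isequal(label,Mdl.ClassNames(idxMinCost))   % logical 1 (true)
```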
Load Fisher's iris data set. Determine the sample size.
load fisheriris
N = size(meas,1);
Partition the data into training and test sets. Hold out 10% of the data for testing.
rng(1); % For reproducibility
cvp = cvpartition(N,'Holdout',0.1);
idxTrn = training(cvp); % Training set indices
idxTest = test(cvp);    % Test set indices
Store the training data in a table.
tblTrn = array2table(meas(idxTrn,:));
tblTrn.Y = species(idxTrn);
Train a discriminant analysis model using the training set and default options.
Mdl = fitcdiscr(tblTrn,'Y');
Predict labels for the test set. You trained Mdl using a table of data, but you can predict labels using a matrix.
labels = predict(Mdl,meas(idxTest,:));
Construct a confusion matrix for the test set.
confusionchart(species(idxTest),labels)
Mdl
misclassifies one versicolor iris as virginica in the test set.
Load Fisher's iris data set. Consider training using the petal lengths and widths only.
load fisheriris
X = meas(:,3:4);
Train a quadratic discriminant analysis model using the entire data set.
Mdl = fitcdiscr(X,species,'DiscrimType','quadratic');
Define a grid of values in the observed predictor space. Predict the posterior probabilities for each instance in the grid.
xMax = max(X);
xMin = min(X);
d = 0.01;
[x1Grid,x2Grid] = meshgrid(xMin(1):d:xMax(1),xMin(2):d:xMax(2));
[~,score] = predict(Mdl,[x1Grid(:),x2Grid(:)]);
Mdl.ClassNames
ans = 3x1 cell
{'setosa' }
{'versicolor'}
{'virginica' }
score is a matrix of class posterior probabilities. The columns correspond to the classes in Mdl.ClassNames. For example, score(j,1) is the posterior probability that observation j is a setosa iris.
Plot the posterior probability of versicolor classification for each observation in the grid and plot the training data.
figure;
contourf(x1Grid,x2Grid,reshape(score(:,2),size(x1Grid,1),size(x1Grid,2)));
h = colorbar;
caxis([0 1]);
colormap jet;
hold on
gscatter(X(:,1),X(:,2),species,'mcy','.x+');
axis tight
title('Posterior Probability of versicolor');
hold off
The posterior probability region exposes a portion of the decision boundary.
The posterior probability that a point x belongs to class k is the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-d mean $\mu_k$ and d-by-d covariance $\Sigma_k$ at a 1-by-d point x is

$$P(x \mid k) = \frac{1}{\left((2\pi)^{d}\,\lvert\Sigma_k\rvert\right)^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_k)\,\Sigma_k^{-1}\,(x-\mu_k)^{T}\right),$$

where $\lvert\Sigma_k\rvert$ is the determinant of $\Sigma_k$, and $\Sigma_k^{-1}$ is the inverse matrix.

Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is

$$\hat{P}(k \mid x) = \frac{P(x \mid k)\,P(k)}{P(x)},$$

where $P(x) = \sum_{k} P(x \mid k)\,P(k)$ is a normalization constant.
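These formulas can be reproduced numerically with mvnpdf. This sketch assumes a linear discriminant model trained on Fisher's iris data, where all classes share the pooled covariance Mdl.Sigma:

```matlab
load fisheriris
Mdl = fitcdiscr(meas,species);   % 'linear': shared covariance Mdl.Sigma
x = meas(1,:);

% Unnormalized posteriors: P(x|k)*P(k) for each class k.
K = numel(Mdl.ClassNames);
p = zeros(1,K);
for k = 1:K
    p(k) = mvnpdf(x,Mdl.Mu(k,:),Mdl.Sigma) * Mdl.Prior(k);
end
post = p / sum(p);               % normalize by P(x)

% Compare with the scores returned by predict.
[~,score] = predict(Mdl,x);
max(abs(post - score))           % small numerical difference
```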
The prior probability is one of three choices:
- 'uniform' — The prior probability of class k is one over the total number of classes.
- 'empirical' — The prior probability of class k is the number of training samples of class k divided by the total number of training samples.
- Custom — The prior probability of class k is the kth element of the prior vector. See fitcdiscr.
After creating a classification model (Mdl), you can set the prior using dot notation:

Mdl.Prior = v;

where v is a vector of positive elements representing the frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
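For example, the prior could be reset without retraining. This sketch uses hypothetical prior values and assumes a three-class model trained on Fisher's iris data:

```matlab
load fisheriris
Mdl = fitcdiscr(meas,species);   % default 'empirical' prior

% Reset the prior by dot notation; predict uses the new prior
% immediately, with no retraining.
Mdl.Prior = [0.5 0.25 0.25];     % hypothetical class frequencies
[label,score] = predict(Mdl,meas(1:3,:));
```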
The matrix of expected costs per observation is defined in Cost. predict classifies so as to minimize the expected classification cost:

$$\hat{y} = \operatorname*{arg\,min}_{y=1,\dots,K} \sum_{k=1}^{K} \hat{P}(k \mid x)\,C(y \mid k),$$

where
- $\hat{y}$ is the predicted classification,
- $K$ is the number of classes,
- $\hat{P}(k \mid x)$ is the posterior probability of class $k$ for observation $x$,
- $C(y \mid k)$ is the cost of classifying an observation as $y$ when its true class is $k$.
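The cost output can be reconstructed from the posterior probabilities and the misclassification cost matrix. This is a sketch, assuming a default model trained on Fisher's iris data:

```matlab
load fisheriris
Mdl = fitcdiscr(meas,species);
[label,score,cost] = predict(Mdl,meas);

% Mdl.Cost(k,y) is the cost of classifying as y when the true class is k,
% so cost(i,y) = sum over k of score(i,k) * Mdl.Cost(k,y).
expCost = score * Mdl.Cost;
max(abs(expCost(:) - cost(:)))   % small numerical difference
```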
This function fully supports tall arrays. You can use models trained on either in-memory or tall data with this function.
For more information, see Tall Arrays.
Usage notes and limitations:

Use saveLearnerForCoder, loadLearnerForCoder, and codegen (MATLAB Coder) to generate code for the predict function. Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the predict function. Then use codegen to generate code for the entry-point function.

You can also generate single-precision C/C++ code for predict. For single-precision code generation, specify the name-value pair argument 'DataType','single' as an additional input to the loadLearnerForCoder function.
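The workflow above might look like the following sketch. The file name 'DiscrMdl' and the entry-point function name predictIris are hypothetical, and generating code requires MATLAB Coder:

```matlab
% Train and save a model (run once in MATLAB).
load fisheriris
Mdl = fitcdiscr(meas,species);
saveLearnerForCoder(Mdl,'DiscrMdl');   % hypothetical file name

% Entry-point function (save as predictIris.m, a hypothetical name):
%   function label = predictIris(X) %#codegen
%   Mdl = loadLearnerForCoder('DiscrMdl');
%   label = predict(Mdl,X);
%   end

% Generate code for the entry-point function:
%   codegen predictIris -args {ones(1,4)}
```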
This table contains notes about the arguments of predict. Arguments not included in this table are fully supported.

| Argument | Notes and Limitations |
|---|---|
| Mdl | For the usage notes and limitations of the model object, see Code Generation of the model object. |
| X | |
For more information, see Introduction to Code Generation.
See Also: ClassificationDiscriminant | CompactClassificationDiscriminant | edge | fitcdiscr | loss | margin