Classify observations using naive Bayes classifier

label = resubPredict(Mdl) returns the class labels that the trained naive Bayes classifier Mdl predicts for the training data stored in Mdl.X.

[label,Posterior,Cost] = resubPredict(Mdl) also returns the posterior probability (Posterior) and the predicted (expected) misclassification cost (Cost) corresponding to the observations (rows) in Mdl.X.
Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.
load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that, conditional on the class, each predictor is normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods
Mdl is a trained ClassificationNaiveBayes classifier.
Predict the training sample labels.
label = resubPredict(Mdl);
Display the results for a random set of 10 observations.
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),'VariableNames', ...
    {'TrueLabel','PredictedLabel'})
ans=10×2 table
TrueLabel PredictedLabel
______________ ______________
{'virginica' } {'virginica' }
{'setosa' } {'setosa' }
{'virginica' } {'virginica' }
{'versicolor'} {'versicolor'}
{'virginica' } {'virginica' }
{'versicolor'} {'versicolor'}
{'virginica' } {'virginica' }
{'setosa' } {'setosa' }
{'virginica' } {'virginica' }
{'setosa' } {'setosa' }
Create a confusion chart from the true labels Y and the predicted labels label.
cm = confusionchart(Y,label);
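For a single-number summary to accompany the chart, you can also compute the resubstitution classification error. A minimal sketch using resubLoss (not part of the original example; it assumes Mdl, Y, and label from above are in the workspace):

L = resubLoss(Mdl,'LossFun','classiferror')   % fraction of training observations misclassified
err = sum(~strcmp(Y,label))/numel(Y)          % the same rate computed from the predicted labels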
Estimate in-sample posterior probabilities and misclassification costs using a naive Bayes classifier.
Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.
load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that, conditional on the class, each predictor is normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'});
Mdl is a trained ClassificationNaiveBayes classifier.
Estimate the posterior probabilities and expected misclassification costs for the training data.
[label,Posterior,MisclassCost] = resubPredict(Mdl);
Mdl.ClassNames
ans = 3x1 cell
{'setosa' }
{'versicolor'}
{'virginica' }
Display the results for 10 randomly selected observations.
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),Posterior(idx,:),MisclassCost(idx,:),'VariableNames', ...
    {'TrueLabel','PredictedLabel','PosteriorProbability','MisclassificationCost'})
ans=10×4 table
TrueLabel PredictedLabel PosteriorProbability MisclassificationCost
______________ ______________ _________________________________________ ______________________________________
{'virginica' } {'virginica' } 6.2514e-269 1.1709e-09 1 1 1 1.1709e-09
{'setosa' } {'setosa' } 1 5.5339e-19 2.485e-25 5.5339e-19 1 1
{'virginica' } {'virginica' } 7.4191e-249 1.4481e-10 1 1 1 1.4481e-10
{'versicolor'} {'versicolor'} 3.4472e-62 0.99997 3.362e-05 1 3.362e-05 0.99997
{'virginica' } {'virginica' } 3.4268e-229 6.597e-09 1 1 1 6.597e-09
{'versicolor'} {'versicolor'} 6.0941e-77 0.9998 0.00019663 1 0.00019663 0.9998
{'virginica' } {'virginica' } 1.3467e-167 0.002187 0.99781 1 0.99781 0.002187
{'setosa' } {'setosa' } 1 1.5776e-15 5.7172e-24 1.5776e-15 1 1
{'virginica' } {'virginica' } 2.0116e-232 2.6206e-10 1 1 1 2.6206e-10
{'setosa' } {'setosa' } 1 1.8085e-17 1.9639e-24 1.8085e-17 1 1
The order of the columns of Posterior and MisclassCost corresponds to the order of the classes in Mdl.ClassNames.
Mdl — Full, trained naive Bayes classifier
ClassificationNaiveBayes model

Full, trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.
label — Predicted class labels

Predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors.

The predicted class labels have the following characteristics:

- Same data type as the observed class labels (Mdl.Y). (The software treats string arrays as cell arrays of character vectors.)
- Length equal to the number of rows of Mdl.X.
- Class yielding the lowest expected misclassification cost (Cost); see the sketch after this list.
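To confirm the last characteristic numerically, here is a minimal sketch; it assumes the label and MisclassCost outputs from the second example above are in the workspace:

[~,minIdx] = min(MisclassCost,[],2);    % column index of the lowest expected cost in each row
isequal(label,Mdl.ClassNames(minIdx))   % expected to return logical 1 (true)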
Posterior — Class posterior probabilities

Class posterior probabilities, returned as a numeric matrix. Posterior has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Posterior(j,k) is the predicted posterior probability of class k (that is, the class Mdl.ClassNames(k)) given the observation in row j of Mdl.X.
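Each row of Posterior is a probability distribution over the classes, so its entries should sum to 1. A quick sanity check, assuming the Posterior output from the second example:

rowSums = sum(Posterior,2);
max(abs(rowSums - 1))   % expected to be 0 up to floating-point round-off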
Cost — Expected misclassification costs

Expected misclassification costs, returned as a numeric matrix. Cost has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Cost(j,k) is the expected misclassification cost of classifying the observation in row j of Mdl.X into class k (that is, the class Mdl.ClassNames(k)).
A misclassification cost is the relative severity of a classifier labeling an observation into the wrong class.

There are two types of misclassification cost: true and expected. Let K be the number of classes.

True misclassification cost — A K-by-K matrix, where element (i,j) indicates the cost of predicting an observation into class j if its true class is i. The software stores the misclassification cost in the property Mdl.Cost, and uses it in computations. By default, Mdl.Cost(i,j) = 1 if i ≠ j, and Mdl.Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for any incorrect classification.
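For the three-class fisheriris model trained above, the default true-cost matrix is therefore all ones with zeros on the diagonal, which you can inspect directly:

Mdl.Cost                             % default 3-by-3 true misclassification cost matrix
isequal(Mdl.Cost,ones(3) - eye(3))   % expected to return logical 1 (true)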
Expected misclassification cost — A K-dimensional vector, where element k is the weighted average cost of classifying an observation into class k, weighted by the class posterior probabilities:

c(k) = Σ_{j=1}^{K} P̂(Y = j | x1,...,xP) C(k | j),

where P̂(Y = j | x1,...,xP) is the posterior probability that the observation belongs to class j, and C(k | j) = Mdl.Cost(j,k) is the true cost of classifying the observation into class k when its true class is j.

The software classifies each observation into the class with the lowest expected misclassification cost.
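Because the expected cost is a weighted average over the class posteriors, the Cost output reduces to a matrix product with the true-cost matrix. A sketch verifying this against the outputs of the second example (captured there as Posterior and MisclassCost):

ExpectedCost = Posterior*Mdl.Cost;            % Cost(j,k) = sum_i Posterior(j,i)*Mdl.Cost(i,k)
max(abs(ExpectedCost(:) - MisclassCost(:)))   % expected to be 0 up to round-off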
The posterior probability is the probability that an observation belongs in a particular class, given the data.

For naive Bayes, the posterior probability that an observation (x1,...,xP) belongs to class k is

P̂(Y = k | x1,...,xP) = P(X1,...,XP | Y = k) π(Y = k) / P(X1,...,XP),

where:

- P(X1,...,XP | Y = k) is the conditional joint density of the predictors given that they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.
- π(Y = k) is the class prior probability distribution. Mdl.Prior stores the prior distribution.
- P(X1,...,XP) is the joint density of the predictors. Because the classes are discrete, P(X1,...,XP) = Σ_{k=1}^{K} P(X1,...,XP | Y = k) π(Y = k).
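To make the formula concrete, you can reproduce one row of Posterior by hand from the stored parameters. A minimal sketch, assuming the fisheriris model from the examples and that each cell Mdl.DistributionParameters{k,d} holds [mean; standard deviation] for class k and predictor d (the layout fitcnb uses for 'normal' distributions):

j = 1;                  % hypothetical observation index
x = X(j,:);
K = numel(Mdl.ClassNames);
lik = zeros(1,K);
for k = 1:K
    p = 1;
    for d = 1:size(X,2)
        params = Mdl.DistributionParameters{k,d};   % [mean; standard deviation]
        p = p*normpdf(x(d),params(1),params(2));    % naive (independence) product
    end
    lik(k) = p;         % conditional joint density P(X1,...,XP | Y = k)
end
unnorm = lik.*Mdl.Prior;      % numerator: likelihood times prior
post = unnorm/sum(unnorm)     % normalize by the joint density P(X1,...,XP)

post should match Posterior(j,:) from resubPredict up to floating-point round-off.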
The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.
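By default, fitcnb estimates the prior from the class frequencies in the training data; for fisheriris, with 50 observations per species, each prior is 1/3. A quick check against the trained model:

Mdl.Prior                             % default: empirical class frequencies
countcats(categorical(Y))'/numel(Y)   % the same frequencies computed by hand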
ClassificationNaiveBayes | CompactClassificationNaiveBayes | fitcnb | loss | predict