Class: ClassificationNaiveBayes
Predict resubstitution labels of naive Bayes classifier
label = resubPredict(Mdl) returns a vector of predicted class labels (label) for the trained naive Bayes classifier Mdl, using the predictor data Mdl.X.

[label,Posterior,Cost] = resubPredict(Mdl) additionally returns posterior probabilities (Posterior) and predicted (expected) misclassification costs (Cost) corresponding to the observations (rows) in Mdl.X.
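For example, a minimal sketch of both call forms, using the Fisher iris data that also appears in the examples below:

load fisheriris
Mdl = fitcnb(meas,species);                 % Train a naive Bayes model
label = resubPredict(Mdl);                  % Predicted labels only
[label,Posterior,Cost] = resubPredict(Mdl); % Labels, posteriors, and costs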
Mdl — Fully trained naive Bayes classifier
ClassificationNaiveBayes model

A fully trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.
label — Predicted class labels

Predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors. label:

Has the same data type as the observed class labels (Y) that trained Mdl.

Has length equal to the number of rows of Mdl.X.

Is the class yielding the lowest expected misclassification cost.
Posterior — Class posterior probabilities

Class posterior probabilities, returned as a numeric matrix. Posterior has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Posterior(j,k) is the predicted posterior probability of class k (that is, class Mdl.ClassNames(k)) given the observation in row j of Mdl.X.
Data Types: double
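Each row of Posterior is a probability distribution over the classes, so its entries are nonnegative and sum to 1. A quick check, sketched with the same data used in the examples below:

load fisheriris
Mdl = fitcnb(meas,species);
[~,Posterior] = resubPredict(Mdl);
all(abs(sum(Posterior,2) - 1) < 1e-12)   % Returns logical 1 (true)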
Cost — Expected misclassification costs

Expected misclassification costs, returned as a numeric matrix. Cost has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Cost(j,k) is the expected misclassification cost of predicting the observation in row j of Mdl.X into class k (that is, class Mdl.ClassNames(k)).
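Under the definitions in More About, each row of Cost is the posterior-weighted average of the rows of the true cost matrix Mdl.Cost, so Cost equals Posterior*Mdl.Cost. A quick sketch of this relationship (a consequence of those definitions, not a separately documented guarantee):

load fisheriris
Mdl = fitcnb(meas,species);
[~,Posterior,Cost] = resubPredict(Mdl);
max(max(abs(Cost - Posterior*Mdl.Cost)))   % Should be near zero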
Load Fisher's iris data set.
load fisheriris
X = meas;    % Predictors
Y = species; % Response
Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.
Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});
Mdl
is a ClassificationNaiveBayes
classifier.
Predict the training sample labels. Display the results for a random sample of 10 observations.
label = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),'VariableNames',...
    {'TrueLabel','PredictedLabel'})
ans=10×2 table
TrueLabel PredictedLabel
______________ ______________
{'setosa' } {'setosa' }
{'versicolor'} {'versicolor'}
{'virginica' } {'virginica' }
{'setosa' } {'setosa' }
{'versicolor'} {'versicolor'}
{'setosa' } {'setosa' }
{'versicolor'} {'versicolor'}
{'versicolor'} {'versicolor'}
{'setosa' } {'setosa' }
{'setosa' } {'setosa' }
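All 10 sampled observations are classified correctly. To quantify agreement over the entire training set, you can follow up with resubLoss, which returns the in-sample misclassification rate; with the default cost matrix it should equal the fraction of mismatched labels:

resubLoss(Mdl,'LossFun','classiferror')  % In-sample misclassification rate
mean(~strcmp(Y,label))                   % Same quantity computed directly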
Load Fisher's iris data set.
load fisheriris
X = meas;    % Predictors
Y = species; % Response
Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.
Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});
Mdl
is a ClassificationNaiveBayes
classifier.
Estimate posterior probabilities and expected misclassification costs for the training data. Display the results for a random sample of 10 observations.
[label,Posterior,MisclassCost] = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
Mdl.ClassNames
ans = 3x1 cell
{'setosa' }
{'versicolor'}
{'virginica' }
table(Y(idx),label(idx),Posterior(idx,:),'VariableNames',...
    {'TrueLabel','PredictedLabel','PosteriorProbability'})
ans=10×3 table
TrueLabel PredictedLabel PosteriorProbability
______________ ______________ _________________________________________
{'setosa' } {'setosa' } 1 3.8821e-16 5.5878e-24
{'versicolor'} {'versicolor'} 1.2516e-54 1 4.5001e-06
{'virginica' } {'virginica' } 5.5646e-188 0.00058232 0.99942
{'setosa' } {'setosa' } 1 4.5352e-20 3.1301e-27
{'versicolor'} {'versicolor'} 5.0002e-69 0.99989 0.00010716
{'setosa' } {'setosa' } 1 2.9813e-18 2.1524e-25
{'versicolor'} {'versicolor'} 4.6313e-60 0.99999 7.5413e-06
{'versicolor'} {'versicolor'} 7.9205e-100 0.94293 0.057072
{'setosa' } {'setosa' } 1 1.799e-19 6.0606e-27
{'setosa' } {'setosa' } 1 1.5426e-17 1.2744e-24
MisclassCost(idx,:)
ans = 10×3
0.0000 1.0000 1.0000
1.0000 0.0000 1.0000
1.0000 0.9994 0.0006
0.0000 1.0000 1.0000
1.0000 0.0001 0.9999
0.0000 1.0000 1.0000
1.0000 0.0000 1.0000
1.0000 0.0571 0.9429
0.0000 1.0000 1.0000
0.0000 1.0000 1.0000
The order of the columns of Posterior
and MisclassCost
corresponds to the order of the classes in Mdl.ClassNames
.
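As a consistency check (a sketch, not part of the documented interface), each predicted label is the class whose column of MisclassCost is smallest for that row:

[~,minIdx] = min(MisclassCost,[],2);   % Column with the lowest expected cost
isequal(label,Mdl.ClassNames(minIdx))  % Returns logical 1 (true)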
A misclassification cost is the relative severity of a classifier labeling an observation into the wrong class.
There are two types of misclassification costs: true and expected. Let K be the number of classes.
True misclassification cost — A K-by-K matrix, where element (i,j) indicates the misclassification cost of predicting an observation into class j if its true class is i. The software stores the misclassification cost in the property Mdl.Cost and uses it in computations.
By default, Mdl.Cost(i,j) = 1 if i ≠ j, and Mdl.Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for any incorrect classification.
Expected misclassification cost — A K-dimensional vector, where element k is the weighted average misclassification cost of classifying an observation into class k, weighted by the class posterior probabilities. The software classifies each observation into the class with the lowest expected misclassification cost.
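Written out (restating the two definitions above, with C denoting the true cost matrix Mdl.Cost), the expected cost of classifying an observation x into class k is

$$c_k(x) = \sum_{i=1}^{K} \hat{P}(Y = i \mid x)\,C(i,k),$$

and the predicted label is the class k that minimizes $c_k(x)$.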
The posterior probability is the probability that an observation belongs in a particular class, given the data.
For naive Bayes, the posterior probability that a classification is k for a given observation (x1,...,xP) is

$$\hat{P}(Y = k \mid x_1,\ldots,x_P) = \frac{P(X_1,\ldots,X_P \mid Y = k)\,\pi(Y = k)}{P(X_1,\ldots,X_P)},$$

where:

$P(X_1,\ldots,X_P \mid Y = k)$ is the conditional joint density of the predictors given they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.

$\pi(Y = k)$ is the class prior probability distribution. Mdl.Prior stores the prior distribution.

$P(X_1,\ldots,X_P)$ is the joint density of the predictors. The classes are discrete, so

$$P(X_1,\ldots,X_P) = \sum_{k=1}^{K} P(X_1,\ldots,X_P \mid Y = k)\,\pi(Y = k).$$
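A sketch that makes this formula concrete for the Fisher iris model, assuming every predictor uses the default 'normal' distribution (so each class-conditional density is a product of normal pdfs whose fitted means and standard deviations are stored in Mdl.DistributionParameters); the result should match the Posterior output of resubPredict up to floating-point effects:

load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});
[~,Posterior] = resubPredict(Mdl);

[n,P] = size(Mdl.X);              % Observations and predictors
K = numel(Mdl.ClassNames);        % Number of classes
jointLik = ones(n,K);             % P(X_1,...,X_P | Y = k) per observation
for k = 1:K
    for p = 1:P
        musig = Mdl.DistributionParameters{k,p}; % [mean; standard deviation]
        jointLik(:,k) = jointLik(:,k) .* ...
            normpdf(Mdl.X(:,p),musig(1),musig(2));
    end
end
unnorm = jointLik .* Mdl.Prior;            % Likelihood times class prior
manualPosterior = unnorm ./ sum(unnorm,2); % Normalize by the joint density
max(abs(manualPosterior(:) - Posterior(:)))  % Should be near zero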
The prior probability of a class is the believed relative frequency with which observations from that class occur in a population.