resubPredict

Predict resubstitution labels of naive Bayes classifier

Syntax

label = resubPredict(Mdl)

[label,Posterior,Cost] = resubPredict(Mdl)

Description

label = resubPredict(Mdl) returns a vector of predicted class labels (label) for the trained naive Bayes classifier Mdl using the predictor data Mdl.X.

[label,Posterior,Cost] = resubPredict(Mdl) additionally returns posterior probabilities (Posterior) and predicted (expected) misclassification costs (Cost) corresponding to the observations (rows) in Mdl.X.

Input Arguments

`Mdl` — Fully trained naive Bayes classifier
`ClassificationNaiveBayes` model

A fully trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

Output Arguments

`label` — Predicted class labels
categorical vector | character array | logical vector | numeric vector | cell array of character vectors

Predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors.

label:

Is the same data type as the observed class labels (Y) that trained Mdl. (The software treats string arrays as cell arrays of character vectors.)
Has length equal to the number of rows of Mdl.X.
Is the class yielding the lowest expected misclassification cost (Cost).
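
As a quick check of the last point, the following sketch recomputes the labels from the expected costs. It assumes the fisheriris data set used in the examples on this page; the variable names minIdx and labelFromCost are illustrative and not part of the toolbox.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});

[label,~,Cost] = resubPredict(Mdl);

% For each row, find the column (class) with the smallest expected cost.
[~,minIdx] = min(Cost,[],2);
labelFromCost = Mdl.ClassNames(minIdx); % illustrative variable name

% Count the rows where the two labelings disagree. Barring exact ties
% in Cost, this count should be 0.
numDisagree = sum(~strcmp(label,labelFromCost))
```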

`Posterior` — Class posterior probabilities
numeric matrix

Class posterior probabilities, returned as a numeric matrix. Posterior has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Posterior(j,k) is the predicted posterior probability of class k (i.e., in class Mdl.ClassNames(k)) given the observation in row j of Mdl.X.

Data Types: double

`Cost` — Expected misclassification costs
numeric matrix

Expected misclassification costs, returned as a numeric matrix. Cost has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Cost(j,k) is the expected misclassification cost of the observation in row j of Mdl.X being predicted into class k (i.e., in class Mdl.ClassNames(k)).
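
The relationship between the Posterior and Cost outputs can be sketched as follows, assuming the fisheriris data set used in the examples on this page. The sketch relies on the definition of expected misclassification cost given under More About; costByHand and maxDiff are illustrative variable names.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});

[~,Posterior,Cost] = resubPredict(Mdl);

% Each row of Posterior times the K-by-K true cost matrix gives the
% expected cost of predicting each class for that observation.
costByHand = Posterior*Mdl.Cost; % illustrative variable name

% Largest absolute difference from the Cost output (should be near 0).
maxDiff = max(abs(Cost(:) - costByHand(:)))

% Both outputs have one row per observation and one column per class.
size(Posterior)
size(Cost)
```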

Examples

Label Training Sample Observations for Naive Bayes

Load Fisher's iris data set.

load fisheriris
X = meas;    % Predictors
Y = species; % Response

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.

Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationNaiveBayes classifier.

Predict the training sample labels. Display the results for a random sample of 10 observations.

label = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),'VariableNames',...
    {'TrueLabel','PredictedLabel'})

ans=10×2 table
      TrueLabel       PredictedLabel
    ______________    ______________

    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'versicolor'}    {'versicolor'}
    {'setosa'    }    {'setosa'    }
    {'setosa'    }    {'setosa'    }
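
To summarize all of the training observations rather than a sample of 10, one option is to compare the predicted labels with the observed labels directly. The following is a minimal sketch under the same setup as this example; isWrong, resubError, and C are illustrative variable names.

```matlab
load fisheriris
X = meas;    % Predictors
Y = species; % Response
Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});
label = resubPredict(Mdl);

% Logical vector marking the misclassified training observations.
isWrong = ~strcmp(label,Y);

% Fraction of training observations that are misclassified.
resubError = mean(isWrong)

% Confusion matrix: rows are true classes and columns are predicted
% classes, both in the order of Mdl.ClassNames.
C = confusionmat(Y,label,'Order',Mdl.ClassNames)
```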

Estimate In-Sample Posterior Probabilities of Naive Bayes Classifiers

Load Fisher's iris data set.

load fisheriris
X = meas;    % Predictors
Y = species; % Response

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.

Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationNaiveBayes classifier.

Estimate posterior probabilities and expected misclassification costs for the training data. Display the results for a random sample of 10 observations.

[label,Posterior,MisclassCost] = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
Mdl.ClassNames

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

table(Y(idx),label(idx),Posterior(idx,:),'VariableNames',...
    {'TrueLabel','PredictedLabel','PosteriorProbability'})

ans=10×3 table
      TrueLabel       PredictedLabel              PosteriorProbability           
    ______________    ______________    _________________________________________

    {'setosa'    }    {'setosa'    }              1     3.8821e-16     5.5878e-24
    {'versicolor'}    {'versicolor'}     1.2516e-54              1     4.5001e-06
    {'virginica' }    {'virginica' }    5.5646e-188     0.00058232        0.99942
    {'setosa'    }    {'setosa'    }              1     4.5352e-20     3.1301e-27
    {'versicolor'}    {'versicolor'}     5.0002e-69        0.99989     0.00010716
    {'setosa'    }    {'setosa'    }              1     2.9813e-18     2.1524e-25
    {'versicolor'}    {'versicolor'}     4.6313e-60        0.99999     7.5413e-06
    {'versicolor'}    {'versicolor'}    7.9205e-100        0.94293       0.057072
    {'setosa'    }    {'setosa'    }              1      1.799e-19     6.0606e-27
    {'setosa'    }    {'setosa'    }              1     1.5426e-17     1.2744e-24

MisclassCost(idx,:)

ans = 10×3

    0.0000    1.0000    1.0000
    1.0000    0.0000    1.0000
    1.0000    0.9994    0.0006
    0.0000    1.0000    1.0000
    1.0000    0.0001    0.9999
    0.0000    1.0000    1.0000
    1.0000    0.0000    1.0000
    1.0000    0.0571    0.9429
    0.0000    1.0000    1.0000
    0.0000    1.0000    1.0000

The order of the columns of Posterior and MisclassCost corresponds to the order of the classes in Mdl.ClassNames.

More About

Misclassification Cost

A misclassification cost is the relative severity of a classifier labeling an observation into the wrong class.

There are two types of misclassification costs: true and expected. Let K be the number of classes.

True misclassification cost: A K-by-K matrix, where element (i,j) is the cost of predicting an observation into class j if its true class is i. The software stores the misclassification cost in the property Mdl.Cost and uses it in computations. By default, Mdl.Cost(i,j) = 1 if i ≠ j, and Mdl.Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for any incorrect classification.
Expected misclassification cost: A K-dimensional vector, where element k is the weighted average misclassification cost of classifying an observation into class k, weighted by the class posterior probabilities. In other words,

$c_{k} = \sum_{j = 1}^{K} \hat{P} (Y = j \mid x_{1}, \ldots, x_{P}) \, \mathrm{Cost}_{jk}.$

The software classifies each observation into the class with the lowest expected misclassification cost.
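
The arithmetic can be illustrated with a small worked example. The posterior vector p and the cost matrix C below are made-up values for one hypothetical observation and K = 3 classes; they do not come from a fitted model.

```matlab
% Hypothetical posterior probabilities for one observation (K = 3).
p = [0.05 0.55 0.40];

% Hypothetical true cost matrix: C(i,j) is the cost of predicting
% class j when the true class is i. Predicting class 2 for a true
% class 3 observation is made four times as costly as other errors.
C = [0 1 1
     1 0 1
     1 4 0];

% Expected cost of predicting each class: c(k) = sum_j p(j)*C(j,k).
c = p*C              % approximately [0.95 1.65 0.60]
[~,kHat] = min(c)    % class 3 has the lowest expected cost

% With the default 0-1 cost, c = 1 - p, so the class with the
% highest posterior probability (class 2) is chosen instead.
C0 = ones(3) - eye(3);
[~,kHatDefault] = min(p*C0)
```

In this sketch the asymmetric cost changes the decision from class 2 to class 3, even though class 2 has the larger posterior probability.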

Posterior Probability

The posterior probability is the probability that an observation belongs in a particular class, given the data.

For naive Bayes, the posterior probability that the class is k for a given observation (x₁,...,x_P) is

$\hat{P} (Y = k \mid x_{1}, \ldots, x_{P}) = \frac{P (X_{1}, \ldots, X_{P} \mid Y = k) \, \pi (Y = k)}{P (X_{1}, \ldots, X_{P})},$

where:

$P (X_{1}, \ldots, X_{P} \mid Y = k)$ is the conditional joint density of the predictors given that they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.
$\pi (Y = k)$ is the class prior probability distribution. Mdl.Prior stores the prior distribution.
$P (X_{1}, \ldots, X_{P})$ is the joint density of the predictors. The classes are discrete, so $P (X_{1}, \ldots, X_{P}) = \sum_{k = 1}^{K} P (X_{1}, \ldots, X_{P} \mid Y = k) \, \pi (Y = k).$
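
The formula can be sketched by hand for one training observation. The sketch below assumes the fisheriris model from the examples on this page, in which every predictor uses the normal distribution, so that each cell of Mdl.DistributionParameters holds a mean and a standard deviation. The row index j and the names lik, joint, and postByHand are illustrative.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});
[~,Posterior] = resubPredict(Mdl);

j = 1;                       % illustrative choice of observation
x = Mdl.X(j,:);
K = numel(Mdl.ClassNames);   % number of classes
P = size(Mdl.X,2);           % number of predictors

% Conditional joint density under the naive Bayes assumption: the
% product of the per-predictor normal densities within each class.
lik = ones(1,K);
for k = 1:K
    for p = 1:P
        params = Mdl.DistributionParameters{k,p}; % [mean; std]
        lik(k) = lik(k)*normpdf(x(p),params(1),params(2));
    end
end

% Numerator of the posterior formula, then normalize over classes.
joint = lik.*Mdl.Prior(:)';
postByHand = joint/sum(joint);

% The two rows should agree up to floating-point error.
[postByHand; Posterior(j,:)]
```

Multiplying raw densities is adequate for four predictors, but with many predictors a sum of log densities would be less prone to underflow.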

Prior Probability

The prior probability of a class is the believed relative frequency with which observations from that class occur in a population.
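
As a brief illustration, the sketch below compares Mdl.Prior with the class relative frequencies in the training labels. It assumes the default 'Prior' setting of fitcnb ('empirical') and the fisheriris data set; counts and empirical are illustrative variable names.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});

% Count the training observations in each class, in ClassNames order.
counts = cellfun(@(c) sum(strcmp(species,c)),Mdl.ClassNames)';

% Relative frequency of each class in the training labels.
empirical = counts/sum(counts);

% With the default setting, the stored prior should match the
% relative frequencies.
[Mdl.Prior(:)'; empirical]
```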

