resubPredict

Classify training sample observations using a trained naive Bayes classifier

Description


label = resubPredict(Mdl) returns a vector of resubstitution predicted class labels (label) for the trained naive Bayes classifier Mdl using the predictor data Mdl.X.


[label,Posterior,Cost] = resubPredict(Mdl) also returns the posterior probabilities (Posterior) and predicted (expected) misclassification costs (Cost) corresponding to the observations (rows) in Mdl.X.

Examples


Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas;
Y = species;
rng('default')  % for reproducibility

Train a naive Bayes classifier using the predictors X and class labels Y. Specifying the class names is a recommended practice. fitcnb assumes that, given the class, the predictors are conditionally independent and each is normally distributed.

Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}



Mdl is a trained ClassificationNaiveBayes classifier.

Predict the training sample labels.

label = resubPredict(Mdl);

Display the results for a random set of 10 observations.

idx = randsample(size(X,1),10);
table(Y(idx),label(idx),'VariableNames', ...
    {'TrueLabel','PredictedLabel'})
ans=10×2 table
      TrueLabel       PredictedLabel
    ______________    ______________

    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
    {'virginica' }    {'virginica' }
    {'versicolor'}    {'versicolor'}
    {'virginica' }    {'virginica' }
    {'versicolor'}    {'versicolor'}
    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }

Create a confusion chart from the true labels Y and the predicted labels label.

cm = confusionchart(Y,label);

Estimate in-sample posterior probabilities and misclassification costs using a naive Bayes classifier.

Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas;
Y = species;
rng('default')  % for reproducibility

Train a naive Bayes classifier using the predictors X and class labels Y. Specifying the class names is a recommended practice. fitcnb assumes that, given the class, the predictors are conditionally independent and each is normally distributed.

Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a trained ClassificationNaiveBayes classifier.

Estimate the posterior probabilities and expected misclassification costs for the training data.

[label,Posterior,MisclassCost] = resubPredict(Mdl);
Mdl.ClassNames
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Display the results for 10 randomly selected observations.

idx = randsample(size(X,1),10);
table(Y(idx),label(idx),Posterior(idx,:),MisclassCost(idx,:),'VariableNames', ...
    {'TrueLabel','PredictedLabel','PosteriorProbability','MisclassificationCost'})
ans=10×4 table
      TrueLabel       PredictedLabel              PosteriorProbability                       MisclassificationCost         
    ______________    ______________    _________________________________________    ______________________________________

    {'virginica' }    {'virginica' }    6.2514e-269     1.1709e-09              1             1             1    1.1709e-09
    {'setosa'    }    {'setosa'    }              1     5.5339e-19      2.485e-25    5.5339e-19             1             1
    {'virginica' }    {'virginica' }    7.4191e-249     1.4481e-10              1             1             1    1.4481e-10
    {'versicolor'}    {'versicolor'}     3.4472e-62        0.99997      3.362e-05             1     3.362e-05       0.99997
    {'virginica' }    {'virginica' }    3.4268e-229      6.597e-09              1             1             1     6.597e-09
    {'versicolor'}    {'versicolor'}     6.0941e-77         0.9998     0.00019663             1    0.00019663        0.9998
    {'virginica' }    {'virginica' }    1.3467e-167       0.002187        0.99781             1       0.99781      0.002187
    {'setosa'    }    {'setosa'    }              1     1.5776e-15     5.7172e-24    1.5776e-15             1             1
    {'virginica' }    {'virginica' }    2.0116e-232     2.6206e-10              1             1             1    2.6206e-10
    {'setosa'    }    {'setosa'    }              1     1.8085e-17     1.9639e-24    1.8085e-17             1             1

The order of the columns of Posterior and MisclassCost corresponds to the order of the classes in Mdl.ClassNames.

Input Arguments


Mdl — Full, trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

Output Arguments


label — Predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors.

The predicted class labels have the following characteristics:

  • Same data type as the observed class labels (Mdl.Y). (The software treats string arrays as cell arrays of character vectors.)

  • Length equal to the number of rows of Mdl.X.

  • Class yielding the lowest expected misclassification cost (Cost).
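The relationship between label and Cost can be checked directly. The following sketch (not part of the original examples; it retrains the fisheriris model) confirms that each predicted label is the class whose column of Cost is smallest:

```matlab
% Illustrative check: the predicted label is the class that minimizes
% the expected misclassification cost for each observation.
load fisheriris
Mdl = fitcnb(meas,species);              % retrain the example model
[label,~,Cost] = resubPredict(Mdl);
[~,idxMin] = min(Cost,[],2);             % column index of the minimum cost
isequal(label, Mdl.ClassNames(idxMin))   % expected to return logical 1
```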

Posterior — Class posterior probabilities, returned as a numeric matrix. Posterior has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Posterior(j,k) is the predicted posterior probability of class k (in class Mdl.ClassNames(k)) given the observation in row j of Mdl.X.

Cost — Expected misclassification costs, returned as a numeric matrix. Cost has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Cost(j,k) is the expected misclassification cost of the observation in row j of Mdl.X predicted into class k (in class Mdl.ClassNames(k)).

More About


Misclassification Cost

A misclassification cost is the relative severity of a classifier labeling an observation into the wrong class.

There are two types of misclassification costs: true and expected. Let K be the number of classes.

  • True misclassification cost — A K-by-K matrix, where element (i,j) indicates the misclassification cost of predicting an observation into class j if its true class is i. The software stores the misclassification cost in the property Mdl.Cost, and uses it in computations. By default, Mdl.Cost(i,j) = 1 if i ≠ j, and Mdl.Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for any incorrect classification.

  • Expected misclassification cost — A K-dimensional vector, where element k is the weighted average misclassification cost of classifying an observation into class k, weighted by the class posterior probabilities.

    c_k = Σ_{j=1}^{K} P^(Y = j | x_1,...,x_P) · Cost(j,k).

    That is, the software classifies an observation into the class with the lowest expected misclassification cost.
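As a sketch of the formula above, the expected costs for a single observation can be computed by hand from hypothetical posterior probabilities and the default 0-1 cost matrix (the numbers here are illustrative, not from the examples):

```matlab
% Hypothetical posterior probabilities for one observation, K = 3 classes
posterior = [0.80 0.15 0.05];      % P^(Y = j | x), one row of Posterior
costMat   = ones(3) - eye(3);      % default 0-1 true misclassification cost
expCost   = posterior * costMat;   % c_k = sum_j posterior(j)*costMat(j,k)
[~,k]     = min(expCost);          % predicted class: k = 1 here
```

With the 0-1 cost matrix, expCost equals 1 minus the posterior, so minimizing the expected cost is the same as maximizing the posterior probability.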

Posterior Probability

The posterior probability is the probability that an observation belongs in a particular class, given the data.

For naive Bayes, the posterior probability that an observation (x_1,...,x_P) belongs to class k is

P^(Y = k | x_1,...,x_P) = P(X_1,...,X_P | y = k) π(Y = k) / P(X_1,...,X_P),

where:

  • P(X1,...,XP|y=k) is the conditional joint density of the predictors given they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.

  • π(Y = k) is the class prior probability distribution. Mdl.Prior stores the prior distribution.

  • P(X_1,...,X_P) is the joint density of the predictors. The classes are discrete, so P(X_1,...,X_P) = Σ_{k=1}^{K} P(X_1,...,X_P | y = k) π(Y = k).
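As an illustrative sketch (not from the original page), the posterior for one training observation can be reproduced by hand from the per-class normal parameters that fitcnb stores in Mdl.DistributionParameters:

```matlab
load fisheriris
Mdl = fitcnb(meas,species);
x = meas(1,:);                                      % one observation
K = numel(Mdl.ClassNames);
lik = zeros(K,1);
for k = 1:K
    p = 1;
    for j = 1:numel(x)
        mu    = Mdl.DistributionParameters{k,j}(1); % class-conditional mean
        sigma = Mdl.DistributionParameters{k,j}(2); % class-conditional std
        p = p * normpdf(x(j),mu,sigma);             % conditional independence
    end
    lik(k) = p;                                     % P(X_1,...,X_P | y = k)
end
% Bayes' rule: normalize likelihood times prior over all classes
post = (lik .* Mdl.Prior(:)) / sum(lik .* Mdl.Prior(:));
```

post should match the first row of the Posterior output of resubPredict(Mdl).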

Prior Probability

The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.

Introduced in R2014b