resubPredict

Class: ClassificationNaiveBayes

Predict resubstitution labels of naive Bayes classifier

Description

label = resubPredict(Mdl) returns a vector of predicted class labels (label) for the trained naive Bayes classifier Mdl using the predictor data Mdl.X.

[label,Posterior,Cost] = resubPredict(Mdl) additionally returns posterior probabilities (Posterior) and predicted (expected) misclassification costs (Cost) corresponding to the observations (rows) in Mdl.X.

Input Arguments

Mdl is a fully trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

Output Arguments

label contains the predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors.

label:

  • Is the same data type as the observed class labels (Y) that trained Mdl. (The software treats string arrays as cell arrays of character vectors.)

  • Has length equal to the number of rows of Mdl.X.

  • Is the class yielding the lowest expected misclassification cost (Cost).
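
As a rough check of the last point, the following sketch refits the iris model used in the examples on this page and compares label with the class that minimizes each row of Cost. The variable names k, labelFromCost, and agree are illustrative only, and if two classes tie for the lowest cost the software might break the tie differently than min does.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});

[label,~,Cost] = resubPredict(Mdl);

% Column index of the smallest expected cost in each row
[~,k] = min(Cost,[],2);

% Map the column indices back to class names (illustrative variable)
labelFromCost = Mdl.ClassNames(k);

% True when both labelings agree for every training observation
agree = isequal(label,labelFromCost)
```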

Posterior contains the class posterior probabilities, returned as a numeric matrix. Posterior has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Posterior(j,k) is the predicted posterior probability of class k (i.e., in class Mdl.ClassNames(k)) given the observation in row j of Mdl.X.

Data Types: double
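
A minimal sketch of how the shape of Posterior could be inspected, assuming the iris model from the examples on this page. Because each row holds a probability distribution over the classes, the row sums are expected to equal 1 up to floating-point error. The names rowSums and maxDeviation are illustrative.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});

[~,Posterior] = resubPredict(Mdl);

% One row per training observation, one column per class
size(Posterior)

% Each row should sum to 1 up to round-off
rowSums = sum(Posterior,2);
maxDeviation = max(abs(rowSums - 1))
```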

Cost contains the expected misclassification costs, returned as a numeric matrix. Cost has rows equal to the number of rows of Mdl.X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Cost(j,k) is the expected misclassification cost of the observation in row j of Mdl.X being predicted into class k (i.e., in class Mdl.ClassNames(k)).
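
The sketch below illustrates one way to reproduce Cost by hand, under the assumption that the expected cost of predicting class k is the posterior-weighted sum of the entries in column k of the misclassification cost matrix stored in Mdl.Cost. With the default cost matrix (0 on the diagonal, 1 elsewhere) this reduces to 1 minus the posterior probability, which agrees with the sample output in the Examples section. The names CostManual and maxDiff are illustrative.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,...
    'ClassNames',{'setosa','versicolor','virginica'});

[~,Posterior,Cost] = resubPredict(Mdl);

% Assumed definition: CostManual(j,k) = sum over i of
% Posterior(j,i)*Mdl.Cost(i,k), where i indexes the true class
CostManual = Posterior*Mdl.Cost;

% Largest absolute difference between the two matrices
maxDiff = max(abs(Cost(:) - CostManual(:)))
```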

Examples

Load Fisher's iris data set.

load fisheriris
X = meas;    % Predictors
Y = species; % Response

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is normally distributed conditional on its class label.

Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationNaiveBayes classifier.

Predict the training sample labels. Display the results for a random sample of 10 observations.

label = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
table(Y(idx),label(idx),'VariableNames',...
    {'TrueLabel','PredictedLabel'})
ans=10×2 table
      TrueLabel       PredictedLabel
    ______________    ______________

    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'virginica' }    {'virginica' }
    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'setosa'    }    {'setosa'    }
    {'versicolor'}    {'versicolor'}
    {'versicolor'}    {'versicolor'}
    {'setosa'    }    {'setosa'    }
    {'setosa'    }    {'setosa'    }
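
Beyond spot-checking a few rows, the resubstitution labels can be summarized over the whole training set. The following is a minimal sketch, assuming the same iris model, that builds a confusion matrix with confusionmat and computes the fraction of misclassified training observations. The names C, order, and resubErr are illustrative.

```matlab
load fisheriris
X = meas;    % Predictors
Y = species; % Response
Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});

label = resubPredict(Mdl);

% Rows of C are true classes, columns are predicted classes;
% order lists the class corresponding to each row and column
[C,order] = confusionmat(Y,label)

% Fraction of training observations whose predicted label is wrong
resubErr = mean(~strcmp(Y,label))
```

Keep in mind that a resubstitution error is measured on the data used for training, so it tends to be optimistic compared with the error on new data.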

Load Fisher's iris data set.

load fisheriris
X = meas;    % Predictors
Y = species; % Response

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is normally distributed conditional on its class label.

Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationNaiveBayes classifier.

Estimate posterior probabilities and expected misclassification costs for the training data. Display the results for a random sample of 10 observations.

[label,Posterior,MisclassCost] = resubPredict(Mdl);
rng(1); % For reproducibility
idx = randsample(size(X,1),10);
Mdl.ClassNames
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

table(Y(idx),label(idx),Posterior(idx,:),'VariableNames',...
    {'TrueLabel','PredictedLabel','PosteriorProbability'})
ans=10×3 table
      TrueLabel       PredictedLabel              PosteriorProbability           
    ______________    ______________    _________________________________________

    {'setosa'    }    {'setosa'    }              1     3.8821e-16     5.5878e-24
    {'versicolor'}    {'versicolor'}     1.2516e-54              1     4.5001e-06
    {'virginica' }    {'virginica' }    5.5646e-188     0.00058232        0.99942
    {'setosa'    }    {'setosa'    }              1     4.5352e-20     3.1301e-27
    {'versicolor'}    {'versicolor'}     5.0002e-69        0.99989     0.00010716
    {'setosa'    }    {'setosa'    }              1     2.9813e-18     2.1524e-25
    {'versicolor'}    {'versicolor'}     4.6313e-60        0.99999     7.5413e-06
    {'versicolor'}    {'versicolor'}    7.9205e-100        0.94293       0.057072
    {'setosa'    }    {'setosa'    }              1      1.799e-19     6.0606e-27
    {'setosa'    }    {'setosa'    }              1     1.5426e-17     1.2744e-24

MisclassCost(idx,:)
ans = 10×3

    0.0000    1.0000    1.0000
    1.0000    0.0000    1.0000
    1.0000    0.9994    0.0006
    0.0000    1.0000    1.0000
    1.0000    0.0001    0.9999
    0.0000    1.0000    1.0000
    1.0000    0.0000    1.0000
    1.0000    0.0571    0.9429
    0.0000    1.0000    1.0000
    0.0000    1.0000    1.0000

The order of the columns of Posterior and MisclassCost corresponds to the order of the classes in Mdl.ClassNames.
