resubEdge

Class: ClassificationNaiveBayes

Classification edge for naive Bayes classifiers by resubstitution

Description

e = resubEdge(Mdl) returns the resubstitution classification edge (e) for the naive Bayes classifier Mdl using the training data stored in Mdl.X and corresponding class labels stored in Mdl.Y.

Input Arguments

Mdl: A fully trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

Output Arguments

e: Classification edge, returned as a scalar. If you supplied observation weights when training the classifier, then e is the weighted classification edge. The software normalizes the weights within each class so that they sum to the prior probability of that class.

Examples

Load Fisher's iris data set.

load fisheriris
X = meas;    % Predictors
Y = species; % Response
rng(1);

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its class label.

Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a trained ClassificationNaiveBayes classifier.

Estimate the resubstitution edge.

e = resubEdge(Mdl)
e = 0.8944

The mean of the training sample margins is approximately 0.9, which indicates that the classifier classifies in-sample observations with high confidence.
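The relationship between the edge and the margins can be checked directly. The following is a minimal sketch, assuming that resubMargin returns one margin per training observation for this model and that Mdl.W holds the normalized observation weights. The variable names m and eCheck are introduced here for illustration only.

```matlab
load fisheriris
Mdl = fitcnb(meas,species,'ClassNames',{'setosa','versicolor','virginica'});

m = resubMargin(Mdl);                  % One margin per training observation
eCheck = sum(Mdl.W .* m)/sum(Mdl.W);   % Weighted mean of the margins (assumed definition of the edge)
e = resubEdge(Mdl);

abs(e - eCheck)   % Expected to be near zero, up to floating-point error
```

Because no weights were supplied during training in this example, the weighted mean should reduce to the plain mean of the margins.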

The classifier edge measures the average of the classifier margins. One way to perform feature selection is to compare training sample edges from multiple models. Based solely on this criterion, the classifier with the highest edge is the best classifier.

Load Fisher's iris data set. Define two data sets:

  • fullX contains all predictors.

  • partX contains the last two predictors.

load fisheriris
X = meas;    % Predictors
Y = species; % Response
fullX = X;
partX = X(:,3:4);

Train naive Bayes classifiers for each predictor set.

FullMdl = fitcnb(fullX,Y);
PartMdl = fitcnb(partX,Y);

Estimate the training sample edge for each classifier.

fullEdge = resubEdge(FullMdl)
fullEdge = 0.8944
partEdge = resubEdge(PartMdl)
partEdge = 0.9169

The edge for the classifier trained on predictors 3 and 4 is greater, suggesting that the classifier trained using only those predictors has a better in-sample fit.
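The same comparison can be extended to every two-predictor subset. The following is a rough sketch rather than a recommended selection procedure; the names pairs, pairEdge, SubMdl, bestEdge, and bestPair are hypothetical and introduced only for this illustration.

```matlab
load fisheriris
X = meas;    % Predictors
Y = species; % Response

pairs = nchoosek(1:size(X,2),2);      % All two-predictor subsets, one per row
pairEdge = zeros(size(pairs,1),1);    % Training sample edge for each subset

for i = 1:size(pairs,1)
    SubMdl = fitcnb(X(:,pairs(i,:)),Y);   % Train on the current pair only
    pairEdge(i) = resubEdge(SubMdl);
end

[bestEdge,idx] = max(pairEdge);
bestPair = pairs(idx,:)               % Pair with the highest in-sample edge
```

As in the example above, this ranks subsets by in-sample fit only.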

More About
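
The quantities used on this page can be written out explicitly. The formulas below are a sketch under the usual definitions for classifiers that score each class by its estimated posterior probability. The symbols m_j, w_j, and \hat{P} are notation introduced here, not taken from the page.

```latex
% Margin of observation j: score of the true class minus the largest score among the other classes
m_j = \hat{P}(y_j \mid x_j) - \max_{k \neq y_j} \hat{P}(k \mid x_j)

% Edge: weighted mean of the margins, with w_j the observation weights
e = \frac{\sum_{j=1}^{n} w_j \, m_j}{\sum_{j=1}^{n} w_j}
```

Under these definitions, each margin lies between -1 and 1, so the edge does as well, and a larger edge corresponds to more confident correct classification on average.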
