Resubstitution classification loss for naive Bayes classifier
L = resubLoss(Mdl) returns the classification loss by resubstitution (L), or the in-sample classification loss, for the naive Bayes classifier Mdl using the training data stored in Mdl.X and the corresponding class labels stored in Mdl.Y.

The classification loss (L) is a generalization or resubstitution quality measure. Its interpretation depends on the loss function and weighting scheme; in general, better classifiers yield smaller classification loss values.
Determine the in-sample classification error (resubstitution loss) of a naive Bayes classifier. In general, a smaller loss indicates a better classifier.
Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.
load fisheriris
X = meas;
Y = species;
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods
Mdl is a trained ClassificationNaiveBayes classifier.
Estimate the in-sample classification error.
L = resubLoss(Mdl)
L = 0.0400
The naive Bayes classifier misclassifies 4% of the training observations.
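As a quick check, you can recover the same error rate by comparing the resubstituted labels with the training labels. This is a sketch that assumes the Mdl and Y created above.

% Sketch: confirm the resubstitution error by comparing in-sample
% predictions against the training labels (assumes Mdl and Y from above).
labels = resubPredict(Mdl);        % in-sample predicted class labels
errRate = mean(~strcmp(labels,Y))  % fraction misclassified; matches L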
Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.
load fisheriris
X = meas;
Y = species;
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'});
Mdl is a trained ClassificationNaiveBayes classifier.
Estimate the logit resubstitution loss.
L = resubLoss(Mdl,'LossFun','logit')
L = 0.3310
The average in-sample logit loss is approximately 0.33.
Mdl — Full, trained naive Bayes classifier
ClassificationNaiveBayes model

Full, trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.
LossFun — Loss function
'classiferror' (default) | 'binodeviance' | 'exponential' | 'hinge' | 'logit' | 'mincost' | 'quadratic' | function handle

Loss function, specified as a built-in loss function name or function handle. The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar.
Value | Description
---|---
'binodeviance' | Binomial deviance
'classiferror' | Classification error
'exponential' | Exponential
'hinge' | Hinge
'logit' | Logistic
'mincost' | Minimal expected misclassification cost (for classification scores that are posterior probabilities)
'quadratic' | Quadratic
'mincost' is appropriate for classification scores that are posterior probabilities. Naive Bayes models return posterior probabilities as classification scores by default (see predict).
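For example, because naive Bayes scores are posterior probabilities, you can request the minimal expected misclassification cost directly. This sketch assumes the Mdl trained on fisheriris in the earlier examples.

% Sketch: minimal expected misclassification cost loss (assumes Mdl from
% the earlier fisheriris examples).
L = resubLoss(Mdl,'LossFun','mincost')

With the default cost matrix (0 for correct classification, 1 for misclassification), this value coincides with the classification error.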
Specify your own function using function handle notation.

Suppose that n is the number of observations in X and K is the number of distinct classes (numel(Mdl.ClassNames), where Mdl is the input model). Your function must have this signature:

lossvalue = lossfun(C,S,W,Cost)

The output argument lossvalue is a scalar. You specify the function name (lossfun).
C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in Mdl.ClassNames. Create C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
S is an n-by-K numeric matrix of classification scores, similar to the output of predict. The column order corresponds to the class order in Mdl.ClassNames.
W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes the weights to sum to 1.
Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) - eye(K) specifies a cost of 0 for correct classification and 1 for misclassification.
Specify your function using 'LossFun',@lossfun.
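For illustration, here is a minimal sketch of a custom loss function that reproduces the weighted classification error from the C, S, and W inputs described above. The name myLoss is hypothetical.

function lossvalue = myLoss(C,S,W,Cost)
% Sketch of a custom loss with the required signature (myLoss is a
% hypothetical name); Cost is accepted but unused here.
[~,yhat] = max(S,[],2);             % predicted class: column with the largest score
[~,y] = max(C,[],2);                % true class: column holding the 1 in each row
lossvalue = sum(W .* (yhat ~= y));  % weighted error; W is normalized to sum to 1
end

You can then pass the handle to resubLoss, for example L = resubLoss(Mdl,'LossFun',@myLoss).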
For more details on loss functions, see Classification Loss.
Data Types: char | string | function_handle
Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.
Consider the following scenario.
L is the weighted average classification loss.
n is the sample size.
For binary classification:
yj is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
f(Xj) is the raw classification score for observation (row) j of the predictor data X.
mj = yjf(Xj) is the classification score for classifying observation j into the class corresponding to yj. Positive values of mj indicate correct classification and do not contribute much to the average loss. Negative values of mj indicate incorrect classification and contribute significantly to the average loss.
For algorithms that support multiclass classification (that is, K ≥ 3):
yj* is a vector of K – 1 zeros, with 1 in the position corresponding to the true, observed class yj. For example, if the true class of the second observation is the third class and K = 4, then y2* = [0 0 1 0]′. The order of the classes corresponds to the order in the ClassNames property of the input model.

f(Xj) is the length K vector of class scores for observation j of the predictor data X. The order of the scores corresponds to the order of the classes in the ClassNames property of the input model.
mj = yj*′f(Xj). Therefore, mj is the scalar classification score that the model predicts for the true, observed class.
The weight for observation j is wj. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore, $\sum_{j=1}^{n} w_j = 1$.
Given this scenario, the following table describes the supported loss functions that you can specify by using the 'LossFun' name-value pair argument.
Loss Function | Value of LossFun | Equation
---|---|---
Binomial deviance | 'binodeviance' | $L = \sum_{j=1}^{n} w_j \log\{1 + \exp[-2m_j]\}$
Exponential loss | 'exponential' | $L = \sum_{j=1}^{n} w_j \exp(-m_j)$
Classification error | 'classiferror' | $L = \sum_{j=1}^{n} w_j I\{\hat{y}_j \ne y_j\}$. This is the weighted fraction of misclassified observations, where $\hat{y}_j$ is the class label corresponding to the class with the maximal posterior probability and I{x} is the indicator function.
Hinge loss | 'hinge' | $L = \sum_{j=1}^{n} w_j \max\{0, 1 - m_j\}$
Logit loss | 'logit' | $L = \sum_{j=1}^{n} w_j \log\{1 + \exp(-m_j)\}$
Minimal cost | 'mincost' | The software computes the weighted minimal cost using this procedure for observations j = 1,...,n: (1) estimate the 1-by-K vector of expected classification costs for observation j, $\gamma_j = f(X_j)'C$, where $f(X_j)$ is the column vector of class posterior probabilities and C is the cost matrix stored in the Cost property of the model; (2) for observation j, predict the class label corresponding to the minimum expected classification cost, $\hat{y}_j = \operatorname{argmin}_{k=1,\dots,K} \gamma_{jk}$; (3) using C, identify the cost incurred ($c_j$) for making the prediction. The weighted average minimal cost loss is $L = \sum_{j=1}^{n} w_j c_j$.
Quadratic loss | 'quadratic' | $L = \sum_{j=1}^{n} w_j (1 - m_j)^2$
This figure compares the loss functions (except 'mincost') for one observation over m. Some functions are normalized to pass through the point (0,1).
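As a concrete check of these definitions, the following sketch computes the multiclass logit loss by hand; it assumes the Mdl and Y from the earlier fisheriris examples, and its result should match resubLoss(Mdl,'LossFun','logit').

% Sketch: compute the logit loss manually from the definitions above
% (assumes Mdl and Y from the earlier fisheriris examples).
[~,S] = resubPredict(Mdl);                 % n-by-K posterior scores f(X_j)
[~,trueIdx] = ismember(Y,Mdl.ClassNames);  % column index of each observed class
n = numel(Y);
m = S(sub2ind(size(S),(1:n)',trueIdx));    % m_j = y_j*' f(X_j)
w = ones(n,1)/n;                           % default uniform observation weights
Lmanual = sum(w .* log(1 + exp(-m)))       % matches the built-in 'logit' loss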
The posterior probability is the probability that an observation belongs in a particular class, given the data.
For naive Bayes, the posterior probability that a classification is k for a given observation (x1,...,xP) is

$$\hat{P}(Y = k \mid x_1,\dots,x_P) = \frac{P(X_1,\dots,X_P \mid y = k)\,\pi(Y = k)}{P(X_1,\dots,X_P)},$$

where:

$P(X_1,\dots,X_P \mid y = k)$ is the conditional joint density of the predictors given they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.

π(Y = k) is the class prior probability distribution. Mdl.Prior stores the prior distribution.

$P(X_1,\dots,X_P)$ is the joint density of the predictors. The classes are discrete, so $P(X_1,\dots,X_P) = \sum_{k=1}^{K} P(X_1,\dots,X_P \mid y = k)\,\pi(Y = k)$.
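To see these posterior probabilities directly, you can request them as the second output of predict. This sketch assumes the Mdl and X from the earlier examples.

% Sketch: inspect the posterior probabilities that serve as classification
% scores (assumes Mdl and X from the earlier examples).
[label,Posterior] = predict(Mdl,X);
Posterior(1:3,:)   % each row sums to 1 across the K = 3 classes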
The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.
ClassificationNaiveBayes | CompactClassificationNaiveBayes | fitcnb | loss | predict | resubPredict