Superclasses: CompactClassificationNaiveBayes
Naive Bayes classification
ClassificationNaiveBayes is a naive Bayes classifier for multiclass learning. Use fitcnb and the training data to train a ClassificationNaiveBayes classifier.
Trained ClassificationNaiveBayes classifiers store the training data, parameter values, data distribution, and prior probabilities. You can use these classifiers to:
Estimate resubstitution predictions. For details, see resubPredict.
Predict labels or posterior probabilities for new data. For details, see predict.
Create a ClassificationNaiveBayes object by using fitcnb.
compact | Compact naive Bayes classifier |
crossval | Cross-validated naive Bayes classifier |
resubEdge | Classification edge for naive Bayes classifiers by resubstitution |
resubLoss | Classification loss for naive Bayes classifiers by resubstitution |
resubMargin | Classification margins for naive Bayes classifiers by resubstitution |
resubPredict | Predict resubstitution labels of naive Bayes classifier |
edge | Classification edge for naive Bayes classifiers |
logP | Log unconditional probability density for naive Bayes classifier |
loss | Classification error for naive Bayes classifier |
margin | Classification margins for naive Bayes classifiers |
predict | Predict labels using naive Bayes classification model |
Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).
If you specify 'DistributionNames','mn' when training Mdl using fitcnb, then the software fits a multinomial distribution using the bag-of-tokens model. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. Using additive smoothing [2], the estimated probability is

$$P(\text{token } j \mid \text{class } k) = \frac{1 + c_{j|k}}{P + c_k},$$

where:
$c_{j|k} = n_k \dfrac{\sum_{i:\, y_i \in \text{class } k} x_{ij} w_i}{\sum_{i:\, y_i \in \text{class } k} w_i}$, which is the weighted number of occurrences of token j in class k.
$n_k$ is the number of observations in class k.
$w_i$ is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
$c_k = \sum_{j=1}^{P} c_{j|k}$, which is the total weighted number of occurrences of all tokens in class k. In the denominator and in this sum, $P$ is the number of tokens (predictors).
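The multinomial estimate can be checked by hand on a tiny data set. The following is a minimal sketch in plain Python, not the MATLAB implementation. It assumes the usual add-one form of additive smoothing: one plus the weighted count of token j in class k, divided by the number of tokens plus the total weighted count in class k. The function name `multinomial_token_probs` and the toy data are hypothetical.

```python
def multinomial_token_probs(X, y, w, k):
    """Smoothed estimate of P(token j | class k) for every token j.

    X is a list of rows of token counts, y the class labels,
    w the observation weights (all names here are illustrative).
    """
    rows = [i for i, label in enumerate(y) if label == k]
    n_k = len(rows)                     # observations in class k
    w_sum = sum(w[i] for i in rows)     # total weight in class k
    P = len(X[0])                       # number of tokens (predictors)

    # Weighted occurrences of each token in class k, rescaled by n_k.
    c = [n_k * sum(X[i][j] * w[i] for i in rows) / w_sum for j in range(P)]
    c_k = sum(c)                        # all tokens in class k

    # Additive (add-one) smoothing.
    return [(1 + c_j) / (P + c_k) for c_j in c]


# Toy bag-of-tokens data: 3 observations, 3 tokens, equal weights.
X = [[2, 1, 0],
     [0, 3, 1],
     [1, 0, 4]]
y = ["a", "a", "b"]
w = [1, 1, 1]

probs_a = multinomial_token_probs(X, y, w, "a")
print(probs_a)  # approximately [0.3, 0.5, 0.2]
```

Because the weighted count is a ratio of weighted sums, rescaling all weights within a class by a common factor (for example, so that they sum to the class prior) leaves the estimate unchanged. With equal weights, the weighted count reduces to the raw token count in that class.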
If you specify 'DistributionNames','mvmn' when training Mdl using fitcnb, then:
For each predictor, the software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.
For predictor j in class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
The software stores the probability that predictor j, in class k, has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. Using additive smoothing [2], the estimated probability is

$$P(\text{predictor } j = L \mid \text{class } k) = \frac{1 + m_{j|k}(L)}{m_j + m_k},$$

where:
$m_{j|k}(L) = n_k \dfrac{\sum_{i:\, y_i \in \text{class } k} I\{x_{ij} = L\}\, w_i}{\sum_{i:\, y_i \in \text{class } k} w_i}$, which is the weighted number of observations for which predictor j equals L in class k.
$n_k$ is the number of observations in class k.
$I\{x_{ij} = L\} = 1$ if $x_{ij} = L$, 0 otherwise.
$w_i$ is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
$m_j$ is the number of distinct levels in predictor j.
$m_k$ is the weighted number of observations in class k.
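The multivariate multinomial estimate can be worked through the same way. This is a minimal sketch in plain Python, not the MATLAB implementation. It assumes the add-one form: one plus the weighted count of level L for predictor j in class k, divided by the number of distinct levels plus the weighted number of observations in class k. The function name `mvmn_level_probs` and the toy data are hypothetical. The sketch also assumes that the weighted class size equals the sum of the per-level weighted counts, which holds when no values are missing.

```python
def mvmn_level_probs(x_j, y, w, k):
    """Smoothed estimate of P(predictor j = L | class k) for each level L.

    x_j holds one categorical predictor, y the class labels,
    w the observation weights (all names here are illustrative).
    """
    rows = [i for i, label in enumerate(y) if label == k]
    n_k = len(rows)                     # observations in class k
    w_sum = sum(w[i] for i in rows)     # total weight in class k

    # Sorted unique levels over all observations, one bin per level.
    levels = sorted(set(x_j))
    m_j = len(levels)                   # distinct levels in predictor j

    # Weighted number of class-k observations at each level, rescaled by n_k.
    m = {L: n_k * sum(w[i] for i in rows if x_j[i] == L) / w_sum
         for L in levels}

    # Assumption: weighted class size is the sum over levels (no missing values).
    m_k = sum(m.values())

    # Additive (add-one) smoothing.
    return {L: (1 + m[L]) / (m_j + m_k) for L in levels}


# Toy data: one categorical predictor, 5 observations, equal weights.
x_j = ["red", "blue", "red", "green", "blue"]
y = ["a", "a", "a", "b", "b"]
w = [1, 1, 1, 1, 1]

probs_a = mvmn_level_probs(x_j, y, w, "a")
print(probs_a)
```

Note that the level "green" never occurs in class "a" yet still receives a nonzero probability. That is the purpose of the smoothing: an unseen level does not force the class posterior to zero.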
[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.
[2] Manning, C. D., P. Raghavan, and M. Schütze. Introduction to Information Retrieval. NY: Cambridge University Press, 2008.