Compact naive Bayes classifier for multiclass classification
CompactClassificationNaiveBayes is a compact version of the naive Bayes classifier. The compact classifier does not include the data used for training the naive Bayes classifier. Therefore, you cannot perform some tasks, such as cross-validation, using the compact classifier. Use a compact naive Bayes classifier for tasks such as predicting the labels of the data.
Create a CompactClassificationNaiveBayes model from a full, trained ClassificationNaiveBayes classifier by using compact.
PredictorNames — Predictor names
This property is read-only.
Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data X.
ExpandedPredictorNames — Expanded predictor names
This property is read-only.
Expanded predictor names, specified as a cell array of character vectors.
If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
CategoricalPredictors — Categorical predictor indices
[] | vector of positive integers
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double
CategoricalLevels — Multivariate multinomial levels
This property is read-only.
Multivariate multinomial levels, specified as a cell array. The length of CategoricalLevels is equal to the number of predictors (size(X,2)).
The cells of CategoricalLevels correspond to predictors that you specify as 'mvmn' during training, that is, they have a multivariate multinomial distribution. Cells that do not correspond to a multivariate multinomial distribution are empty ([]).
If predictor j is multivariate multinomial, then CategoricalLevels{j} is a list of all distinct values of predictor j in the sample. NaNs are removed from unique(X(:,j)).
DistributionNames — Predictor distributions
'normal' (default) | 'kernel' | 'mn' | 'mvmn' | cell array of character vectors
This property is read-only.
Predictor distributions, specified as a character vector or cell array of character vectors. fitcnb uses the predictor distributions to model the predictors. This table lists the available distributions.
Value | Description |
---|---|
'kernel' | Kernel smoothing density estimate |
'mn' | Multinomial distribution. If you specify 'mn', then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a string array or a cell array of character vectors. For details, see Estimated Probability for Multinomial Distribution. |
'mvmn' | Multivariate multinomial distribution. For details, see Estimated Probability for Multivariate Multinomial Distribution. |
'normal' | Normal (Gaussian) distribution |
If DistributionNames is a 1-by-P cell array of character vectors, then fitcnb models feature j using the distribution in element j of the cell array.
Example: 'mn'
Example: {'kernel','normal','kernel'}
Data Types: char | string | cell
DistributionParameters — Distribution parameter estimates
This property is read-only.
Distribution parameter estimates, specified as a cell array. DistributionParameters is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property ClassNames, and the order of the predictors corresponds to the order of the columns of X.
If class k has no observations for predictor j, then DistributionParameters{k,j} is empty ([]).
The elements of DistributionParameters depend on the distributions of the predictors. This table describes the values in DistributionParameters{k,j}.
Distribution of Predictor j | Value of Cell Array for Predictor j and Class k |
---|---|
kernel | A KernelDistribution model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use Mdl.DistributionParameters{3,2}.BandWidth. |
mn | A scalar representing the probability that token j appears in class k. For details, see Estimated Probability for Multinomial Distribution. |
mvmn | A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property CategoricalLevels). For more details, see Estimated Probability for Multivariate Multinomial Distribution. |
normal | A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. |
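As a brief illustration of the normal row of this table, the following sketch reads the estimates for one class and one predictor. It assumes the fisheriris sample data set (variables meas and species) that ships with the toolbox; the variable names params, mu, and sigma are illustrative only.

```matlab
load fisheriris                        % sample data: meas (150-by-4), species (150-by-1 cell)
Mdl  = fitcnb(meas,species);           % default 'normal' distribution for every predictor
CMdl = compact(Mdl);                   % the compact model keeps the parameter estimates

% Rows follow the order of CMdl.ClassNames; columns follow the columns of meas.
params = CMdl.DistributionParameters{1,2};   % first class, second predictor
mu     = params(1);                          % sample mean
sigma  = params(2);                          % sample standard deviation
```

Because the cell array is indexed as {class, predictor}, checking CMdl.ClassNames first avoids reading the estimates for the wrong class.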
Kernel — Kernel smoother type
'normal' (default) | 'box' | cell array | ...
This property is read-only.
Kernel smoother type, specified as the name of a kernel or a cell array of kernel names. The length of Kernel is equal to the number of predictors (size(X,2)). Kernel{j} corresponds to predictor j and contains a character vector describing the type of kernel smoother. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor.
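This table describes the supported kernel smoother types. I{u} denotes the indicator function.
Value | Kernel | Formula |
---|---|---|
'box' | Box (uniform) | $f(x) = 0.5\,I\{\lvert x \rvert \le 1\}$ |
'epanechnikov' | Epanechnikov | $f(x) = 0.75\,(1 - x^2)\,I\{\lvert x \rvert \le 1\}$ |
'normal' | Gaussian | $f(x) = \frac{1}{\sqrt{2\pi}}\exp(-0.5x^2)$ |
'triangle' | Triangular | $f(x) = (1 - \lvert x \rvert)\,I\{\lvert x \rvert \le 1\}$ |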
Example: 'box'
Example: {'epanechnikov','normal'}
Data Types: char | string | cell
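For reference, the four kernel smoother types have standard textbook closed forms. The sketch below writes them as anonymous functions and checks numerically that each one integrates to about 1. The handle names (boxK, epaK, normK, triK) are illustrative and are not part of the toolbox.

```matlab
% Standard kernel functions of a scaled argument x (illustrative names).
boxK  = @(x) 0.5*(abs(x) <= 1);                  % 'box'
epaK  = @(x) 0.75*(1 - x.^2).*(abs(x) <= 1);     % 'epanechnikov'
normK = @(x) exp(-0.5*x.^2)/sqrt(2*pi);          % 'normal'
triK  = @(x) (1 - abs(x)).*(abs(x) <= 1);        % 'triangle'

% Each kernel is a density, so its integral is approximately 1.
areas = [integral(boxK,-1,1), integral(epaK,-1,1), ...
         integral(normK,-Inf,Inf), integral(triK,-1,1)];
```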
Support — Kernel smoother density support
This property is read-only.
Kernel smoother density support, specified as a cell array. The length of Support is equal to the number of predictors (size(X,2)). The cells represent the regions to which fitcnb applies the kernel density. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor.
This table describes the supported options.
Value | Description |
---|---|
1-by-2 numeric row vector | The density support applies to the specified bounds, for example [L,U], where L and U are the finite lower and upper bounds, respectively. |
'positive' | The density support applies to all positive real values. |
'unbounded' | The density support applies to all real values. |
Width — Kernel smoother window width
This property is read-only.
Kernel smoother window width, specified as a numeric matrix. Width is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (size(X,2)).
Width(k,j) is the kernel smoother window width for the kernel smoothing density of predictor j within class k. NaNs in column j indicate that fitcnb did not fit predictor j using a kernel density.
ClassNames — Unique class names
This property is read-only.
Unique class names used in the training model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.
ClassNames has the same data type as Y, and has K elements (or rows, for character arrays). (The software treats string arrays as cell arrays of character vectors.)
Data Types: categorical | char | string | logical | double | cell
ResponseName — Response variable name
This property is read-only.
Response variable name, specified as a character vector.
Data Types: char | string
Prior — Prior probabilities
Prior probabilities, specified as a numeric vector. The order of the elements in Prior corresponds to the elements of Mdl.ClassNames.
fitcnb normalizes the prior probabilities you set using the 'Prior' name-value pair argument, so that sum(Prior) = 1.
The value of Prior does not affect the best-fitting model. Therefore, you can reset Prior after training Mdl using dot notation.
Example: Mdl.Prior = [0.2 0.8]
Data Types: double | single
Cost — Misclassification cost
Misclassification cost, specified as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.
The misclassification cost matrix must have zeros on the diagonal.
The value of Cost does not influence training. You can reset Cost after training Mdl using dot notation.
Example: Mdl.Cost = [0 0.5 ; 1 0]
Data Types: double | single
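Because Prior and Cost do not influence training, both can be changed on an already trained compact model. The following is a minimal sketch, assuming the fisheriris sample data set (three classes); the particular prior and cost values are arbitrary and chosen only for illustration.

```matlab
load fisheriris
CMdl = compact(fitcnb(meas,species));

% Element order follows CMdl.ClassNames.
CMdl.Prior = [0.5 0.25 0.25];
% Rows are true classes, columns are predicted classes, zeros on the diagonal.
CMdl.Cost  = [0 1 1; 1 0 2; 1 1 0];

% predict returns labels, posterior probabilities, and expected costs.
[label,posterior,cost] = predict(CMdl,meas(1:5,:));
```

No retraining is needed after either assignment; subsequent calls to predict use the updated values.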
ScoreTransform — Classification score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...
Classification score transformation, specified as a character vector or function handle. This table summarizes the available character vectors.
Value | Description |
---|---|
'doublelogit' | 1/(1 + e^(–2x)) |
'invlogit' | log(x / (1 – x)) |
'ismax' | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
'logit' | 1/(1 + e^(–x)) |
'none' or 'identity' | x (no transformation) |
'sign' | –1 for x < 0; 0 for x = 0; 1 for x > 0 |
'symmetric' | 2x – 1 |
'symmetricismax' | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1 |
'symmetriclogit' | 2/(1 + e^(–x)) – 1 |
For a MATLAB® function or a function you define, use its function handle for the score transformation. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Example: Mdl.ScoreTransform = 'logit'
Data Types: char | string | function handle
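A function handle can stand in for any of the named transformations. The sketch below, which assumes the fisheriris sample data set, sets a handle that applies the same mapping as 'symmetric' (2x – 1). The scores returned by predict are expected to reflect the transformation, but that behavior is an assumption here rather than something this page states.

```matlab
load fisheriris
CMdl = compact(fitcnb(meas,species));

% The handle accepts a matrix of scores and returns a matrix of the same size.
CMdl.ScoreTransform = @(s) 2*s - 1;

% Second output of predict: one row per observation, one column per class.
[~,score] = predict(CMdl,meas(1:3,:));
```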
edge | Classification edge for naive Bayes classifier |
logp | Log unconditional probability density for naive Bayes classifier |
loss | Classification loss for naive Bayes classifier |
margin | Classification margins for naive Bayes classifier |
partialDependence | Compute partial dependence |
plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
predict | Classify observations using naive Bayes classifier |
Reduce the size of a full naive Bayes classifier by removing the training data. Full naive Bayes classifiers hold the training data. You can use a compact naive Bayes classifier to improve memory efficiency.
Load the ionosphere data set. Remove the first two predictors for stability.
load ionosphere
X = X(:,3:end);
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'})
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
           NumObservations: 351
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
  Properties, Methods
Mdl is a trained ClassificationNaiveBayes classifier.
Reduce the size of the naive Bayes classifier.
CMdl = compact(Mdl)
CMdl = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
  Properties, Methods
CMdl is a trained CompactClassificationNaiveBayes classifier.
Display the amount of memory used by each classifier.
whos('Mdl','CMdl')
  Name      Size     Bytes   Class                                                       Attributes
  CMdl      1x1      15060   classreg.learning.classif.CompactClassificationNaiveBayes
  Mdl       1x1     111174   ClassificationNaiveBayes
The full naive Bayes classifier (Mdl) is more than seven times larger than the compact naive Bayes classifier (CMdl).
You can remove Mdl from the MATLAB® Workspace, and pass CMdl and new predictor values to predict to efficiently label new observations.
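The last step can be sketched as follows. Because the example has no separate set of new observations, the first five rows of X stand in for new data here; that substitution is an assumption made only for illustration.

```matlab
load ionosphere
X = X(:,3:end);                              % drop the first two predictors, as above
Mdl  = fitcnb(X,Y,'ClassNames',{'b','g'});
CMdl = compact(Mdl);
clear Mdl                                    % the full model is no longer needed

XNew   = X(1:5,:);                           % stand-in for new predictor values
labels = predict(CMdl,XNew);                 % cell array of predicted class names
```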
Train and cross-validate a naive Bayes classifier. fitcnb implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error.
Load the ionosphere data set. Remove the first two predictors for stability.
load ionosphere
X = X(:,3:end);
rng('default') % for reproducibility
Train and cross-validate a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
  Properties, Methods
CVMdl is a ClassificationPartitionedModel cross-validated, naive Bayes classifier. Alternatively, you can cross-validate a trained ClassificationNaiveBayes model by passing it to crossval.
Display the first training fold of CVMdl using dot notation.
CVMdl.Trained{1}
ans = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
  Properties, Methods
Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data.
The compact models stored in CVMdl.Trained are not intended for predicting on new data. Instead, use them to estimate the generalization error by passing CVMdl to kfoldLoss.
genError = kfoldLoss(CVMdl)
genError = 0.1852
On average, the generalization error is approximately 19%.
You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
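One way to try a different conditional distribution is to repeat the cross-validation with kernel density estimates instead of the default normal distribution. This is a sketch only: whether the kernel model lowers the error on this data is not claimed here, and the names CVKernel and kernelError are illustrative.

```matlab
load ionosphere
X = X(:,3:end);
rng('default')   % for reproducibility

% Same 10-fold cross-validation, but with a kernel density for every predictor.
CVKernel = fitcnb(X,Y,'ClassNames',{'b','g'}, ...
    'DistributionNames','kernel','CrossVal','on');
kernelError = kfoldLoss(CVKernel);   % compare with genError from the normal model
```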
In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in the multinomial model is the number of distinct tokens (number of predictors).
Naive Bayes is a classification algorithm that applies density estimation to the data.
The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].
Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm takes these steps:
Estimate the densities of the predictors within each class.
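Model posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,
$$\hat{P}(Y = k \mid X_1,\ldots,X_P) = \frac{\pi(Y = k)\prod_{j=1}^{P} P(X_j \mid Y = k)}{\sum_{k'=1}^{K} \pi(Y = k')\prod_{j=1}^{P} P(X_j \mid Y = k')}$$
where:
Y is the random variable corresponding to the class index of an observation.
X1,...,XP are the random predictors of an observation.
$\pi(Y = k)$ is the prior probability that a class index is k.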
Classify an observation by estimating the posterior probability for each class, and then assign the observation to the class yielding the maximum posterior probability.
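If the predictors compose a multinomial distribution, then the posterior probability is $\hat{P}(Y = k \mid X_1,\ldots,X_P) \propto \pi(Y = k)\,P_{mn}(X_1,\ldots,X_P \mid Y = k)$, where $P_{mn}(X_1,\ldots,X_P \mid Y = k)$ is the probability mass function of a multinomial distribution.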
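The three steps of the algorithm can be sketched in a few lines for the case of normally distributed predictors. All numbers below (the priors, means, standard deviations, and the observation x) are hypothetical values chosen for illustration; in a trained model they would come from Prior and DistributionParameters. The sketch is not the toolbox implementation.

```matlab
% Hypothetical estimates for K = 2 classes and P = 2 normal predictors.
prior = [0.6 0.4];        % prior probability of each class
mu    = [0 1; 2 3];       % mu(k,j): mean of predictor j in class k
sigma = [1 1; 1 2];       % sigma(k,j): standard deviation of predictor j in class k
x     = [1.5 2.0];        % one observation (1-by-P)

% Step 1: class-conditional density of each predictor under the normal model.
dens = exp(-0.5*((x - mu)./sigma).^2) ./ (sigma*sqrt(2*pi));   % K-by-P

% Step 2: Bayes rule under the conditional-independence assumption.
joint     = prior(:) .* prod(dens,2);   % numerator for each class
posterior = joint / sum(joint);         % normalize over the classes

% Step 3: assign the class with the maximum posterior probability.
[~,kHat] = max(posterior);
```

For long predictor vectors, summing log densities instead of multiplying densities is a common way to avoid numerical underflow.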
If you specify 'DistributionNames','mn' when training Mdl using fitcnb, then the software fits a multinomial distribution using the Bag-of-Tokens Model. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. With additive smoothing [2], the estimated probability is
$$P(\text{token } j \mid \text{class } k) = \frac{1 + c_{j|k}}{P + c_k},$$
where:
$c_{j|k} = n_k \frac{\sum_{i:\, y_i \in \text{class } k} x_{ij} w_i}{\sum_{i:\, y_i \in \text{class } k} w_i}$, which is the weighted number of occurrences of token j in class k.
$n_k$ is the number of observations in class k.
$w_i$ is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
$c_k = \sum_{j=1}^{P} c_{j|k}$, which is the total weighted number of occurrences of all tokens in class k.
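The additive-smoothing estimate can be reproduced by hand on a small example. The token counts, class indices, and equal weights below are hypothetical, and the loop is a sketch of the formula rather than the toolbox code; the variable names cjk, ck, and theta are illustrative.

```matlab
% Hypothetical token counts: 4 observations (rows), P = 3 tokens (columns).
X = [2 0 1; 1 1 0; 0 3 1; 0 2 2];
y = [1; 1; 2; 2];          % class index of each observation
w = ones(4,1);             % observation weights (equal here)
K = 2;
P = size(X,2);

theta = zeros(K,P);        % theta(k,j): estimated probability of token j in class k
for k = 1:K
    idx = (y == k);
    nk  = sum(idx);                                   % observations in class k
    cjk = nk * (w(idx)' * X(idx,:)) / sum(w(idx));    % weighted count of each token
    ck  = sum(cjk);                                   % weighted count of all tokens
    theta(k,:) = (1 + cjk) / (P + ck);                % additive smoothing
end
```

Each row of theta sums to 1, and a token that never appears in a class (token 1 in class 2 here) still receives a small nonzero probability.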
If you specify 'DistributionNames','mvmn' when training Mdl using fitcnb, then the software takes these steps:
For each predictor, the software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each combination of predictor and class is a separate, independent multinomial random variable.
For predictor j in class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
The software stores the probability that predictor j in class k has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. With additive smoothing [2], the estimated probability is
$$P(\text{predictor } j = L \mid \text{class } k) = \frac{1 + m_{j|k}(L)}{m_j + m_k},$$
where:
$m_{j|k}(L) = n_k \frac{\sum_{i:\, y_i \in \text{class } k} I\{x_{ij} = L\}\, w_i}{\sum_{i:\, y_i \in \text{class } k} w_i}$, which is the weighted number of observations for which predictor j equals L in class k.
$n_k$ is the number of observations in class k.
$I\{x_{ij} = L\} = 1$ if $x_{ij} = L$, and 0 otherwise.
$w_i$ is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
$m_j$ is the number of distinct levels in predictor j.
$m_k$ is the weighted number of observations in class k.
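The same kind of hand calculation works for a multivariate multinomial predictor. The data below are hypothetical (one predictor with levels coded 1, 2, and 3), and the sketch assumes that mk can be obtained by summing the weighted level counts within the class, which equals nk when no values are missing. It illustrates the formula and is not the toolbox code.

```matlab
x = [1; 2; 2; 3; 1; 1];    % hypothetical categorical predictor j
y = [1; 1; 1; 2; 2; 2];    % class index of each observation
w = ones(6,1);             % observation weights (equal here)
K = 2;

levels = unique(x);        % sorted levels, analogous to CategoricalLevels{j}
mj     = numel(levels);    % number of distinct levels

prob = zeros(K,mj);        % prob(k,l): estimated probability of level l in class k
for k = 1:K
    idx = (y == k);
    nk  = sum(idx);
    hit = double(x(idx) == levels');              % indicator of each level, per observation
    mjk = nk * (w(idx)' * hit) / sum(w(idx));     % weighted count of each level
    mk  = sum(mjk);                               % assumption: equals nk with no missing values
    prob(k,:) = (1 + mjk) / (mj + mk);            % additive smoothing
end
```

As in the multinomial case, each row of prob sums to 1, and a level that is absent from a class keeps a nonzero probability.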
[1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer, 2009. https://doi.org/10.1007/978-0-387-84858-7.
[2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval, NY: Cambridge University Press, 2008.
Usage notes and limitations:
The predict function supports code generation.
When you train a naive Bayes model by using fitcnb, the following restrictions apply.
The class labels input argument value (Y) cannot be a categorical array.
Code generation does not support categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the 'CategoricalPredictors' name-value pair argument.
The value of the 'DistributionNames' name-value pair argument cannot contain 'mn' or 'mvmn'.
The value of the 'ClassNames' name-value pair argument cannot be a categorical array.
The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.
For more information, see Introduction to Code Generation.