CompactClassificationNaiveBayes

Package: classreg.learning.classif

Compact naive Bayes classifier

Description

CompactClassificationNaiveBayes is a compact naive Bayes classifier.

The compact classifier does not include the data used for training the naive Bayes classifier. Therefore, you cannot perform tasks, such as cross validation, using the compact classifier.

Use a compact naive Bayes classifier to label new data (i.e., predicting the labels of new data) more efficiently.

Construction

CMdl = compact(Mdl) returns a compact naive Bayes classifier (CMdl) from a full, trained naive Bayes classifier (Mdl).

Input Arguments

expand all

`Mdl` — Fully trained naive Bayes classifier
`ClassificationNaiveBayes` model

A fully trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

Properties

expand all

`CategoricalPredictors` — Indices of categorical predictors
vector of positive integers

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

`CategoricalLevels` — Multivariate multinomial levels
cell vector of numeric vectors

Multivariate multinomial levels, specified as a cell vector of numeric vectors. CategoricalLevels has length equal to the number of predictors (size(X,2)).

The cells of CategoricalLevels correspond to predictors that you specified as 'mvmn' (i.e., having a multivariate multinomial distribution) during training. Cells that do not correspond to a multivariate multinomial distribution are empty ([]).

If predictor j is multivariate multinomial, then CategoricalLevels{j} is a list of all distinct values of predictor j in the sample (NaNs removed from unique(X(:,j))).

Data Types: cell

`ClassNames` — Distinct class names
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Distinct class names, specified as a categorical or character array, logical or numeric vector, or cell vector of character vectors.

ClassNames is the same data type as Y, and has K elements or rows for character arrays. (The software treats string arrays as cell arrays of character vectors.)

`Cost` — Misclassification cost
square matrix

Misclassification cost, specified as a K-by-K square matrix.

The value of Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost correspond to the order of the classes in ClassNames.

The value of Cost does not influence training. You can reset Cost after training Mdl using dot notation, e.g., Mdl.Cost = [0 0.5; 1 0];.

Data Types: double | single

`DistributionNames` — Predictor distributions
`'normal'` (default) | `'kernel'` | `'mn'` | `'mvmn'` | cell array of character vectors

Predictor distributions fitcnb uses to model the predictors, specified as a character vector or cell array of character vectors.

This table summarizes the available distributions.

Value	Description
`'kernel'`	Kernel smoothing density estimate.
`'mn'`	Multinomial bag-of-tokens model. Indicates that all predictors have this distribution.
`'mvmn'`	Multivariate multinomial distribution.
`'normal'`	Normal (Gaussian) distribution.

If Distribution is a 1-by-P cell array of character vectors, then the software models feature j using the distribution in element j of the cell array.

Data Types: char | cell

`DistributionParameters` — Distribution parameter estimates
cell array

Distribution parameter estimates, specified as a cell array. DistributionParameters is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property ClassNames, and the order of the predictors corresponds to the order of the columns of X.

If class k has no observations for predictor j, then Distribution{k,j} is empty ([]).

The elements of DistributionParameters depends on the distributions of the predictors. This table describes the values in DistributionParameters{k,j}.

Distribution of Predictor j	Value
`kernel`	A `KernelDistribution` model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use `Mdl.DistributionParameters{3,2}.BandWidth`.
`mn`	A scalar representing the probability that token j appears in class k. For details, see Algorithms.
`mvmn`	A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property `CategoricalLevels`). For more details, see Algorithms.
`normal`	A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation.

Data Types: cell

`ExpandedPredictorNames` — Expanded predictor names
cell array of character vectors

Expanded predictor names, stored as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

`Kernel` — Kernel smoother types
`'normal'` (default) | `'box'` | `'epanechnikov'` | `'triangle'` | cell array of character vectors

Kernel smoother types, specified as a character vector or cell array of character vectors. Kernel has length equal to the number of predictors (size(X,2)). Kernel{j} corresponds to predictor j, and contains a character vector describing the type of kernel smoother. This table describes the supported kernel smoother types. Let I{u} denote the indicator function.

Value	Kernel	Formula
`'box'`	Box (uniform)	$f (x) = 0.5 I {\| x \| \leq 1}$
`'epanechnikov'`	Epanechnikov	$f (x) = 0.75 (1 - x^{2}) I {\| x \| \leq 1}$
`'normal'`	Gaussian	$f (x) = \frac{1}{\sqrt{2 π}} \exp (- 0.5 x^{2})$
`'triangle'`	Triangular	$f (x) = (1 - \| x \|) I {\| x \| \leq 1}$

If a cell is empty ([]), then the software did not fit a kernel distribution to the corresponding predictor.

Data Types: char | cell

`PredictorNames` — Predictor names
cell array of character vectors

Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in X.

Data Types: cell

`Prior` — Class prior probabilities
numeric vector

Class prior probabilities, specified as a numeric row vector. Prior is a 1-by-K vector, and the order of its elements correspond to the elements of ClassNames.

fitcnb normalizes the prior probabilities you set using the name-value pair parameter 'Prior' so that sum(Prior) = 1.

The value of Prior does not change the best-fitting model. Therefore, you can reset Prior after training Mdl using dot notation, e.g., Mdl.Prior = [0.2 0.8];.

Data Types: double | single

`ResponseName` — Response name
character vector

Response name, specified as a character vector.

Data Types: char

`ScoreTransform` — Classification score transformation function
`'doublelogit'` | `'invlogit'` | `'ismax'` | `'logit'` | `'none'` | function handle | ...

Classification score transformation function, specified as a character vector or function handle.

To change the score transformation function to e.g., function, use dot notation.

For a built-in function, enter this code and replace function with a value in the table.

Mdl.ScoreTransform = 'function';

Value	Description
`'doublelogit'`	1/(1 + e^–2x)
`'invlogit'`	log(x / (1 – x))
`'ismax'`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
`'logit'`	1/(1 + e^–x)
`'none'` or `'identity'`	x (no transformation)
`'sign'`	–1 for x < 0 0 for x = 0 1 for x > 0
`'symmetric'`	2x – 1
`'symmetricismax'`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
`'symmetriclogit'`	2/(1 + e^–x) – 1

For a MATLAB^® function, or a function that you define, enter its function handle.
```
Mdl.ScoreTransform = @function;
```
function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Data Types: char | function_handle

`Support` — Kernel smoother density support
cell vector

Kernel smoother density support, specified as a cell vector. Support has length equal to the number of predictors (size(X,2)). The cells represent the regions to apply the kernel density.

This table describes the supported options.

Value	Description
1-by-2 numeric row vector	For example, `[L,U]`, where `L` and `U` are the finite lower and upper bounds, respectively, for the density support.
`'positive'`	The density support is all positive real values.
`'unbounded'`	The density support is all real values.

If a cell is empty ([]), then the software did not fit a kernel distribution to the corresponding predictor.

`Width` — Kernel smoother window width
numeric matrix

Kernel smoother window width, specified as a numeric matrix. Width is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (size(X,2)).

Width(k,j) is the kernel smoother window width for the kernel smoothing density of predictor j within class k. NaNs in column j indicate that the software did not fit predictor j using a kernel density.

Methods

edge	Classification edge for naive Bayes classifiers
logP	Log unconditional probability density for naive Bayes classifier
loss	Classification error for naive Bayes classifier
margin	Classification margins for naive Bayes classifiers
predict	Predict labels using naive Bayes classification model

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

collapse all

Reduce the Size of Naive Bayes Classifiers

Open Live Script

Full naive Bayes classifiers (i.e., ClassificationNaiveBayes models) hold the training data. For efficiency, you might not want to predict new labels using a large classifier. This example shows how to reduce the size of a full naive Bayes classifier.

Load the ionosphere data set.

load ionosphere
X = X(:,3:end); % Remove two predictors for stability

Train a naive Bayes classifier. Assume that each predictor is conditionally, normally distributed given its label. It is good practice to specify the order of the labels.

Mdl = fitcnb(X,Y,'ClassNames',{'b','g'})

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
           NumObservations: 351
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}


  Properties, Methods

Mdl is a ClassificationNaiveBayes model.

Reduce the size of the naive Bayes classifier.

CMdl = compact(Mdl)

CMdl = 
  classreg.learning.classif.CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}


  Properties, Methods

CMdl is a CompactClassificationNaiveBayes model.

Display how much memory each classifier uses.

whos('Mdl','CMdl')

  Name      Size             Bytes  Class                                                        Attributes

  CMdl      1x1              14892  classreg.learning.classif.CompactClassificationNaiveBayes              
  Mdl       1x1             111006  ClassificationNaiveBayes

The full naive Bayes classifier (Mdl) is much larger than the compact naive Bayes classifier (CMdl).

You can remove Mdl from the MATLAB® Workspace, and pass CMdl and new predictor values to predict to efficiently label new observations.

Train and Cross Validate Naive Bayes Classifiers

Open Live Script

Load the ionosphere data set.

load ionosphere
X = X(:,3:end); % Remove two predictors for stability

Train and cross validate a naive Bayes classifier. Assume that each predictor is conditionally, normally distributed given its label. It is good practice to specify the order of the classes.

rng(1);  % For reproducibility
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'


  Properties, Methods

CVMdl is not a ClassificationNaiveBayes model, but a ClassificationPartitionedModel cross-validated, naive Bayes model. By default, the software implements 10-fold cross validation.

Alternatively, you can cross validate a trained ClassificationNaiveBayes model by passing it to crossval.

Inspect one of the trained folds using dot notation.

CVMdl.Trained{1}

ans = 
  classreg.learning.classif.CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}


  Properties, Methods

Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data.

Estimate the generalization error.

genError = kfoldLoss(CVMdl)

genError = 0.1795

On average, the generalization error is approximately 17%.

One way to attempt reducing an unsatisfactory generalization error is to specify different conditional distributions for the predictors, or tune the parameters of the conditional distributions.

More About

expand all

Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in this observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.

Naive Bayes

Naive Bayes is a classification algorithm that applies density estimation to the data.

The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Though the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].

Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm:

Estimates the densities of the predictors within each class.
Models posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

$\hat{P} (Y = k | X_{1}, .., X_{P}) = \frac{π (Y = k) \prod_{j = 1}^{P} P (X_{j} | Y = k)}{\sum_{k = 1}^{K} π (Y = k) \prod_{j = 1}^{P} P (X_{j} | Y = k)},$
where:
- Y is the random variable corresponding to the class index of an observation.
- X₁,...,X_P are the random predictors of an observation.
- $π (Y = k)$ is the prior probability that a class index is k.
Classifies an observation by estimating the posterior probability for each class, and then assigns the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability $\hat{P} (Y = k | X_{1}, .., X_{P}) \propto π (Y = k) P_{m n} (X_{1}, ..., X_{P} | Y = k),$ where $P_{m n} (X_{1}, ..., X_{P} | Y = k)$ is the probability mass function of a multinomial distribution.

Algorithms

If you specify 'DistributionNames','mn' when training Mdl using fitcnb, then the software fits a multinomial distribution using the bag-of-tokens model. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. Using additive smoothing [2], the estimated probability is

$P (token j | class k) = \frac{1 + c_{j | k}}{P + c_{k}},$
where:
- $c_{j | k} = n_{k} \frac{\sum_{i : y_{i} \in class k}^{} x_{i j} w_{i}^{}}{\sum_{i : y_{i} \in class k}^{} w_{i}};$ which is the weighted number of occurrences of token j in class k.
- n_k is the number of observations in class k.
- $w_{i}^{}$ is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
- $c_{k} = \sum_{j = 1}^{P} c_{j | k};$ which is the total weighted number of occurrences of all tokens in class k.
If you specify 'DistributionNames','mvmn' when training Mdl using fitcnb, then:
1. For each predictor, the software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.
2. For predictor j in class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
3. The software stores the probability that predictor j, in class k, has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. Using additive smoothing [2], the estimated probability is
  
  $P (predictor j = L | class k) = \frac{1 + m_{j | k} (L)}{m_{j} + m_{k}},$
  where:
  - $m_{j | k} (L) = n_{k} \frac{\sum_{i : y_{i} \in class k}^{} I {x_{i j} = L} w_{i}^{}}{\sum_{i : y_{i} \in class k}^{} w_{i}^{}};$ which is the weighted number of observations for which predictor j equals L in class k.
  - n_k is the number of observations in class k.
  - $I {x_{i j} = L} = 1$ if x_ij = L, 0 otherwise.
  - $w_{i}^{}$ is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
  - m_j is the number of distinct levels in predictor j.
  - m_k is the weighted number of observations in class k.

References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Manning, C. D., P. Raghavan, and M. Schütze. Introduction to Information Retrieval, NY: Cambridge University Press, 2008.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The predict function supports code generation.
When you train a naive Bayes model by using fitcnb, the following restrictions apply.
- The class labels input argument value (Y) cannot be a categorical array.
- Code generation does not support categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the 'CategoricalPredictors' name-value pair argument.
- The value of the 'DistributionNames' name-value pair argument cannot contain 'mn' or 'mvmn'.
- The value of the 'ClassNames' name-value pair argument cannot be a categorical array.
- The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.

For more information, see Introduction to Code Generation.

Documentation

CompactClassificationNaiveBayes

Description

Construction

Input Arguments

`Mdl` — Fully trained naive Bayes classifier
`ClassificationNaiveBayes` model

Properties

`CategoricalPredictors` — Indices of categorical predictors
vector of positive integers

`CategoricalLevels` — Multivariate multinomial levels
cell vector of numeric vectors

`ClassNames` — Distinct class names
categorical array | character array | logical vector | numeric vector | cell array of character vectors

`Cost` — Misclassification cost
square matrix

`DistributionNames` — Predictor distributions
`'normal'` (default) | `'kernel'` | `'mn'` | `'mvmn'` | cell array of character vectors

`DistributionParameters` — Distribution parameter estimates
cell array

`ExpandedPredictorNames` — Expanded predictor names
cell array of character vectors

`Kernel` — Kernel smoother types
`'normal'` (default) | `'box'` | `'epanechnikov'` | `'triangle'` | cell array of character vectors

`PredictorNames` — Predictor names
cell array of character vectors

`Prior` — Class prior probabilities
numeric vector

`ResponseName` — Response name
character vector

`ScoreTransform` — Classification score transformation function
`'doublelogit'` | `'invlogit'` | `'ismax'` | `'logit'` | `'none'` | function handle | ...

`Support` — Kernel smoother density support
cell vector

`Width` — Kernel smoother window width
numeric matrix

Methods

Copy Semantics

Examples

Reduce the Size of Naive Bayes Classifiers

Train and Cross Validate Naive Bayes Classifiers

More About

Bag-of-Tokens Model

Naive Bayes

Algorithms

References

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

See Also

Topics

Statistics and Machine Learning Toolbox Documentation

Support

Documentation

CompactClassificationNaiveBayes

Description

Construction

Input Arguments

Mdl — Fully trained naive Bayes classifier ClassificationNaiveBayes model

Properties

CategoricalPredictors — Indices of categorical predictors vector of positive integers

CategoricalLevels — Multivariate multinomial levels cell vector of numeric vectors

ClassNames — Distinct class names categorical array | character array | logical vector | numeric vector | cell array of character vectors

Cost — Misclassification cost square matrix

DistributionNames — Predictor distributions 'normal' (default) | 'kernel' | 'mn' | 'mvmn' | cell array of character vectors

DistributionParameters — Distribution parameter estimates cell array

ExpandedPredictorNames — Expanded predictor names cell array of character vectors

Kernel — Kernel smoother types 'normal' (default) | 'box' | 'epanechnikov' | 'triangle' | cell array of character vectors

PredictorNames — Predictor names cell array of character vectors

Prior — Class prior probabilities numeric vector

ResponseName — Response name character vector

ScoreTransform — Classification score transformation function 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...

Support — Kernel smoother density support cell vector

Width — Kernel smoother window width numeric matrix

Methods

Copy Semantics

Examples

Reduce the Size of Naive Bayes Classifiers

Train and Cross Validate Naive Bayes Classifiers

More About

Bag-of-Tokens Model

Naive Bayes

Algorithms

References

Extended Capabilities

C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

See Also

Topics

Statistics and Machine Learning Toolbox Documentation

Support

`Mdl` — Fully trained naive Bayes classifier
`ClassificationNaiveBayes` model

`CategoricalPredictors` — Indices of categorical predictors
vector of positive integers

`CategoricalLevels` — Multivariate multinomial levels
cell vector of numeric vectors

`ClassNames` — Distinct class names
categorical array | character array | logical vector | numeric vector | cell array of character vectors

`Cost` — Misclassification cost
square matrix

`DistributionNames` — Predictor distributions
`'normal'` (default) | `'kernel'` | `'mn'` | `'mvmn'` | cell array of character vectors

`DistributionParameters` — Distribution parameter estimates
cell array

`ExpandedPredictorNames` — Expanded predictor names
cell array of character vectors

`Kernel` — Kernel smoother types
`'normal'` (default) | `'box'` | `'epanechnikov'` | `'triangle'` | cell array of character vectors

`PredictorNames` — Predictor names
cell array of character vectors

`Prior` — Class prior probabilities
numeric vector

`ResponseName` — Response name
character vector

`ScoreTransform` — Classification score transformation function
`'doublelogit'` | `'invlogit'` | `'ismax'` | `'logit'` | `'none'` | function handle | ...

`Support` — Kernel smoother density support
cell vector

`Width` — Kernel smoother window width
numeric matrix

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.