This example shows how to create and compare different naive Bayes classifiers using the Classification Learner app, and export trained models to the workspace to make predictions for new data.
Naive Bayes classifiers leverage Bayes' theorem and assume that predictors are independent of one another within each class. In practice, these classifiers often work well even when the independence assumption does not hold. You can use naive Bayes with two or more classes in Classification Learner. The app allows you to train a Gaussian naive Bayes model or a kernel naive Bayes model individually or simultaneously.
This table lists the available naive Bayes models in Classification Learner and the probability distributions used by each model to fit predictors.
| Model | Numerical Predictor | Categorical Predictor |
|---|---|---|
| Gaussian naive Bayes | Gaussian distribution (or normal distribution) | Multivariate multinomial distribution |
| Kernel naive Bayes | Kernel distribution. You can specify the kernel type and support. Classification Learner automatically determines the kernel width using the underlying fitcnb function. | Multivariate multinomial distribution |
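The same two model types can be sketched at the MATLAB command line with fitcnb, the function the app uses under the hood (a minimal sketch, assuming the fishertable variable created later in this example):

```matlab
% Fit both naive Bayes variants directly with fitcnb.
fishertable = readtable('fisheriris.csv');

% Gaussian naive Bayes: numeric predictors use a normal distribution by default.
gnb = fitcnb(fishertable, 'Species');

% Kernel naive Bayes: numeric predictors use a kernel density estimate instead.
knb = fitcnb(fishertable, 'Species', 'DistributionNames', 'kernel');
```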
This example uses Fisher's iris data set, which contains measurements of flowers (petal length, petal width, sepal length, and sepal width) for specimens from three species. Train naive Bayes classifiers to predict the species based on the predictor measurements.
In the MATLAB® Command Window, load the Fisher iris data set and create a table of measurement predictors (or features) using variables from the data set.
```matlab
fishertable = readtable('fisheriris.csv');
```
Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
On the Classification Learner tab, in the File section, select New Session > From Workspace.
In the New Session dialog box, select the table fishertable from the Data Set Variable list (if necessary).
As shown in the dialog box, the app selects the response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.
To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting.
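A rough command-line analogue of the app's default cross-validation scheme, using fitcnb and kfoldLoss (the 5-fold count is an assumption; the app lets you choose the number of folds in the New Session dialog box):

```matlab
% Cross-validated Gaussian naive Bayes model on the iris data.
fishertable = readtable('fisheriris.csv');
cvmdl = fitcnb(fishertable, 'Species', 'CrossVal', 'on', 'KFold', 5);

% Validation accuracy, comparable to the Accuracy score the app reports.
cvAccuracy = 1 - kfoldLoss(cvmdl);
```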
Classification Learner creates a scatter plot of the data.
Use the scatter plot to investigate which variables are useful for predicting the response. Select different options on the X and Y lists under Predictors to visualize the distribution of species and measurements. Observe which variables separate the species colors most clearly.
The setosa species (blue points) is easy to separate from the other two species with all four predictors. The versicolor and virginica species are much closer together in all predictor measurements and overlap, especially when you plot sepal length and width. setosa is easier to predict than the other two species.
Create a naive Bayes model. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Naive Bayes Classifiers group, click Gaussian Naive Bayes. Note that Classification Learner disables the Advanced button in the Model Type section, because this type of model has no advanced settings.
In the Training section, click Train.
Tip
If you have Parallel Computing Toolbox™, the Opening Pool dialog box opens the first time you click Train (or when you click Train again after an extended period of time). The dialog box remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can train multiple classifiers at once and continue working.
The app creates a Gaussian naive Bayes model, and plots the results.
The app displays the Gaussian Naive Bayes model in the History list. Check the model validation score in the Accuracy box. The score shows that the model performs well.
For the Gaussian Naive Bayes model, by default, the app models the distribution of numerical predictors using the Gaussian distribution, and models the distribution of categorical predictors using the multivariate multinomial distribution (MVMN).
Note
Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
Examine the scatter plot. An X indicates misclassified points. The blue points (setosa species) are all correctly classified, but the other two species have misclassified points. Under Plot, switch between the Data and Model predictions options. Observe the color of the incorrect (X) points. Or, to view only the incorrect points, clear the Correct check box.
Train a kernel naive Bayes model for comparison. On the Classification Learner tab, in the Model Type section, click Kernel Naive Bayes. Note that Classification Learner enables the Advanced button, because this type of model has advanced settings.
The app displays a draft kernel naive Bayes model in the History list.
In the Model Type section, click Advanced to change settings in the Advanced Naive Bayes Options menu. Select Triangle from the Kernel Type list, and select Positive from the Support list.
Note
The settings in the Advanced Naive Bayes Options menu are available for continuous data only. Pointing to Kernel Type displays the tooltip "Specify Kernel smoothing function for continuous variables," and pointing to Support displays the tooltip "Specify Kernel smoothing density support for continuous variables."
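These advanced settings map onto fitcnb name-value arguments, so the same model can be sketched at the command line (a sketch; the argument values mirror the app choices above):

```matlab
% Kernel naive Bayes with the Triangle kernel and Positive support.
fishertable = readtable('fisheriris.csv');
knb = fitcnb(fishertable, 'Species', ...
    'DistributionNames', 'kernel', ...
    'Kernel', 'triangle', ...       % Kernel Type: Triangle
    'Support', 'positive');         % Support: Positive
```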
In the Training section, click Train to train the new model.
The History list now includes the new kernel naive Bayes model. Its model validation score is better than the score for the Gaussian naive Bayes model. The app highlights the Accuracy score of the best model by outlining it in a box.
In the History list, click each model to view and compare the results.
Train a Gaussian naive Bayes model and a kernel naive Bayes model simultaneously. On the Classification Learner tab, in the Model Type section, click All Naive Bayes. Classification Learner disables the Advanced button. In the Training section, click Train.
The app trains one of each naive Bayes model type and highlights the Accuracy score of the best model or models.
In the History list, click a model to view the results. Examine the scatter plot for the trained model and try plotting different predictors. Misclassified points appear as an X.
To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. The app displays a matrix of true class and predicted class results.
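The same per-class check can be sketched at the command line with confusionchart (resubstitution predictions are shown here for brevity, so the counts will differ somewhat from the app's validation-based matrix):

```matlab
% Train a model, predict on the training data, and chart the confusion matrix.
fishertable = readtable('fisheriris.csv');
mdl = fitcnb(fishertable, 'Species');
predicted = predict(mdl, fishertable);
confusionchart(fishertable.Species, predicted);
```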
Note
Validation introduces some randomness into the results. Your confusion matrix results can vary from the results shown in this example.
In the History list, click the other models and compare their results.
In the History list, click the model with the highest Accuracy score. To improve the model, try modifying its features. For example, see if you can improve the model by removing features with low predictive power.
On the Classification Learner tab, in the Features section, click Feature Selection.
In the Feature Selection menu, clear the check boxes for PetalLength and PetalWidth to exclude them from the predictors. A new draft model (model 4) appears in the History list with the new settings (2/4 features), based on the kernel naive Bayes model (model 3.2 in the History list).
In the Training section, click Train to train a new kernel naive Bayes model using the new predictor options.
The History list now includes model 4. It is also a kernel naive Bayes model, trained using only 2 of 4 predictors.
To determine which predictors are included, click a model in the History list, then click Feature Selection in the Features section and note which check boxes are selected. The model with only sepal measurements (model 4) has a much lower Accuracy score than the models containing all predictors.
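The same predictor-subset comparison can be sketched at the command line using the PredictorNames argument of fitcnb (variable names assumed from the fisheriris.csv column headers):

```matlab
% Cross-validated kernel naive Bayes models on two predictor subsets.
fishertable = readtable('fisheriris.csv');
sepalOnly = fitcnb(fishertable, 'Species', ...
    'DistributionNames', 'kernel', ...
    'PredictorNames', {'SepalLength','SepalWidth'}, 'CrossVal', 'on');
petalOnly = fitcnb(fishertable, 'Species', ...
    'DistributionNames', 'kernel', ...
    'PredictorNames', {'PetalLength','PetalWidth'}, 'CrossVal', 'on');

% Validation accuracies; expect the petal-only model to score much higher.
accuracies = [1 - kfoldLoss(sepalOnly), 1 - kfoldLoss(petalOnly)];
```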
Train another kernel naive Bayes model including only the petal measurements. Change the selections in the Feature Selection menu and click Train.
The model trained using only petal measurements (model 5) performs comparably to the model containing all predictors. The models predict no better with all four measurements than with only the petal measurements. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.
To investigate features to include or exclude, use the parallel coordinates plot. On the Classification Learner tab, in the Plots section, click Parallel Coordinates Plot.
In the History list, click the model with the highest Accuracy score. To improve the model further, try changing naive Bayes settings (if available). On the Classification Learner tab, in the Model Type section, click Advanced. Recall that the Advanced button is enabled only for some models. Change a setting, then train the new model by clicking Train.
Export the trained model to the workspace. On the Classification Learner tab, in the Export section, select Export Model > Export Model. See Export Classification Model to Predict New Data.
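After export, the model arrives in the workspace as a structure (named trainedModel by default) whose predictFcn field makes predictions on new data in the same table format:

```matlab
% Predict with the exported model; trainedModel is the default export name.
yfit = trainedModel.predictFcn(fishertable);
```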
Examine the code for training this classifier. In the Export section, click Generate Function.
Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner.
To try all the nonoptimizable classifier model presets available for your data set:
Click the arrow on the Model Type section to open the gallery of classifiers.
In the Get Started group, click All, then click Train in the Training section.
For information about other classifier types, see Train Classification Models in Classification Learner App.