In Classification Learner, try to identify predictors that separate classes well by plotting different pairs of predictors on the scatter plot. The plot can help you investigate features to include or exclude. You can visualize training data and misclassified points on the scatter plot.
Before you train a classifier, the scatter plot shows the data. If you have trained a classifier, the scatter plot shows model prediction results. Switch to plotting only the data by selecting Data in the Plot controls.
Choose features to plot using the X and Y lists under Predictors.
Look for predictors that separate classes well. For example, plotting the fisheriris data, you can see that sepal length and sepal width separate one of the classes well (setosa). You need to plot other predictors to see if you can separate the other two classes.
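If you want to reproduce this kind of check outside the app, you can create a scatter plot grouped by class at the command line. The following is a minimal sketch using the gscatter function and the fisheriris data; the column order of meas (sepal length, sepal width, petal length, petal width) is the standard one for that data set.

load fisheriris                          % provides meas (150-by-4 numeric) and species (class labels)
gscatter(meas(:,1), meas(:,2), species)  % sepal length vs. sepal width, colored by class
xlabel('Sepal length (cm)')
ylabel('Sepal width (cm)')
% setosa forms a separate cluster; versicolor and virginica overlap in these two predictors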
Show or hide specific classes using the check boxes under Show.
Change the stacking order of the plotted classes by selecting a class under Classes and then clicking Move to Front.
Investigate finer details by zooming in and out and panning across the plot. To enable zooming or panning, hover the mouse over the scatter plot and click the corresponding button on the toolbar that appears above the top right of the plot.
If you identify predictors that are not useful for separating out classes, then try using Feature Selection to remove them and train classifiers including only the most useful predictors.
After you train a classifier, the scatter plot shows model prediction results. You can show or hide correct or incorrect results and visualize the results by class. See Plot Classifier Results.
You can export the scatter plots you create in the app to figures. See Export Plots in Classification Learner App.
In Classification Learner, you can specify different features (or predictors) to include in the model. See if you can improve models by removing features with low predictive power. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.
On the Classification Learner tab, in the Features section, click Feature Selection.
In the Feature Selection tearaway window, clear the check boxes for the predictors you want to exclude.
Tip
You can close the Feature Selection tearaway window, or move it. Your choices in the tearaway remain.
Click Train to train a new model using the new predictor options.
Observe the new model in the History list. The Current model pane displays how many predictors are excluded.
To check which predictors are included in a trained model, click the model in the History list and observe the check boxes in the Feature Selection tearaway window.
You can try to improve the model by including different features in the model.
For an example using feature selection, see Train Decision Trees Using Classification Learner App.
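Outside the app, you can run a similar comparison at the command line by training a cross-validated model with and without candidate predictors and comparing the losses. This is a minimal sketch with the fisheriris data and a decision tree, not the code that Classification Learner generates.

load fisheriris                                                % meas (150-by-4), species
% Cross-validated tree using all four predictors
fullModel = fitctree(meas, species, 'CrossVal', 'on');
% Cross-validated tree using only petal length and petal width (columns 3 and 4)
reducedModel = fitctree(meas(:,3:4), species, 'CrossVal', 'on');
% Compare 10-fold cross-validation classification error
kfoldLoss(fullModel)
kfoldLoss(reducedModel)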
Use principal component analysis (PCA) to reduce the dimensionality of the predictor space. Reducing the dimensionality can help prevent overfitting in the classification models you create in Classification Learner. PCA linearly transforms predictors to remove redundant dimensions and generates a new set of variables called principal components.
On the Classification Learner tab, in the Features section, select PCA.
In the Advanced PCA Options tearaway window, select the Enable PCA check box.
You can close the PCA tearaway window, or move it. Your choices in the tearaway remain.
When you next click Train, the pca function transforms your selected features before training the classifier.
By default, PCA keeps only the components that explain 95% of the variance. In the PCA tearaway window, you can change the percentage of variance to explain in the Explained variance box. A higher value risks overfitting, while a lower value risks removing useful dimensions.
If you want to manually limit the number of PCA components, select Specify number of components in the Component reduction criterion list, and edit the number in the Number of numeric components box. The number of components cannot be larger than the number of numeric predictors. PCA is not applied to categorical predictors.
Check PCA options for trained models in the Current model pane information. Check the explained variance percentages to decide whether to change the number of components. For example:
PCA is keeping enough components to explain 95% variance. After training, 2 components were kept. Explained variance per component (in order): 92.5%, 5.3%, 1.7%, 0.5%
To learn more about how Classification Learner applies PCA to your data, generate code for your trained classifier. For more information on PCA, see the pca function.
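As a rough command-line illustration of the default behavior described above, the following sketch applies the pca function to numeric predictors and keeps the smallest number of components whose cumulative explained variance reaches 95%. It is not the exact code that the app generates.

load fisheriris                                  % meas is a 150-by-4 numeric predictor matrix
[coeff, score, ~, ~, explained] = pca(meas);     % explained: percent of variance per component
% Number of components needed to explain at least 95% of the variance
numComponents = find(cumsum(explained) >= 95, 1);
% Transformed predictors to use in place of the original numeric predictors
reducedPredictors = score(:, 1:numComponents);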
To investigate features to include or exclude, use the parallel coordinates plot. You can visualize high-dimensional data on a single plot to see 2-D patterns. The plot can help you understand relationships between features and identify useful predictors for separating classes. You can visualize training data and misclassified points on the parallel coordinates plot. When you plot classifier results, misclassified points have dashed lines.
On the Classification Learner tab, in the Plots section, click Parallel Coordinates Plot.
On the plot, drag the X tick labels to reorder the predictors. Changing the order can help you identify predictors that separate classes well.
To specify which predictors to plot, use the Predictors check boxes. A good practice is to plot a few predictors at a time. If your data has many predictors, the plot shows the first 10 predictors by default. If you clear all the check boxes, the plot resets to show the first 10 predictors.
If the predictors have significantly different scales, scale the data for easier visualization. Try different options in the Scaling list (see the sketch after this list):
Range plots along coordinate rulers from the minimum to the maximum value of each predictor independently.
Standardized plots the mean of each predictor at zero and scales the predictors by their standard deviations.
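As a rough analogy, these two scaling options correspond to common normalization methods. The following sketch uses the normalize function on the fisheriris measurements; it only illustrates the idea and is not what the app does internally.

load fisheriris                          % meas (150-by-4 numeric)
rangeScaled = normalize(meas, 'range');  % each column rescaled to [0,1], analogous to Range
zScored     = normalize(meas, 'zscore'); % zero mean, unit standard deviation, analogous to Standardized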
If you identify predictors that are not useful for separating out classes, use Feature Selection to remove them and train classifiers including only the most useful predictors.
The plot of the fisheriris data shows that the petal length and petal width features separate the classes best.
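For comparison, a similar plot can be produced at the command line with the parallelcoords function. This is a minimal sketch using the fisheriris data, with standardization roughly matching the Standardized scaling option; the axis labels are supplied by hand.

load fisheriris                          % meas (150-by-4), species
labels = {'Sepal length','Sepal width','Petal length','Petal width'};
parallelcoords(meas, 'Group', species, 'Standardize', 'on', 'Labels', labels)
% petal length and petal width show the clearest separation between the three classes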
You can export the parallel coordinates plots you create in the app to figures. See Export Plots in Classification Learner App.