This example shows how to create and compare classifiers that use specified misclassification costs in the Classification Learner app. Specify the misclassification costs before training, and use the accuracy and total misclassification cost results to compare the trained models.
In the MATLAB® Command Window, read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Combine all the A ratings into one A rating. Do the same for the B and C ratings, so that the response variable has three distinct ratings. Among the three ratings, A is considered the best and C the worst.
creditrating = readtable('CreditRating_Historical.dat');
Rating = categorical(creditrating.Rating);
Rating = mergecats(Rating,{'AAA','AA','A'},'A');
Rating = mergecats(Rating,{'BBB','BB','B'},'B');
Rating = mergecats(Rating,{'CCC','CC','C'},'C');
creditrating.Rating = Rating;
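Optionally, you can confirm that the response now contains only the three merged ratings. This quick check is not part of the original workflow:

% Optional check: list the categories and their counts. The response
% should now contain only the three merged ratings A, B, and C.
summary(creditrating.Rating)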
Assume these are the costs associated with misclassifying the credit ratings of customers.
                          Customer Predicted Rating
                            A        B        C
Customer True    A         $0       $100     $200
Rating           B         $500     $0       $100
                 C         $1000    $500     $0
For example, the cost of misclassifying a C rating customer as an A rating customer is $1000. The costs indicate that classifying a customer with bad credit as a customer with good credit is more costly than classifying a customer with good credit as a customer with bad credit.
Create a matrix variable that contains the misclassification costs. Create another variable that specifies the class names and their order in the matrix variable.
ClassificationCosts = [0 100 200; 500 0 100; 1000 500 0];
ClassNames = categorical({'A','B','C'});
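To verify that the matrix is laid out with true classes as rows and predicted classes as columns, matching the table above, you can optionally display it with class labels:

% Optional: label the cost matrix to confirm its orientation
% (rows = true classes, columns = predicted classes).
array2table(ClassificationCosts, ...
    'RowNames',{'A','B','C'},'VariableNames',{'A','B','C'})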
Tip
Alternatively, you can specify misclassification costs directly inside the Classification Learner app. See Specify Misclassification Costs for more information.
Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
On the Classification Learner tab, in the File section, select New Session > From Workspace.
In the New Session dialog box, select the table creditrating from the Data Set Variable list.
As shown in the dialog box, the app selects the response and predictor variables based on their data type. The default response variable is the Rating variable. The default validation option is cross-validation, to protect against overfitting. For this example, do not change the default settings.
To accept the default settings, click Start Session.
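The remaining code snippets in this example sketch a rough command-line analogue of the app workflow; the variable names (such as cvp) are chosen for illustration only. One way to mirror the app's default validation scheme is to create a 5-fold cross-validation partition:

% One way to reproduce the app's default validation scheme at the
% command line: a 5-fold cross-validation partition stratified by
% the response.
rng('default')  % for reproducible folds
cvp = cvpartition(creditrating.Rating,'KFold',5);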
Specify the misclassification costs. On the Classification Learner tab, in the Options section, click Misclassification Costs. The app opens a dialog box showing the default misclassification costs.
In the dialog box, click Import from Workspace.
In the import dialog box, select ClassificationCosts as the cost variable and ClassNames as the class order in the cost variable. Click Import.
The app updates the values in the misclassification costs dialog box. Click OK to save your changes.
Train fine, medium, and coarse trees simultaneously. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Decision Trees group, click All Trees. In the Training section, click Train. The app trains one of each tree model type and displays the models in the History list.
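If you prefer to work outside the app, you can sketch an equivalent experiment with fitctree. The MaxNumSplits values of 100, 20, and 4 below mirror the app's fine, medium, and coarse tree presets, and the Cost and ClassNames arguments apply the misclassification costs:

% Command-line sketch of the same experiment: train trees with the
% maximum number of splits used by the fine, medium, and coarse
% presets, applying the misclassification costs.
maxSplits = [100 20 4];
trees = cell(1,3);
for k = 1:3
    trees{k} = fitctree(creditrating,'Rating', ...
        'MaxNumSplits',maxSplits(k), ...
        'Cost',ClassificationCosts,'ClassNames',ClassNames);
end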
Tip
If you have Parallel Computing Toolbox™, the Opening Pool dialog box opens the first time you click Train (or when you click Train again after an extended period of time). The dialog box remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can train multiple classifiers simultaneously and continue working.
Note
Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
In the History list, click a model to view the results, which are displayed in the Current Model pane. Each model has a validation Accuracy score that indicates the percentage of correctly predicted responses. In the History list, the app highlights the highest Accuracy score by outlining it in a box.
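At the command line, a comparable validation accuracy can be computed by cross-validating a trained tree with the partition created earlier; here the medium tree (trees{2}) is used:

% Cross-validate the medium tree and report validation accuracy
% as a percentage of correctly predicted responses.
cvTree = crossval(trees{2},'CVPartition',cvp);
accuracy = 100*(1 - kfoldLoss(cvTree,'LossFun','classiferror'))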
Inspect the accuracy of the predictions in each class. On the Classification Learner tab, in the Plots section, click Confusion Matrix. The app displays a matrix of true class and predicted class results for the selected model (in this case, for the medium tree).
You can also plot results per predicted class to investigate false discovery rates. Under Plot, select the Positive Predictive Values (PPV), False Discovery Rates (FDR) option.
In the confusion matrix for the medium tree, the entries below the diagonal have small percentage values. These values indicate that the model tries to avoid assigning a credit rating that is higher than the true rating for a customer.
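A similar plot can be produced from the cross-validated predictions; confusionchart is the command-line counterpart of the app's confusion matrix view:

% Plot true class versus predicted class for the cross-validated
% tree. Row-normalized summaries show per-class rates.
pred = kfoldPredict(cvTree);
cm = confusionchart(creditrating.Rating,pred);
cm.RowSummary = 'row-normalized';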
Compare the total misclassification costs of the tree models. To inspect the total misclassification cost of a model, select the model in the History list, and then view the Results section of the Current Model pane, which lists the total misclassification cost alongside the validation accuracy.
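You can also check the total misclassification cost reported by the app against a manual computation: look up the cost matrix entry for each (true class, predicted class) pair in the validation predictions and sum the entries. This sketch reuses the cross-validated predictions from the previous snippet:

% Sum the cost of every validation prediction: index the cost
% matrix by each observation's (true, predicted) class pair.
[~,trueIdx] = ismember(creditrating.Rating,ClassNames);
[~,predIdx] = ismember(pred,ClassNames);
totalCost = sum(ClassificationCosts( ...
    sub2ind(size(ClassificationCosts),trueIdx,predIdx)))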
In general, choose a model that has high accuracy and low total misclassification cost. In this example, the medium tree has the highest validation accuracy value and the lowest total misclassification cost of the three models.
You can perform feature selection and transformation or tune your model just as you do in the workflow without misclassification costs. However, always check the total misclassification cost of your model when assessing its performance. For differences in the exported model and exported code when you use misclassification costs, see Misclassification Costs in Exported Model and Generated Code.