kfoldPredict

Predict response for observations not used for training

Syntax

label = kfoldPredict(obj) [label,score] = kfoldPredict(obj) [label,score,cost] = kfoldPredict(obj)

Description

label = kfoldPredict(obj) returns class labels predicted by obj, a cross-validated classification. For every fold, kfoldPredict predicts class labels for in-fold observations using a model trained on out-of-fold observations.

[label,score] = kfoldPredict(obj) returns the predicted classification scores for in-fold observations using a model trained on out-of-fold observations.

[label,score,cost] = kfoldPredict(obj) returns misclassification costs.

Input Arguments

obj

Object of class ClassificationPartitionedModel or ClassificationPartitionedEnsemble.

Output Arguments

`label`	Vector of class labels of the same type as the response data used in training `obj`. (The software treats string arrays as cell arrays of character vectors.) Each entry of `label` corresponds to a predicted class label for the corresponding row of `X`.
`score`	Numeric matrix of size `N`-by-`K`, where `N` is the number of observations (rows) in `obj.X`, and `K` is the number of classes (in `obj.ClassNames`). `score(i,j)` represents the confidence that row `i` of `obj.X` is of class `j`. For details, see More About.
`cost`	Numeric matrix of misclassification costs of size `N`-by-`K`. `cost(i,j)` is the average misclassification cost of predicting that row `i` of `obj.X` is of class `j`.

Examples

expand all

Estimate Cross-Validation Predictions from an Ensemble

Open Live Script

Find the cross-validation predictions for a model based on Fisher's iris data.

Load Fisher's iris data set.

load fisheriris

Train an ensemble of classification trees using AdaBoostM2. Specify tree stumps as the weak learners.

rng(1); % For reproducibility
t = templateTree('MaxNumSplits',1);
Mdl = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);

Cross validate the trained ensemble using 10-fold cross validation.

CVMdl = crossval(Mdl);

Estimate cross-validation predicted labels and scores.

[elabel,escore] = kfoldPredict(CVMdl);

Display the maximum and minimum scores of each class.

max(escore)

ans = 1×3

    9.3862    8.9871   10.1866

min(escore)

ans = 1×3

    0.0018    3.8359    0.9573

Create Confusion Matrix Using Cross-Validation Predictions

Open Live Script

Create a confusion matrix using the 10-fold cross-validation predictions of a discriminant analysis model.

Load the fisheriris data set. X contains flower measurements for 150 different flowers, and y lists the species, or class, for each flower. Create a variable order that specifies the order of the classes.

load fisheriris
X = meas;
y = species;
order = unique(y)

order = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Create a 10-fold cross-validated discriminant analysis model by using the fitcdiscr function. By default, fitcdiscr ensures that training and test sets have roughly the same proportions of flower species. Specify the order of the flower classes.

cvmdl = fitcdiscr(X,y,'KFold',10,'ClassNames',order);

Predict the species of the test set flowers.

predictedSpecies = kfoldPredict(cvmdl);

Create a confusion matrix that compares the true class values to the predicted class values.

confusionchart(y,predictedSpecies)

More About

expand all

Cost (discriminant analysis)

The average misclassification cost is the mean misclassification cost for predictions made by the cross-validated classifiers trained on out-of-fold observations. The matrix of expected costs per observation is defined in Cost.

Score

For discriminant analysis, the score of a classification is the posterior probability of the classification. For the definition of posterior probability in discriminant analysis, see Posterior Probability.

For ensembles, a classification score represents the confidence of a classification into a class. The higher the score, the higher the confidence.

Different ensemble algorithms have different definitions for their scores. Furthermore, the range of scores depends on ensemble type. For example:

AdaBoostM1 scores range from –∞ to ∞.
Bag scores range from 0 to 1.

For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.

For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and X is false otherwise.

Generate 100 random points and classify them:

rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')

Prune the tree:

tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')

The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.

Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]

ans = 10×3

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649

Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.

Documentation