ClassificationPartitionedModel

Package: classreg.learning.partition

Cross-validated classification model

Description

ClassificationPartitionedModel is a set of classification models trained on cross-validated folds. Estimate the quality of classification by cross validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.

Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose you cross validate using five folds. In this case, the software randomly assigns each observation into five roughly equally sized groups. The training fold contains four of the groups (i.e., roughly 4/5 of the data) and the test fold contains the other group (i.e., roughly 1/5 of the data). In this case, cross validation proceeds as follows:

The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.
The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and last three groups, and reserves the observations in the second group for validation.
The software proceeds in a similar fashion for the third to fifth models.

If you validate by calling kfoldPredict, it computes predictions for the observations in group 1 using the first model, group 2 for the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.

Construction

CVMdl = crossval(Mdl) creates a cross-validated classification model from a classification model (Mdl).

Alternatively:

CVDiscrMdl = fitcdiscr(X,Y,Name,Value)
CVKNNMdl = fitcknn(X,Y,Name,Value)
CVNBMdl = fitcnb(X,Y,Name,Value)
CVSVMMdl = fitcsvm(X,Y,Name,Value)
CVTreeMdl = fitctree(X,Y,Name,Value)

create a cross-validated model when name is either 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see fitcdiscr, fitcknn, fitcnb, fitcsvm, and fitctree.

Input Arguments

Mdl

A classification model. Mdl can be any of the following:

A classification tree trained using fitctree
A discriminant analysis classifier trained using fitcdiscr
A naive Bayes classifier trained using fitcnb
A nearest-neighbor classifier trained using fitcknn
A support vector machine classifier trained using fitcsvm

Properties

BinEdges

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value pair argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

If Mdl is a trained discriminant analysis classifier, then CategoricalPredictors is always empty ([]).

ClassNames

Unique class labels used in training the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

If CVModel is a cross-validated ClassificationDiscriminant, ClassificationKNN, or ClassificationNaiveBayes model, then you can change its cost matrix to e.g., CostMatrix, using dot notation.

CVModel.Cost = CostMatrix;

CrossValidatedModel

Name of the cross-validated model, which is a character vector.

KFold

Number of folds used in cross-validated model, which is a positive integer.

ModelParameters

Object holding parameters of CVModel.

Partition

The partition of class CVPartition used in creating the cross-validated model.

PredictorNames

Predictor variable names, specified as a cell array of character vectors. The order of the elements of PredictorNames corresponds to the order in which the predictor names appear in the training data.

Prior

Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

If CVModel is a cross-validated ClassificationDiscriminant or ClassificationNaiveBayes model, then you can change its vector of priors to e.g., priorVector, using dot notation.

CVModel.Prior = priorVector;

ResponseName

Response variable name, specified as a character vector.

ScoreTransform

Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores.

To change the score transformation function to function, for example, use dot notation.

For a built-in function, enter a character vector.

Mdl.ScoreTransform = 'function';

This table describes the available built-in functions.

Value	Description
`'doublelogit'`	1/(1 + e^–2x)
`'invlogit'`	log(x / (1 – x))
`'ismax'`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
`'logit'`	1/(1 + e^–x)
`'none'` or `'identity'`	x (no transformation)
`'sign'`	–1 for x < 0 0 for x = 0 1 for x > 0
`'symmetric'`	2x – 1
`'symmetricismax'`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
`'symmetriclogit'`	2/(1 + e^–x) – 1

For a MATLAB^® function or a function that you define, enter its function handle.
```
Mdl.ScoreTransform = @function;
```
function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Trained

The trained learners, which is a cell array of compact classification models.

W

The scaled weights, which is a vector with length n, the number of rows in X.

X

A matrix or table of predictor values. Each column of X represents one variable, and each row represents one observation.

Y

Categorical or character array, logical or numeric vector, or cell array of character vectors specifying the class labels for each observation. Y has the same number of rows as X, and each entry of Y is the response to the data in the corresponding row of X.

Methods

kfoldEdge	Classification edge for observations not used for training
kfoldLoss	Classification loss for observations not used for training
kfoldMargin	Classification margins for observations not used for training
kfoldPredict	Predict response for observations not used for training
kfoldfun	Cross validate function

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

Examples

collapse all

Evaluate the Classification Error of a Classification Tree Classifier

Open Live Script

Evaluate the k-fold cross-validation error for a classification tree model.

Load Fisher's iris data set.

load fisheriris

Train a classification tree using default options.

Mdl = fitctree(meas,species);

Cross validate the classification tree model.

CVMdl = crossval(Mdl);

Estimate the 10-fold cross-validation loss.

L = kfoldLoss(CVMdl)

L = 0.0533

Estimate Posterior Probabilities for Test Samples

Open Live Script

Estimate positive class posterior probabilities for the test set of an SVM algorithm.

Load the ionosphere data set.

load ionosphere

Train an SVM classifier. Specify a 20% holdout sample. It is good practice to standardize the predictors and specify the class order.

rng(1) % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Holdout',0.2,'Standardize',true,...
    'ClassNames',{'b','g'});

CVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier.

Estimate the optimal score function for mapping observation scores to posterior probabilities of an observation being classified as 'g'.

ScoreCVSVMModel = fitSVMPosterior(CVSVMModel);

ScoreSVMModel is a trained ClassificationPartitionedModel cross-validated classifier containing the optimal score transformation function estimated from the training data.

Estimate the out-of-sample positive class posterior probabilities. Display the results for the first 10 out-of-sample observations.

[~,OOSPostProbs] = kfoldPredict(ScoreCVSVMModel);
indx = ~isnan(OOSPostProbs(:,2));
hoObs = find(indx); % Holdout observation numbers
OOSPostProbs = [hoObs, OOSPostProbs(indx,2)];
table(OOSPostProbs(1:10,1),OOSPostProbs(1:10,2),...
    'VariableNames',{'ObservationIndex','PosteriorProbability'})

ans=10×2 table
    ObservationIndex    PosteriorProbability
    ________________    ____________________

            6                   0.17381     
            7                   0.89639     
            8                 0.0076613     
            9                   0.91602     
           16                  0.026722     
           22                4.6114e-06     
           23                    0.9024     
           24                2.4137e-06     
           38                0.00042705     
           41                   0.86427

Tips

To estimate posterior probabilities of trained, cross-validated SVM classifiers, use fitSVMPosterior.

Documentation