Compare accuracies of two classification models by repeated cross-validation
testckfold
statistically assesses the accuracies of two
classification models by repeatedly cross-validating the two models, determining the
differences in the classification loss, and then formulating the test statistic by
combining the classification loss differences. This type of test is particularly
appropriate when sample size is limited.
You can assess whether the accuracies of the classification models are different, or
whether one classification model performs better than another. Available tests include a
5-by-2 paired t test, a 5-by-2 paired F test, and
a 10-by-10 repeated cross-validation t test. For more details, see
Repeated Cross-Validation Tests. To speed up computations,
testckfold
supports parallel computing (requires a Parallel Computing Toolbox™ license).
h = testckfold(C1,C2,X1,X2) returns the test decision that results from conducting a 5-by-2 paired F cross-validation test. The null hypothesis is that the classification models C1 and C2 have equal accuracy in predicting the true class labels using the predictor and response data in the tables X1 and X2. h = 1 indicates rejection of the null hypothesis at the 5% significance level.
testckfold conducts the cross-validation test by applying C1 and C2 to all predictor variables in X1 and X2, respectively. The true class labels in X1 and X2 must be the same. The response variable names in X1, X2, C1.ResponseName, and C2.ResponseName must be the same.
For examples of ways to compare models, see Tips.
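For instance, the following is a minimal sketch of this syntax that compares a classification tree with a discriminant analysis model. The Fisher iris data and the table and variable names are illustrative; because both full models name the response, Y is omitted.

% Minimal sketch: 5-by-2 paired F test comparing two full classification models
% on the same table. The data set and variable names are illustrative.
load fisheriris                                % built-in example data: meas, species
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;                         % common response variable

C1 = fitctree(Tbl,'Species');                  % first model: classification tree
C2 = fitcdiscr(Tbl,'Species');                 % second model: discriminant analysis

rng(1)                                         % reproducible cross-validation partitions
h = testckfold(C1,C2,Tbl,Tbl)                  % h = 1 rejects equal accuracy at the 5% level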
h = testckfold(___,Name,Value) uses any of the input arguments in the previous syntaxes and additional options specified by one or more Name,Value pair arguments. For example, you can specify the type of alternative hypothesis, the type of test, or the use of parallel computing.
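For instance, a sketch with illustrative settings (reusing C1, C2, and Tbl from the sketch above) that requests the 10-by-10 repeated cross-validation t test, a right-tailed alternative, and parallel computation, and also returns the p-value p and the classification loss matrices e1 and e2:

% Sketch: 10-by-10 repeated cross-validation t test, right-tailed alternative
% (is C1 more accurate than C2?), computed in parallel. The 'Options' value is a
% statset structure; UseParallel requires Parallel Computing Toolbox.
[h,p,e1,e2] = testckfold(C1,C2,Tbl,Tbl, ...
    'Test','10x10t','Alternative','greater', ...
    'Options',statset('UseParallel',true))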
Examples of ways to compare models include:
Compare the accuracies of a simple classification model and a more complex model by passing the same set of predictor data (see the sketch after this list).
Compare the accuracies of two different models using two different sets of predictors.
Perform various types of Feature Selection. For
example, you can compare the accuracy of a model trained using a
set of predictors to the accuracy of one trained on a subset or
different set of predictors. You can arbitrarily choose the set
of predictors, or use a feature selection technique like PCA or
sequential feature selection (see pca
and sequentialfs
).
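As a sketch of the first comparison in this list, the following compares a single classification tree with a boosted ensemble of trees; the ionosphere data set and the ensemble settings are illustrative.

% Sketch: simple model (one tree) versus a more complex model (boosted ensemble
% of 100 trees) on the same predictors. Data set and settings are illustrative.
load ionosphere                                          % built-in binary data: X, Y
C1 = templateTree();                                     % simple model
C2 = templateEnsemble('AdaBoostM1',100,templateTree());  % more complex model
rng(1)
h = testckfold(C1,C2,X,X,Y)                              % default 5-by-2 paired F test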
If X1 and X2 are tables that contain the response variable and use the same response variable name, and C1 and C2 are full classification models whose ResponseName properties specify that response variable, then you can omit supplying Y. Consequently, testckfold uses the common response variable in the tables.
One way to perform cost-insensitive feature selection is (a sketch of these steps follows the list):
1. Create a classification model template that characterizes the first classification model (C1).
2. Create a classification model template that characterizes the second classification model (C2).
3. Specify two predictor data sets. For example, specify X1 as the full predictor set and X2 as a reduced set.
4. Enter testckfold(C1,C2,X1,X2,Y,'Alternative','less').
If testckfold returns 1, then there is enough evidence to suggest that the classification model that uses fewer predictors performs better than the model that uses the full predictor set.
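A minimal sketch of steps 1–4, assuming the illustrative table Tbl from the first sketch above:

% Sketch of steps 1-4; the predictor and response names are illustrative.
C1 = templateTree();                                % step 1: first model template
C2 = templateTree();                                % step 2: second model template

X1 = Tbl(:,{'SL','SW','PL','PW'});                  % step 3: full predictor set
X2 = Tbl(:,{'PL','PW'});                            %         reduced predictor set
Y  = Tbl.Species;

rng(1)
h = testckfold(C1,C2,X1,X2,Y,'Alternative','less')  % step 4
% h = 1 suggests the model trained on the reduced set X2 is more accurate.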
Alternatively, you can assess whether there is a significant difference between the accuracies of the two models. To perform this assessment, remove the 'Alternative','less' specification in step 4. Then testckfold conducts a two-sided test, and h = 0 indicates that there is not enough evidence to suggest a difference in the accuracy of the two models.
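Continuing the sketch of steps 1–4, the two-sided variant is:

% Two-sided variant of step 4 (default 'Alternative','unequal'), reusing C1, C2,
% X1, X2, and Y from the sketch above. p is the p-value of the test.
[h,p] = testckfold(C1,C2,X1,X2,Y)        % h = 0: no evidence the accuracies differ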
The tests are appropriate for the misclassification
rate classification
loss, but you can specify other loss functions (see LossFun
).
The key assumptions are that the estimated classification losses are
independent and normally distributed with mean 0 and finite common
variance under the two-sided null hypothesis. Classification losses
other than the misclassification rate can violate this assumption.
Highly discrete data, imbalanced classes, and highly imbalanced cost matrices can violate the normality assumption of classification loss differences.
If you specify to conduct the 10-by-10 repeated cross-validation t test
using 'Test','10x10t'
, then testckfold
uses
10 degrees of freedom for the t distribution to
find the critical region and estimate the p-value.
For more details, see [2] and [3].
Use testcholdout
:
For test sets with larger sample sizes
To implement variants of the McNemar test to compare two classification model accuracies
For cost-sensitive testing using a chi-square or likelihood
ratio test. The chi-square test uses quadprog
(Optimization Toolbox),
which requires an Optimization Toolbox™ license.
[1] Alpaydin, E. “Combined 5 x 2 CV F Test for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 11, No. 8, 1999, pp. 1885–1892.
[2] Bouckaert, R. “Choosing Between Two Learning Algorithms Based on Calibrated Tests.” International Conference on Machine Learning, 2003, pp. 51–58.
[3] Bouckaert, R., and E. Frank. “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms.” Advances in Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference, 2004, pp. 3–12.
[4] Dietterich, T. “Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 10, No. 7, 1998, pp. 1895–1923.
[5] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd Ed. New York: Springer, 2008.
templateDiscriminant | templateECOC | templateEnsemble | templateKNN | templateNaiveBayes | templateSVM | templateTree | testcholdout