cvshrink

Cross validate shrinking (pruning) ensemble

Syntax

vals = cvshrink(ens)
[vals,nlearn] = cvshrink(ens)
[vals,nlearn] = cvshrink(ens,Name,Value)

Description

vals = cvshrink(ens) returns an L-by-T matrix with cross-validated values of the mean squared error. L is the number of lambda values in the ens.Regularization structure. T is the number of threshold values on weak learner weights. If ens does not have a Regularization property filled in by the regularize method, pass a lambda name-value pair.

[vals,nlearn] = cvshrink(ens) also returns nlearn, an L-by-T matrix of the mean number of learners in the cross-validated ensemble.

[vals,nlearn] = cvshrink(ens,Name,Value) cross validates with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.
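
For example, a minimal sketch of the calling sequence, assuming ens is a regression ensemble trained elsewhere:

ens = regularize(ens); % fill in the Regularization property
vals = cvshrink(ens);  % cross-validate over the stored lambda values

Alternatively, skip regularize and pass the lambda values directly:

[vals,nlearn] = cvshrink(ens,'lambda',[0.01 0.1 1]);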

Input Arguments

ens

A regression ensemble, created with fitrensemble.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'cvpartition'

A partition created with cvpartition to use in the cross-validated ensemble. You can only use one of these four options at a time: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.
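
For example, a sketch using a 5-fold partition (the partition must be created for the number of training observations, here taken from a predictor matrix X; the lambda values are illustrative):

cvp = cvpartition(size(X,1),'KFold',5);         % 5-fold partition over the observations
vals = cvshrink(ens,'cvpartition',cvp,'lambda',[0.01 0.1 1]);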

'holdout'

Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Specify a numeric scalar from 0 to 1. You can only use one of these four options at a time for creating a cross-validated ensemble: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.
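
For example, a sketch that holds out 30% of the data for testing (the lambda values are illustrative):

vals = cvshrink(ens,'holdout',0.3,'lambda',[0.01 0.1 1]);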

'kfold'

Number of folds to use in the cross-validated ensemble, a positive integer. If you do not supply a cross-validation method, cvshrink uses 10-fold cross validation. You can only use one of these four options at a time: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.

Default: 10
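
For example, a sketch using 5-fold cross validation (the lambda values are illustrative):

vals = cvshrink(ens,'kfold',5,'lambda',[0.01 0.1 1]);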

'lambda'

Vector of nonnegative regularization parameter values for lasso. If empty, cvshrink does not perform cross validation.

Default: []

'leaveout'

Use leave-one-out cross validation by setting to 'on'. You can only use one of these four options at a time: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.
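
For example, a sketch (leave-one-out can be slow for large data sets; the lambda values are illustrative):

vals = cvshrink(ens,'leaveout','on','lambda',[0.01 0.1 1]);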

'threshold'

Numeric vector with lower cutoffs on weights for weak learners. cvshrink discards learners with weights below threshold in its cross-validation calculation.

Default: 0

Output Arguments

vals

L-by-T matrix with cross-validated values of the mean squared error. L is the number of values of the regularization parameter 'lambda', and T is the number of 'threshold' values on weak learner weights.

nlearn

L-by-T matrix with cross-validated values of the mean number of learners in the cross-validated ensemble. L is the number of values of the regularization parameter 'lambda', and T is the number of 'threshold' values on weak learner weights.
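
For example, to locate the lambda and threshold pair with the smallest cross-validated error, a sketch that assumes lambdaVec and thrVec are the vectors you passed as 'lambda' and 'threshold':

[minErr,linIdx] = min(vals(:));              % smallest MSE over the whole grid
[iLambda,jThr] = ind2sub(size(vals),linIdx); % convert to row/column indices
bestLambda = lambdaVec(iLambda);
bestThreshold = thrVec(jThr);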

Examples


Create a regression ensemble for predicting mileage from the carsmall data. Cross-validate the ensemble.

Load the carsmall data set and select displacement, horsepower, and vehicle weight as predictors.

load carsmall
X = [Displacement Horsepower Weight];

You can train an ensemble of bagged regression trees, using MPG as the response.

ens = fitrensemble(X,MPG,'Method','Bag')

fitrensemble uses a default template tree object, templateTree(), as a weak learner when 'Method' is 'Bag'. In this example, for reproducibility, specify 'Reproducible',true when you create a tree template object, and then use the object as a weak learner.

rng('default') % For reproducibility
t = templateTree('Reproducible',true); % For reproducibility of random predictor selections
ens = fitrensemble(X,MPG,'Method','Bag','Learners',t);

Specify values for lambda and threshold. Use these values to cross-validate the ensemble.

[vals,nlearn] = cvshrink(ens,'lambda',[.01 .1 1],'threshold',[0 .01 .1])
vals = 3×3

   18.1135   18.4634  115.5087
   18.1140   18.4630  115.4477
   18.0823   18.3565  124.1655

nlearn = 3×3

   13.8000   11.5000    3.5000
   13.7000   11.4000    3.5000
   13.8000   11.3000    3.3000

Clearly, setting a threshold of 0.1 leads to unacceptably high errors, while a threshold of 0.01 gives errors similar to those for a threshold of 0. The mean number of learners with a threshold of 0.01 is about 11.4, whereas the mean number with a threshold of 0 is about 13.8.
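
Based on these estimates, you could then shrink the ensemble with the settings that balance error and size. A sketch (shrink accepts 'lambda' and 'threshold' name-value pairs; the values here are the ones suggested by the cross validation above):

cmp = shrink(ens,'lambda',0.1,'threshold',0.01);
cmp.NTrained % number of learners retained in the compact ensemble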

See Also

regularize | shrink