kfoldPredict

Predict responses for observations not used for training

Syntax

YHat = kfoldPredict(CVMdl)

Description

YHat = kfoldPredict(CVMdl) returns cross-validated predicted responses by the cross-validated linear regression model CVMdl. That is, for every fold, kfoldPredict predicts responses for observations that it holds out when it trains using all other observations.

YHat contains predicted responses for each regularization strength in the linear regression models that compose CVMdl.
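The fold-by-fold logic described above can be sketched by hand. This is a minimal illustration, not the internal implementation. It assumes a small simulated data set (the sizes and coefficients are arbitrary) and uses the Partition and Trained properties of CVMdl together with predict to rebuild the held-out predictions.

```matlab
rng(1)                                  % For reproducibility
n = 1000;                               % Illustrative sizes, chosen arbitrarily
d = 50;
X = sprandn(n,d,0.1);
Y = X(:,10) + 2*X(:,20) + 0.3*randn(n,1);

CVMdl = fitrlinear(X,Y,'KFold',5);      % One regularization strength by default

% For each fold, predict only the observations that the fold's model held out.
yManual = zeros(n,1);
for k = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,k);      % Logical index of held-out observations
    yManual(idx) = predict(CVMdl.Trained{k},X(idx,:));
end

yHat = kfoldPredict(CVMdl);
maxDiff = max(abs(yManual - yHat));     % Expected to be zero or near machine precision
```

If the sketch matches what kfoldPredict does, maxDiff should be zero or negligibly small.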

Input Arguments

`CVMdl` — Cross-validated, linear regression model
`RegressionPartitionedLinear` model object

Cross-validated, linear regression model, specified as a RegressionPartitionedLinear model object. You can create a RegressionPartitionedLinear model using fitrlinear and specifying any one of the cross-validation name-value pair arguments, for example, CrossVal.

To obtain estimates, kfoldPredict applies the same data used to cross-validate the linear regression model (X and Y).

Output Arguments

`YHat` — Cross-validated predicted responses
numeric array

Cross-validated predicted responses, returned as an n-by-L numeric array. n is the number of observations in the predictor data that created CVMdl (see X) and L is the number of regularization strengths in CVMdl.Trained{1}.Lambda. YHat(i,j) is the predicted response for observation i using the linear regression model that has regularization strength CVMdl.Trained{1}.Lambda(j).

The predicted response using the model with regularization strength j is $\hat{y}_{j} = x\beta_{j} + b_{j}.$

x is an observation from the predictor data matrix X, and is a row vector.
$\beta_{j}$ is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j), where Mdl is the RegressionLinear model in CVMdl.Trained whose fold holds out x.
$b_{j}$ is the estimated scalar bias, which the software stores in Mdl.Bias(j).
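The formula can be checked against predict for a single fold. The sketch below makes several assumptions: the data set is small and simulated, the three Lambda values are arbitrary, and the addition of the bias row vector relies on implicit expansion (R2016b or later).

```matlab
rng(1)                                  % For reproducibility
n = 1000;                               % Illustrative sizes, chosen arbitrarily
d = 50;
X = sprandn(n,d,0.1);
Y = X(:,10) + 2*X(:,20) + 0.3*randn(n,1);

Lambda = logspace(-4,-1,3);             % Arbitrary regularization strengths
CVMdl = fitrlinear(X,Y,'KFold',5,'Lambda',Lambda);

Mdl = CVMdl.Trained{1};                 % Model trained without fold 1
idx = test(CVMdl.Partition,1);          % Observations held out in fold 1
Xtest = X(idx,:);

% Column j is x*Beta(:,j) + Bias(j) for every held-out row x.
yFormula = full(Xtest*Mdl.Beta) + Mdl.Bias;
yPredict = predict(Mdl,Xtest);

maxDiff = max(abs(yFormula(:) - yPredict(:)));  % Expected to be near zero
```

Because ResponseTransform is 'none' by default, no further transformation should be applied to the linear response.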

Examples

Predict Cross-Validated Responses

Simulate 10000 observations from this model:

$y = x_{100} + 2x_{200} + e.$

$X = \{x_{1}, \ldots, x_{1000}\}$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Cross-validate a linear regression model.

CVMdl = fitrlinear(X,Y,'CrossVal','on')

CVMdl = 
  RegressionPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 10000
                  KFold: 10
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'


Mdl1 = CVMdl.Trained{1}

Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: 0.0107
               Lambda: 1.1111e-04
              Learner: 'svm'


By default, fitrlinear implements 10-fold cross-validation. CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 10-by-1 cell array holding 10 RegressionLinear models, each of which the software trained on the observations outside one fold.

Predict responses for observations that fitrlinear did not use in training the folds.

yHat = kfoldPredict(CVMdl);

Because each trained model, such as Mdl1, has one regularization strength, yHat is a numeric vector.

Predict for Models Containing Several Regularization Strengths

Simulate 10000 observations as in Predict Cross-Validated Responses.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically spaced regularization strengths from $10^{-5}$ through $10^{-1}$.

Lambda = logspace(-5,-1,15);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify using least squares with a lasso penalty and optimizing the objective function using SpaRSA.

X = X'; 
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');

CVMdl is a RegressionPartitionedLinear model. Its Trained property contains a 5-by-1 cell array of trained RegressionLinear models, each of which holds out a different fold during training. Because fitrlinear trained using 15 regularization strengths, you can think of each RegressionLinear model as 15 models.

Predict cross-validated responses.

YHat = kfoldPredict(CVMdl);
size(YHat)

ans = 1×2

       10000          15

YHat(2,:)

ans = 1×15

   -1.7338   -1.7332   -1.7319   -1.7299   -1.7266   -1.7239   -1.7135   -1.7210   -1.7324   -1.7063   -1.6397   -1.5112   -1.2631   -0.7841   -0.0096

YHat is a 10000-by-15 matrix. YHat(2,:) contains the cross-validated responses for observation 2, one for each of the 15 regularization strengths in Lambda.
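One possible use of the columns of YHat is to compare regularization strengths. The sketch below is an illustration only. It assumes a smaller simulated data set than the example above and computes a pooled mean squared error per column, which is not necessarily the value that the toolbox function kfoldLoss reports. The variable names mse and bestLambda are hypothetical.

```matlab
rng(1)                                  % For reproducibility
n = 1000;                               % Illustrative sizes, chosen arbitrarily
d = 50;
X = sprandn(n,d,0.1);
Y = X(:,10) + 2*X(:,20) + 0.3*randn(n,1);

Lambda = logspace(-5,-1,15);
CVMdl = fitrlinear(X,Y,'KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');

YHat = kfoldPredict(CVMdl);             % n-by-15, one column per regularization strength

% Pooled squared error over all held-out predictions, per column.
mse = mean((Y - YHat).^2,1);            % 1-by-15 (uses implicit expansion)
[minMSE,jBest] = min(mse);
bestLambda = CVMdl.Trained{1}.Lambda(jBest);
```

Indexing CVMdl.Trained{1}.Lambda rather than the Lambda variable keeps the selected value aligned with the column order of YHat.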
