Predict responses for observations not used for training
YHat = kfoldPredict(CVMdl) returns cross-validated predicted responses by the cross-validated linear regression model CVMdl. That is, for every fold, kfoldPredict predicts responses for observations that it holds out when it trains using all other observations.
YHat contains predicted responses for each regularization strength in the linear regression models that compose CVMdl.
CVMdl
Cross-validated, linear regression model, specified as a RegressionPartitionedLinear model object. You can create a RegressionPartitionedLinear model using fitrlinear and specifying any one of the cross-validation name-value pair arguments, for example, CrossVal.
To obtain estimates, kfoldPredict applies the same data used to cross-validate the linear regression model (X and Y).
YHat
Cross-validated predicted responses, returned as an n-by-L numeric array. n is the number of observations in the predictor data that created CVMdl (see X) and L is the number of regularization strengths in CVMdl.Trained{1}.Lambda.
YHat(i,j) is the predicted response for observation i using the linear regression model that has regularization strength CVMdl.Trained{1}.Lambda(j).
The predicted response using the model with regularization strength j is $\hat{y}_j = x\beta_j + b_j$.
x is an observation from the predictor data matrix X, and is a row vector.
$\beta_j$ is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j).
$b_j$ is the estimated, scalar bias, which the software stores in Mdl.Bias(j).
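The formula above can be checked by hand on a small problem. The following is a minimal sketch, not part of the original example: the simulated data, the fold index k, and the variable names idx, manual, and maxDiff are assumptions chosen for illustration. It uses test on CVMdl.Partition to find the observations held out by fold k, then compares x*Beta + Bias with the matching rows of the kfoldPredict output.

```matlab
rng(1)                                    % for reproducibility
X = randn(200,5);                         % small dense predictor matrix (illustrative)
Y = X(:,1) + 2*X(:,2) + 0.3*randn(200,1);
CVMdl = fitrlinear(X,Y,'KFold',5);        % default: a single regularization strength
YHat = kfoldPredict(CVMdl);

k = 1;                                    % fold to inspect (arbitrary choice)
Mdl = CVMdl.Trained{k};                   % model trained without fold k
idx = test(CVMdl.Partition,k);            % logical index of the held-out observations
manual = X(idx,:)*Mdl.Beta + Mdl.Bias;    % x*beta + b for each held-out row
maxDiff = max(abs(manual - YHat(idx)));   % expected to be at round-off level
```

If the model were trained with several regularization strengths, Mdl.Beta would have one column per strength and the same expression would produce one column of predictions per strength.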
Simulate 10000 observations from this model
$y = x_{100} + 2x_{200} + e$
$X = \{x_1,\ldots,x_{1000}\}$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
e is random normal error with mean 0 and standard deviation 0.3.
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Cross-validate a linear regression model.
CVMdl = fitrlinear(X,Y,'CrossVal','on')
CVMdl =
  RegressionPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 10000
                  KFold: 10
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'
Mdl1 = CVMdl.Trained{1}
Mdl1 =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: 0.0107
               Lambda: 1.1111e-04
              Learner: 'svm'
By default, fitrlinear implements 10-fold cross-validation. CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 10-by-1 cell array holding 10 RegressionLinear models, each of which the software trained on the observations outside one fold.
Predict responses for observations that fitrlinear did not use in training the folds.
yHat = kfoldPredict(CVMdl);
Because there is one regularization strength in Mdl1, yHat is a numeric vector.
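One common use of the cross-validated predictions is to estimate out-of-sample error. The sketch below is an illustration under assumptions, not part of the original example: it uses a smaller simulated data set so that it runs quickly, and the names residuals and cvMSE are hypothetical. Every observation is predicted by a model that did not see it during training, so the residuals approximate out-of-sample errors.

```matlab
rng(1)                                     % for reproducibility
n = 1e3;                                   % fewer observations than the example above
d = 50;
X = sprandn(n,d,0.1);                      % sparse predictors, 10% nonzero
Y = X(:,10) + 2*X(:,20) + 0.3*randn(n,1);
CVMdl = fitrlinear(X,Y,'CrossVal','on');   % 10-fold cross-validation by default
yHat = kfoldPredict(CVMdl);                % n-by-1: one regularization strength

residuals = Y - yHat;                      % held-out prediction errors
cvMSE = mean(residuals.^2);                % pooled cross-validated mean squared error
```

The kfoldLoss function reports a cross-validated loss directly; computing it from yHat is mainly useful when you also want the per-observation residuals.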
Simulate 10000 observations as in Predict Cross-Validated Responses.
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Create a set of 15 logarithmically-spaced regularization strengths from $10^{-5}$ through $10^{-1}$.
Lambda = logspace(-5,-1,15);
Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify using least squares with a lasso penalty and optimizing the objective function using SpaRSA.
X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
CVMdl is a RegressionPartitionedLinear model. Its Trained property contains a 5-by-1 cell array of trained RegressionLinear models, each of which holds out a different fold during training. Because fitrlinear trained using 15 regularization strengths, you can think of each RegressionLinear model as 15 models.
Predict cross-validated responses.
YHat = kfoldPredict(CVMdl);
size(YHat)
ans = 1×2
10000 15
YHat(2,:)
ans = 1×15
-1.7338 -1.7332 -1.7319 -1.7299 -1.7266 -1.7239 -1.7135 -1.7210 -1.7324 -1.7063 -1.6397 -1.5112 -1.2631 -0.7841 -0.0096
YHat is a 10000-by-15 matrix. YHat(2,:) contains the cross-validated responses for observation 2, one for each of the 15 regularization strengths.
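Because each column of YHat corresponds to one regularization strength, the columns can be compared to pick a strength. The following sketch is an assumption-laden illustration rather than part of the original example: it uses a smaller simulated data set, and the names mseByLambda, best, and bestLambda are hypothetical. It computes the mean squared error of each column and looks up the matching entry of CVMdl.Trained{1}.Lambda.

```matlab
rng(1)                                      % for reproducibility
n = 1e3;                                    % fewer observations than the example above
d = 50;
X = sprandn(n,d,0.1);
Y = X(:,10) + 2*X(:,20) + 0.3*randn(n,1);
Lambda = logspace(-5,-1,15);

% Observations in columns, as in the example above
CVMdl = fitrlinear(X',Y,'ObservationsIn','columns','KFold',5, ...
    'Lambda',Lambda,'Learner','leastsquares', ...
    'Solver','sparsa','Regularization','lasso');
YHat = kfoldPredict(CVMdl);                 % n-by-15, one column per strength

mseByLambda = mean((Y - YHat).^2,1);        % 1-by-15 cross-validated MSE
[~,best] = min(mseByLambda);                % column with the smallest error
bestLambda = CVMdl.Trained{1}.Lambda(best); % strength that produced that column
```

Selecting the minimum-error column is only one possible rule; with a lasso penalty you might instead prefer a larger strength whose error is close to the minimum, because it yields a sparser model.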
See also: fitrlinear | predict | RegressionLinear | RegressionPartitionedLinear