Class: RegressionLinear
Predict response of linear regression model
Mdl — Linear regression model
RegressionLinear model object

Linear regression model, specified as a RegressionLinear model object. You can create a RegressionLinear model object using fitrlinear.
X — Predictor data used to generate responses
full or sparse numeric matrix | table

Predictor data used to generate responses, specified as a full or sparse numeric matrix or a table. By default, each row of X corresponds to one observation, and each column corresponds to one variable.
For a numeric matrix:

- The variables in the columns of X must have the same order as the predictor variables that trained Mdl.
- If you train Mdl using a table (for example, Tbl) and Tbl contains only numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors by using the CategoricalPredictors name-value pair argument of fitrlinear.
- If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.
For a table:

- predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
- If you train Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as the variables that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
- If you train Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames must be the same as the corresponding predictor variable names in X. To specify predictor names during training, use the PredictorNames name-value pair argument of fitrlinear. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.
Data Types: double | single | table
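For example, if you train on a table (as described above), you can also predict on a table whose predictor variables match Mdl.PredictorNames. The following is a minimal sketch with hypothetical data and variable names, assuming your release of fitrlinear accepts table training data:

% Minimal sketch (hypothetical data and names): train on a table, then predict
% on a table whose predictor variables match Mdl.PredictorNames.
Tbl = array2table(randn(100,3),'VariableNames',{'x1','x2','x3'});
Tbl.Y = Tbl.x1 + 2*Tbl.x2 + 0.1*randn(100,1);
Mdl  = fitrlinear(Tbl,'Y');                  % predictor names stored in Mdl.PredictorNames
YHat = predict(Mdl,Tbl(:,{'x3','x1','x2'})); % column order does not need to match Tbl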
dimension — Predictor data observation dimension
'rows' (default) | 'columns'

Predictor data observation dimension, specified as 'columns' or 'rows'.
Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.
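For wide or sparse data, a column-oriented call might look like the following minimal sketch, which assumes a trained model Mdl and a predictor matrix X whose rows are observations:

% Minimal sketch (assumes Mdl and X already exist, with rows as observations).
Xcols = X';                                            % orient observations along columns
YHat  = predict(Mdl,Xcols,'ObservationsIn','columns'); % matrix input only, not tables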
YHat — Predicted responses
numeric matrix

Predicted responses, returned as an n-by-L numeric matrix. n is the number of observations in X and L is the number of regularization strengths in Mdl.Lambda. YHat(i,j) is the response for observation i using the linear regression model that has regularization strength Mdl.Lambda(j).
The predicted response using the model with regularization strength j is

$\hat{y}_j = x\beta_j + b_j$

where:

- $x$ is an observation from the predictor data matrix X, and is a row vector.
- $\beta_j$ is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j).
- $b_j$ is the estimated, scalar bias, which the software stores in Mdl.Bias(j).
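As a sanity check, you can reproduce a predicted value directly from the stored coefficients and bias. This is a minimal sketch that assumes a trained model Mdl with ResponseTransform 'none' and a predictor matrix X with rows as observations:

% Minimal sketch: recompute YHat(i,j) by hand from Mdl.Beta and Mdl.Bias.
YHat   = predict(Mdl,X);
i = 1; j = 1;                                 % arbitrary observation and lambda index
manual = X(i,:)*Mdl.Beta(:,j) + Mdl.Bias(j);  % x*beta_j + b_j
% manual should equal YHat(i,j) because ResponseTransform is 'none'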
Simulate 10000 observations from the model $y = x_{100} + 2x_{200} + e$, where:

- $X$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
- $e$ is random normal error with mean 0 and standard deviation 0.3.
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Train a linear regression model. Reserve 30% of the observations as a holdout sample.
CVMdl = fitrlinear(X,Y,'Holdout',0.3);
Mdl = CVMdl.Trained{1}
Mdl = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0066
               Lambda: 1.4286e-04
              Learner: 'svm'

  Properties, Methods
CVMdl
is a RegressionPartitionedLinear
model. It contains the property Trained
, which is a 1-by-1 cell array holding a RegressionLinear
model that the software trained using the training set.
Extract the training and test data from the partition definition.
trainIdx = training(CVMdl.Partition); testIdx = test(CVMdl.Partition);
Predict the training- and test-sample responses.
yHatTrain = predict(Mdl,X(trainIdx,:)); yHatTest = predict(Mdl,X(testIdx,:));
Because there is one regularization strength in Mdl
, yHatTrain
and yHatTest
are numeric vectors.
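To quantify how well the model generalizes, you can compare the holdout predictions against the observed responses. One way, sketched below using the variable names from this example, is to compute the mean squared error directly:

% Minimal sketch: mean squared error on the training and holdout observations.
trainMSE = mean((Y(trainIdx) - yHatTrain).^2)
testMSE  = mean((Y(testIdx)  - yHatTest).^2)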
Predict responses from the best-performing linear regression model that uses a lasso penalty and least squares.
Simulate 10000 observations as in Predict Test-Sample Responses.
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Create a set of 15 logarithmically spaced regularization strengths from $10^{-5}$ through $10^{-1}$.
Lambda = logspace(-5,-1,15);
Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Optimize the objective function using SpaRSA.
X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numCLModels = numel(CVMdl.Trained)
numCLModels = 5
CVMdl
is a RegressionPartitionedLinear
model. Because fitrlinear
implements 5-fold cross-validation, CVMdl
contains 5 RegressionLinear
models that the software trains on each fold.
Display the first trained linear regression model.
Mdl1 = CVMdl.Trained{1}
Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x15 double]
                 Bias: [1x15 double]
               Lambda: [1x15 double]
              Learner: 'leastsquares'

  Properties, Methods
Mdl1 is a RegressionLinear model object. fitrlinear constructed Mdl1 by training on the first four folds. Because Lambda is a sequence of 15 regularization strengths, you can think of Mdl1 as 15 models, one for each regularization strength in Lambda.
Estimate the cross-validated MSE.
mse = kfoldLoss(CVMdl);
Higher values of Lambda
lead to predictor variable sparsity, which is a good quality of a regression model. For each regularization strength, train a linear regression model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.
Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta~=0);
In the same figure, plot the cross-validated MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.
figure
[h,hL1,hL2] = plotyy(log10(Lambda),log10(mse),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} MSE')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
hold off
Choose the index of the regularization strength that balances predictor variable sparsity and low MSE (for example, Lambda(10)
).
idxFinal = 10;
Extract the model corresponding to the selected regularization strength.
MdlFinal = selectModels(Mdl,idxFinal)
MdlFinal = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0050
               Lambda: 0.0037
              Learner: 'leastsquares'

  Properties, Methods
idxNZCoeff = find(MdlFinal.Beta~=0)
idxNZCoeff = 2×1
100
200
EstCoeff = Mdl.Beta(idxNZCoeff)
EstCoeff = 2×1
1.0051
1.9965
MdlFinal
is a RegressionLinear
model with one regularization strength. The nonzero coefficients EstCoeff
are close to the coefficients that simulated the data.
Simulate 10 new observations, and predict corresponding responses using the best-performing model.
XNew = sprandn(d,10,nz);
YHat = predict(MdlFinal,XNew,'ObservationsIn','columns');
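Because the simulated responses depend only on predictors 100 and 200, you can optionally check that the predictions track that signal. A minimal sketch using the names from this example:

% Minimal sketch: compare predictions with the noise-free signal of the new data.
signal = XNew(100,:)' + 2*XNew(200,:)';  % observations are in the columns of XNew
max(abs(YHat - signal))                  % expect small values for a well-fit model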
Usage notes and limitations:

- predict does not support tall table data.

For more information, see Tall Arrays.
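If your data do not fit in memory, a tall-array workflow might look like the following minimal sketch. This assumes that predict accepts tall numeric arrays (only tall tables are unsupported); Xfull is a hypothetical full (non-sparse) matrix, and in practice tall arrays are usually backed by a datastore.

% Minimal sketch (assumption: tall numeric arrays are accepted by predict).
tX    = tall(Xfull);         % wrap an in-memory full matrix for illustration
tYHat = predict(Mdl,tX);     % deferred, chunk-wise evaluation
YHat  = gather(tYHat);       % trigger evaluation and collect the results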
Usage notes and limitations:

You can generate C/C++ code for both predict and update by using a coder configurer. Or, generate code only for predict by using saveLearnerForCoder, loadLearnerForCoder, and codegen.
- Code generation for predict and update — Create a coder configurer by using learnerCoderConfigurer and then generate code by using generateCode. Then you can update model parameters in the generated code without having to regenerate the code.
- Code generation for predict — Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the predict function. Then use codegen (MATLAB Coder) to generate code for the entry-point function (see the sketch after this list).
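A minimal sketch of the saveLearnerForCoder workflow follows; the file name 'linearMdl' and entry-point function predictYHat are hypothetical, and the codegen call assumes a 1-by-1000 full input.

% Minimal sketch (hypothetical names). First, save the trained model to disk.
saveLearnerForCoder(MdlFinal,'linearMdl');

% Entry-point function, e.g., predictYHat.m:
%   function yhat = predictYHat(X) %#codegen
%   Mdl  = loadLearnerForCoder('linearMdl');
%   yhat = predict(Mdl,X);
%   end

% Generate C/C++ code for the entry-point function (requires MATLAB Coder):
%   codegen predictYHat -args {zeros(1,1000)}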
You can also generate single-precision C/C++ code for predict. For single-precision code generation, specify the name-value pair argument 'DataType','single' as an additional input to the loadLearnerForCoder function.
This table contains notes about the arguments of predict. Arguments not included in this table are fully supported.
| Argument | Notes and Limitations |
|---|---|
| Mdl | For the usage notes and limitations of the model object, see Code Generation of the RegressionLinear model object. |
| X | |
| Name-value pair arguments | |
For more information, see Introduction to Code Generation.