Predict responses for Gaussian kernel regression model
Predict the test set responses using a Gaussian kernel regression model for the carbig data set.
Load the carbig data set.
load carbig
Specify the predictor variables (X) and the response variable (Y).
X = [Weight,Cylinders,Horsepower,Model_Year];
Y = MPG;
Delete rows of X and Y where either array has NaN values. Removing rows with NaN values before passing data to fitrkernel can speed up training and reduce memory usage.
R = rmmissing([X Y]);
X = R(:,1:4);
Y = R(:,end);
Reserve 10% of the observations as a holdout sample. Extract the training and test indices from the partition definition.
rng(10) % For reproducibility
N = length(Y);
cvp = cvpartition(N,'Holdout',0.1);
idxTrn = training(cvp); % Training set indices
idxTest = test(cvp); % Test set indices
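The index vectors returned by training and test are logical masks over the observations. The following minimal sketch, which uses a hypothetical sample size of 20 rather than the carbig data, checks that the two masks split the observations without overlap.

```matlab
rng(10) % For reproducibility
N = 20;                               % hypothetical number of observations
cvp = cvpartition(N,'Holdout',0.1);   % reserve about 10% as a holdout sample
idxTrn = training(cvp);               % N-by-1 logical, true for training rows
idxTest = test(cvp);                  % N-by-1 logical, true for test rows

% Every observation belongs to exactly one of the two sets.
isComplement = all(xor(idxTrn,idxTest));
nTest = nnz(idxTest);                 % matches cvp.TestSize
```

Because the masks are logical, X(idxTrn,:) and X(idxTest,:) select the corresponding rows directly.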
Standardize the training data and train the regression kernel model.
Xtrain = X(idxTrn,:);
Ytrain = Y(idxTrn);
[Ztrain,tr_mu,tr_sigma] = zscore(Xtrain); % Standardize the training data
tr_sigma(tr_sigma==0) = 1;
Mdl = fitrkernel(Ztrain,Ytrain)
Mdl =
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 128
               KernelScale: 1
                    Lambda: 0.0028
             BoxConstraint: 1
                   Epsilon: 0.8617
  Properties, Methods
Mdl is a RegressionKernel model.
Standardize the test data using the column means and standard deviations computed from the training data. Predict responses for the test set.
Xtest = X(idxTest,:);
Ztest = (Xtest-tr_mu)./tr_sigma; % Standardize the test data
Ytest = Y(idxTest);
YFit = predict(Mdl,Ztest);
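The standardization step can be illustrated on a small example. This is a sketch with hypothetical toy matrices (not the carbig data) in which one training column is constant. It shows why the zero entries of the standard deviation vector are replaced by 1 before the test data is scaled. The sketch assumes a release that supports implicit expansion (R2016b or later) for the subtraction and division.

```matlab
% Hypothetical toy data: the second column is constant in the training set.
XtrainToy = [1 10; 2 10; 3 10];
XtestToy  = [2 10; 4 10];

[ZtrainToy,mu,sigma] = zscore(XtrainToy); % sigma is 0 for the constant column
sigma(sigma==0) = 1;                      % guard against division by zero

% Reuse the training statistics; do not recompute them on the test set.
ZtestToy = (XtestToy - mu)./sigma;
hasNaN = any(isnan(ZtestToy(:)));         % false because of the guard above
```

Without the guard, the constant column would be scaled as 0/0 and produce NaN values in the test predictors.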
Create a table containing the first 10 observed response values and predicted response values.
table(Ytest(1:10),YFit(1:10),'VariableNames',...
    {'ObservedValue','PredictedValue'})
ans=10×2 table
ObservedValue PredictedValue
_____________ ______________
18 17.616
14 25.799
24 24.141
25 25.018
14 13.637
14 14.557
18 18.584
27 26.096
21 25.031
13 13.324
Estimate the test set regression loss using the mean squared error loss function.
L = loss(Mdl,Ztest,Ytest)
L = 9.2664
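As a rough illustration of what the mean squared error measures, the following sketch averages the squared residuals by hand for the first three observed and predicted pairs from the table above. It assumes the default uniform observation weights, in which case the mean squared error is the plain average of squared residuals. The variable names are illustrative, and the result differs from L because only three rows are used.

```matlab
% First three observed/predicted pairs from the table above.
yObs = [18; 14; 24];
yFit = [17.616; 25.799; 24.141];

res = yObs - yFit;         % residuals
mseManual = mean(res.^2);  % unweighted mean squared error
```

The second pair has by far the largest residual, so it dominates the average. This is the usual sensitivity of squared-error loss to poorly predicted observations.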
Mdl - Kernel regression model
Kernel regression model, specified as a RegressionKernel model object. You can create a RegressionKernel model object using fitrkernel.
X - Predictor data used to generate responses
Predictor data used to generate responses, specified as a numeric matrix or table.
Each row of X corresponds to one observation, and each column corresponds to one variable.
For a numeric matrix:
The variables in the columns of X must have the same order as the predictor variables that trained Mdl.
If you trained Mdl using a table (for example, Tbl) and Tbl contains all numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors using the CategoricalPredictors name-value pair argument of fitrkernel. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.
For a table:
predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of fitrkernel. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
Data Types: double | single | table
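The table rules for X can be sketched as follows. This example assumes a release in which fitrkernel accepts table input. The choice of predictors and the names Tbl, Tnew, yA, and yB are illustrative only.

```matlab
load carbig
Tbl = rmmissing(table(Weight,Horsepower,MPG)); % drop rows with missing values

rng(1) % For reproducibility of the random feature expansion
MdlTbl = fitrkernel(Tbl,'MPG');                % MPG is the response variable

% Same five observations, with the predictor columns in a different order
% and the response column removed.
Tnew = Tbl(1:5,{'Horsepower','Weight'});

yA = predict(MdlTbl,Tbl(1:5,:)); % the extra variable MPG is ignored
yB = predict(MdlTbl,Tnew);       % predictors are matched by name, not position
```

Under these assumptions, yA and yB should agree, because predict matches the predictors to MdlTbl.PredictorNames by variable name.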
Usage notes and limitations:
predict does not support tall table data.
For more information, see Tall Arrays.
fitrkernel | loss | RegressionKernel | resume