This example shows how to tune and train a linear SVM regression model using Regression Learner. Then, at the command line, initialize and train an incremental model for linear SVM regression using the information gained from training in the app.
Load the 2015 NYC housing data set, and shuffle the data. For more details on the data, see NYC Open Data.
load(fullfile(matlabroot,'examples','stats','data','NYCHousing2015.mat'))
rng(1); % For reproducibility
n = size(NYCHousing2015,1);
idxshuff = randsample(n,n);
NYCHousing2015 = NYCHousing2015(idxshuff,:);
For details on the data set, enter Description at the command line.
For numerical stability, scale SALEPRICE by 1e6.
NYCHousing2015.SALEPRICE = NYCHousing2015.SALEPRICE/1e6;
Consider training a linear SVM regression model on about 1% of the data, and reserving the remaining data for incremental learning.
Regression Learner supports categorical variables. However, SVM models require dummy-coded categorical variables, and because the BUILDINGCLASSCATEGORY and NEIGHBORHOOD variables contain many levels, some with low representation, the probability that a partition does not include all categories is high. Therefore, dummy-code all categorical variables before partitioning the data. Concatenate the matrix of dummy variables to the rest of the numeric variables.
catvars = ["BOROUGH" "BUILDINGCLASSCATEGORY" "NEIGHBORHOOD"];
dumvars = splitvars(varfun(@(x)dummyvar(categorical(x)),NYCHousing2015,...
    'InputVariables',catvars));
NYCHousing2015(:,catvars) = [];
idxnum = varfun(@isnumeric,NYCHousing2015,'OutputFormat','uniform');
NYCHousing2015 = [dumvars NYCHousing2015(:,idxnum)];
Randomly partition the data into 1% and 99% subsets by calling cvpartition and specifying a holdout (test) sample proportion of 0.99. Create tables for the 1% and 99% partitions.
cvp = cvpartition(n,'HoldOut',0.99);
idxtt = cvp.training;
idxil = cvp.test;
NYCHousing2015tt = NYCHousing2015(idxtt,:);
NYCHousing2015il = NYCHousing2015(idxil,:);
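You can optionally confirm the sizes of the two partitions before opening the app (this check is not part of the original workflow):

% Optional check: number of observations in each partition
sum(idxtt) % observations for training in Regression Learner (about 1% of n)
sum(idxil) % observations reserved for incremental learning (about 99% of n)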
Open Regression Learner by entering regressionLearner at the command line.
regressionLearner
Alternatively, on the Apps tab, under Machine Learning and Deep Learning, click the app icon.
Choose the training data set and variables.
On the Regression Learner tab, in the File section, click New Session > From Workspace.
In the New Session dialog, under Data Set Variable, select the data set NYCHousing2015tt.
Under Response, select the response variable SALEPRICE.
Click Start Session.
The app implements 5-fold cross-validation by default.
Train a linear SVM regression model. Tune only the Epsilon hyperparameter by using Bayesian optimization.
On the Regression Learner tab, in the Model Type section, click the arrow to expand the list of models. In the Support Vector Machines section, select Optimizable SVM.
On the Regression Learner tab, in the Model Type section, select Advanced.
In the Select SVM Hyperparameters to Optimize dialog:
Deselect the Optimize boxes for all options except Epsilon.
Set the value of Kernel scale to Manual and 1.
Deselect the Value box of Standardize data.
Close the Select SVM Hyperparameters to Optimize dialog.
On the Regression Learner tab, in the Training section, click Train.
Regression Learner shows a plot of the minimum MSE of the model as the optimization progresses.
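If you prefer to work entirely at the command line, a comparable optimization is possible with fitrsvm. The following is a minimal sketch, assuming the same settings as in the app (fixed kernel scale of 1, no standardization, Bayesian optimization over Epsilon only with 5-fold cross-validation); it is not the app's internal procedure:

% Sketch: tune only Epsilon by Bayesian optimization at the command line
MdlCmd = fitrsvm(NYCHousing2015tt,'SALEPRICE','KernelScale',1, ...
    'Standardize',false,'OptimizeHyperparameters',{'Epsilon'}, ...
    'HyperparameterOptimizationOptions',struct('KFold',5));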
Export the trained, optimized linear SVM regression model.
On the Regression Learner tab, in the Export section, click Export Model.
In the Export Model dialog, click OK.
Regression Learner passes the trained model, among other variables, to the workspace in the structure array trainedModel. You can close Regression Learner.
Extract the trained SVM regression model from trainedModel.
Mdl = trainedModel.RegressionSVM;
Convert the model to an incremental model.
IncrementalMdl = incrementalLearner(Mdl)
IncrementalMdl.Epsilon
IncrementalMdl = 
  incrementalRegressionLinear

               IsWarm: 1
              Metrics: [1×2 table]
    ResponseTransform: 'none'
                 Beta: [313×1 double]
                 Bias: 19.7917
              Learner: 'svm'

  Properties, Methods
IncrementalMdl is an incrementalRegressionLinear model object for incremental learning using a linear SVM regression model. incrementalLearner initializes IncrementalMdl using the coefficients and the optimized value of the Epsilon hyperparameter learned from Mdl. Therefore, you can predict responses by passing IncrementalMdl and data to predict. Also, the IsWarm property is true, which means that the incremental learning functions measure the model performance from the start of incremental learning.
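For example, you can predict responses for a few of the held-out observations (a quick illustration, not part of the original example):

% Predict responses for the first five reserved observations.
% Brace indexing returns a numeric matrix, as predict requires.
yhatExample = predict(IncrementalMdl,NYCHousing2015il{1:5,1:(end-1)})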
Because incremental learning functions accept floating-point matrices only, create matrices for the predictor and response data.
Xil = NYCHousing2015il{:,1:(end-1)};
Yil = NYCHousing2015il{:,end};
Use the updateMetricsAndFit function to perform incremental learning on the 99% data partition. Simulate a data stream by processing 500 observations at a time. At each iteration:
Call updateMetricsAndFit to update the cumulative and window epsilon insensitive loss of the model given the incoming chunk of observations, and then fit the model to those observations. Overwrite the previous incremental model to update the losses in the Metrics property. Note that the function measures the performance before fitting the model to the chunk; in other words, the chunk is "new" data for the model.
Store the losses and the last estimated coefficient β313.
% Preallocation
nil = sum(idxil);
numObsPerChunk = 500;
nchunk = floor(nil/numObsPerChunk);
ei = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta313 = [IncrementalMdl.Beta(end); zeros(nchunk,1)];

% Incremental learning
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend   = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetricsAndFit(IncrementalMdl,Xil(idx,:),Yil(idx));
    ei{j,:} = IncrementalMdl.Metrics{"EpsilonInsensitiveLoss",:};
    beta313(j + 1) = IncrementalMdl.Beta(end);
end
IncrementalMdl is an incrementalRegressionLinear model object that has experienced all the data in the stream.
Plot trace plots of the performance metrics and the estimated coefficient β313.
figure;
subplot(2,1,1)
h = plot(ei.Variables);
xlim([0 nchunk]);
ylabel('Epsilon Insensitive Loss')
legend(h,ei.Properties.VariableNames)
subplot(2,1,2)
plot(beta313)
ylabel('\beta_{313}')
xlim([0 nchunk]);
xlabel('Iteration')
The cumulative loss gradually changes with each iteration (chunk of 500 observations), whereas the window loss jumps. Because the metrics window size is 200 by default, updateMetricsAndFit measures the performance based on the latest 200 observations in each 500-observation chunk.
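You can confirm the window size by inspecting the MetricsWindowSize property of the model (an optional check):

% Size of the metrics window (200 by default)
IncrementalMdl.MetricsWindowSize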
β313 changes abruptly at first, then levels off as updateMetricsAndFit processes chunks of observations.
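As a final check, you can compute the epsilon insensitive loss of the trained model on the last processed chunk by using the loss function (a sketch, not part of the original example):

% Evaluate the final model on the last chunk of streamed data.
% For an SVM learner, loss returns the epsilon insensitive loss by default.
finalLoss = loss(IncrementalMdl,Xil(idx,:),Yil(idx))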