This example shows how to tune and train a linear SVM regression model using Regression Learner. Then, at the command line, initialize and train an incremental model for linear SVM regression using the information gained from training in the app.
Load the 2015 NYC housing data set, and shuffle the data. For more details on the data, see NYC Open Data.
load(fullfile(matlabroot,'examples','stats','data','NYCHousing2015.mat'))
rng(1); % For reproducibility
n = size(NYCHousing2015,1);
idxshuff = randsample(n,n);
NYCHousing2015 = NYCHousing2015(idxshuff,:);
For details on the data set, enter Description at the command line.
For numerical stability, scale SALEPRICE by 1e6.
NYCHousing2015.SALEPRICE = NYCHousing2015.SALEPRICE/1e6;
Consider training a linear SVM regression model on about 1% of the data, and reserving the remaining data for incremental learning.
Regression Learner supports categorical variables. However, SVM models require dummy-coded categorical variables, and because the BUILDINGCLASSCATEGORY and NEIGHBORHOOD variables contain many levels, some with low representation, the probability that a partition does not include all categories is high. Therefore, dummy-code all categorical variables before partitioning the data. Concatenate the matrix of dummy variables to the rest of the numeric variables.
catvars = ["BOROUGH" "BUILDINGCLASSCATEGORY" "NEIGHBORHOOD"];
dumvars = splitvars(varfun(@(x)dummyvar(categorical(x)),NYCHousing2015,...
    'InputVariables',catvars));
NYCHousing2015(:,catvars) = [];
idxnum = varfun(@isnumeric,NYCHousing2015,'OutputFormat','uniform');
NYCHousing2015 = [dumvars NYCHousing2015(:,idxnum)];
Randomly partition the data into 1% and 99% subsets by calling cvpartition and specifying a holdout (test) sample proportion of 0.99. Create tables for the 1% and 99% partitions.
cvp = cvpartition(n,'HoldOut',0.99);
idxtt = cvp.training;
idxil = cvp.test;
NYCHousing2015tt = NYCHousing2015(idxtt,:);
NYCHousing2015il = NYCHousing2015(idxil,:);
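You can optionally confirm the sizes of the two partitions before opening the app (this check is not part of the original workflow):

% Optional check: number of observations in each partition
sum(idxtt) % observations for training in Regression Learner (about 1% of n)
sum(idxil) % observations reserved for incremental learning (about 99% of n)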
Open Regression Learner by entering regressionLearner at the command line.
regressionLearner
Alternatively, on the Apps tab, under Machine Learning and Deep Learning, click the app icon.
Choose the training data set and variables.
On the Regression Learner tab, in the File section, click New Session > From Workspace.
In the New Session dialog, under Data Set Variable, select the data set NYCHousing2015tt.
Under Response, select the response variable SALEPRICE.
Click Start Session.
The app implements 5-fold cross-validation by default.
Train a linear SVM regression model. Tune only the Epsilon hyperparameter by using Bayesian optimization.
On the Regression Learner tab, in the Model Type section, click the arrow to expand the list of models. In the Support Vector Machines section, select Optimizable SVM.
On the Regression Learner tab, in the Model Type section, select Advanced.
In the Select SVM Hyperparameters to Optimize dialog:
Deselect the Optimize boxes for all options except Epsilon.
Set the value of Kernel scale to Manual and 1.
Deselect the Value box of Standardize data.
Close the Select SVM Hyperparameters to Optimize dialog.
On the Regression Learner tab, in the Training section, click Train.
Regression Learner shows a plot of the minimum MSE of the model as the optimization progresses.
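If you prefer to work entirely at the command line, a comparable optimization is possible with fitrsvm. The following is a minimal sketch, assuming the same settings as in the app (fixed kernel scale of 1, no standardization, Bayesian optimization over Epsilon only with 5-fold cross-validation); it is not the app's internal procedure:

% Sketch: tune only Epsilon by Bayesian optimization at the command line
MdlCmd = fitrsvm(NYCHousing2015tt,'SALEPRICE','KernelScale',1, ...
    'Standardize',false,'OptimizeHyperparameters',{'Epsilon'}, ...
    'HyperparameterOptimizationOptions',struct('KFold',5));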
Export the trained, optimized linear SVM regression model.
On the Regression Learner tab, in the Export section, click Export Model.
In the Export Model dialog, click OK.
Regression Learner passes the trained model, among other variables, to the workspace in the structure array trainedModel. You can close Regression Learner.
Extract the trained SVM regression model from trainedModel.
Mdl = trainedModel.RegressionSVM;
Convert the model to an incremental model.
IncrementalMdl = incrementalLearner(Mdl)
IncrementalMdl.Epsilon
IncrementalMdl = 
  incrementalRegressionLinear

               IsWarm: 1
              Metrics: [1×2 table]
    ResponseTransform: 'none'
                 Beta: [313×1 double]
                 Bias: 19.7917
              Learner: 'svm'

  Properties, Methods
IncrementalMdl is an incrementalRegressionLinear model object for incremental learning using a linear SVM regression model. incrementalLearner initializes IncrementalMdl using the coefficients and the optimized value of the Epsilon hyperparameter learned from Mdl. Therefore, you can predict responses by passing IncrementalMdl and data to predict. Also, the IsWarm property is true, which means that the incremental learning functions measure the model performance from the start of incremental learning.
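For example, you can predict responses for a few of the held-out observations (a quick illustration, not part of the original example):

% Predict responses for the first five reserved observations.
% Brace indexing returns a numeric matrix, as predict requires.
yhatExample = predict(IncrementalMdl,NYCHousing2015il{1:5,1:(end-1)})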
Because incremental learning functions accept floating-point matrices only, create matrices for the predictor and response data.
Xil = NYCHousing2015il{:,1:(end-1)};
Yil = NYCHousing2015il{:,end};
Use the updateMetricsAndFit function to perform incremental learning on the 99% data partition. Simulate a data stream by processing 500 observations at a time. At each iteration:
Call updateMetricsAndFit to update the cumulative and window epsilon insensitive loss of the model given the incoming chunk of observations, and then fit the model to those observations. Overwrite the previous incremental model to update the losses in the Metrics property. Note that the function measures the performance before fitting the model to the chunk; in other words, the chunk is "new" data for the model.
Store the losses and the last estimated coefficient β313.
% Preallocation
nil = sum(idxil);
numObsPerChunk = 500;
nchunk = floor(nil/numObsPerChunk);
ei = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta313 = [IncrementalMdl.Beta(end); zeros(nchunk,1)];

% Incremental learning
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend   = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetricsAndFit(IncrementalMdl,Xil(idx,:),Yil(idx));
    ei{j,:} = IncrementalMdl.Metrics{"EpsilonInsensitiveLoss",:};
    beta313(j + 1) = IncrementalMdl.Beta(end);
end
IncrementalMdl is an incrementalRegressionLinear model object that has experienced all the data in the stream.
Plot trace plots of the performance metrics and the estimated coefficient β313.
figure;
subplot(2,1,1)
h = plot(ei.Variables);
xlim([0 nchunk]);
ylabel('Epsilon Insensitive Loss')
legend(h,ei.Properties.VariableNames)
subplot(2,1,2)
plot(beta313)
ylabel('\beta_{313}')
xlim([0 nchunk]);
xlabel('Iteration')
The cumulative loss gradually changes with each iteration (chunk of 500 observations), whereas the window loss jumps. Because the metrics window size is 200 by default, updateMetricsAndFit measures the performance based on the latest 200 observations in each 500-observation chunk.
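You can confirm the window size by inspecting the MetricsWindowSize property of the model (an optional check):

% Size of the metrics window (200 by default)
IncrementalMdl.MetricsWindowSize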
β313 changes abruptly at first, then levels off as updateMetricsAndFit processes chunks of observations.
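As a final check, you can compute the epsilon insensitive loss of the trained model on the last processed chunk by using the loss function (a sketch, not part of the original example):

% Evaluate the final model on the last chunk of streamed data.
% For an SVM learner, loss returns the epsilon insensitive loss by default.
finalLoss = loss(IncrementalMdl,Xil(idx,:),Yil(idx))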