Loss of incremental learning model on batch of data
loss returns the regression or classification loss of a configured incremental learning model for linear regression (incrementalRegressionLinear object) or linear, binary classification (incrementalClassificationLinear object).
To measure model performance on a data stream, and store the results in the output model, call updateMetrics or updateMetricsAndFit instead.
You can measure the performance of an incremental model on streaming data in three ways:
Cumulative metrics measure the performance since the start of incremental learning.
Window metrics measure the performance on a specified window of observations. The metrics update every time the model processes the specified window.
The loss function measures the performance only on a specified batch of data.
Load the human activity data set. Randomly shuffle the data.
load humanactivity
n = numel(actid);
rng(1); % For reproducibility
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);
For details on the data set, display Description.
Responses can be one of five classes. Dichotomize the response by identifying whether the subject is moving (actid > 2).
Y = Y > 2;
Create an incremental linear SVM model for binary classification. Prime it for loss by specifying the class names, a uniform prior class distribution, and arbitrary coefficient and bias values. Specify a metrics window size of 1000 observations.
p = size(X,2);
Beta = randn(p,1);
Bias = randn(1);
Mdl = incrementalClassificationLinear('Beta',Beta,'Bias',Bias,...
    'ClassNames',unique(Y),'Prior','uniform','MetricsWindowSize',1000);
Mdl is an incrementalClassificationLinear model. All its properties are read-only. As an alternative to specifying arbitrary values, you can take either of the following actions to prime the model:
Train an SVM model using fitcsvm or fitclinear on a subset of the data (if such data is available), and then convert the model to an incremental learner by using incrementalLearner (see the sketch after this list).
Incrementally fit Mdl to data by using fit.
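For example, here is a minimal sketch of the first approach. The subset Xsub and Ysub, and the use of fitclinear, are illustrative assumptions:

% Assumed: a labeled subset of the data is available up front
Xsub = X(1:1000,:);
Ysub = Y(1:1000);
TTMdl = fitclinear(Xsub,Ysub);        % traditionally trained linear SVM
IncMdl = incrementalLearner(TTMdl);   % convert to an incremental learner primed for loss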
Simulate a data stream, and perform the following actions on each incoming chunk of 50 observations.
Call updateMetrics to measure the cumulative performance and the performance within a window of observations. Overwrite the previous incremental model with a new one to track performance metrics.
Call loss to measure the model performance on the incoming chunk.
Call fit to fit the model to the incoming chunk. Overwrite the previous incremental model with a new one fitted to the incoming observations.
Store all performance metrics to monitor their evolution during incremental learning.
% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
ce = array2table(zeros(nchunk,3),'VariableNames',["Cumulative" "Window" "Loss"]);

% Incremental learning
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    % Measure cumulative and window metrics on the incoming chunk
    Mdl = updateMetrics(Mdl,X(idx,:),Y(idx));
    ce{j,["Cumulative" "Window"]} = Mdl.Metrics{"ClassificationError",:};
    % Measure the loss on the incoming chunk only
    ce{j,"Loss"} = loss(Mdl,X(idx,:),Y(idx));
    % Fit the model to the incoming chunk
    Mdl = fit(Mdl,X(idx,:),Y(idx));
end
Mdl is an incrementalClassificationLinear model object that has experienced all the data in the stream. During incremental learning and after the model is warmed up, updateMetrics checks the performance of the model on each incoming chunk, and the separate call to fit then trains the model on that chunk. loss is agnostic of the metrics warm-up period, so it measures the classification error for all iterations.
To see how the performance metrics evolved during training, plot them on separate subplots.
figure;
h = plot(ce.Variables);
xlim([0 nchunk]);
ylim([0 0.05])
ylabel('Classification Error')
xline(Mdl.MetricsWarmupPeriod/numObsPerChunk,'r-.');
legend(h,ce.Properties.VariableNames)
xlabel('Iteration')
During the metrics warm-up period (before the red line), the yellow line represents the classification error on each incoming chunk of data. After the metrics warm-up period, Mdl tracks the cumulative and window metrics. The cumulative and batch losses converge as fit fits the incremental model to the incoming data.
Fit an incremental learning model for regression to streaming data, and compute the mean absolute deviation (MAD) on incoming batches of data.
Load the robot arm data set.
load robotarm
n = numel(ytrain);
p = size(Xtrain,2);
For details on the data set, display Description.
Create an incremental linear model for regression. Configure the model as follows:
Specify a metrics warm-up period of 1000 observations.
Specify a metrics window size of 500 observations.
Track the mean absolute deviation (MAD) to measure the performance of the model. Create an anonymous function that measures the absolute error of each new observation. Create a structure array containing the name MeanAbsoluteError and its corresponding function.
Prime the model to predict responses by specifying that all regression coefficients and the bias are 0.
maefcn = @(z,zfit,w)(abs(z - zfit));
maemetric = struct("MeanAbsoluteError",maefcn);
Mdl = incrementalRegressionLinear('MetricsWarmupPeriod',1000,'MetricsWindowSize',500,...
    'Metrics',maemetric,'Beta',zeros(p,1),'Bias',0,'EstimationPeriod',0)
Mdl = 

  incrementalRegressionLinear

               IsWarm: 0
              Metrics: [2×2 table]
    ResponseTransform: 'none'
                 Beta: [32×1 double]
                 Bias: 0
              Learner: 'svm'

  Properties, Methods
Mdl is an incrementalRegressionLinear model object configured for incremental learning.
Simulate a data stream, and perform incremental learning. At each iteration:
Process a chunk of 50 observations.
Call updateMetrics to compute cumulative and window metrics on the incoming chunk of data. Overwrite the previous incremental model with a new one to overwrite the previous metrics.
Call loss to compute the MAD on the incoming chunk of data. Whereas the cumulative and window metrics require that custom losses return the loss for each observation, loss requires the loss on the entire chunk. Compute the mean of the absolute deviations.
Call fit to fit the incremental model to the incoming chunk of data.
Store the cumulative, window, and chunk metrics to monitor their evolution during incremental learning.
% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
mae = array2table(zeros(nchunk,3),'VariableNames',["Cumulative" "Window" "Chunk"]);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    % Compute cumulative and window metrics on the incoming chunk
    Mdl = updateMetrics(Mdl,Xtrain(idx,:),ytrain(idx));
    mae{j,1:2} = Mdl.Metrics{"MeanAbsoluteError",:};
    % Compute the chunk MAD: take the mean of the per-observation absolute errors
    mae{j,3} = loss(Mdl,Xtrain(idx,:),ytrain(idx),'LossFun',@(x,y,w)mean(maefcn(x,y,w)));
    % Fit the model to the incoming chunk
    Mdl = fit(Mdl,Xtrain(idx,:),ytrain(idx));
end
Mdl is an incrementalRegressionLinear model object that has experienced all the data in the stream. During incremental learning and after the model is warmed up, updateMetrics checks the performance of the model on each incoming chunk, and the separate call to fit then trains the model on that chunk.
Plot the performance metrics to see how they evolved during incremental learning.
figure;
h = plot(mae.Variables);
ylabel('Mean Absolute Deviation')
xline(Mdl.MetricsWarmupPeriod/numObsPerChunk,'r-.');
xlabel('Iteration')
legend(h,mae.Properties.VariableNames)
The plot suggests that:
updateMetrics computes the performance metrics only after the metrics warm-up period.
updateMetrics computes the cumulative metrics during each iteration.
updateMetrics computes the window metrics after processing 500 observations.
Because Mdl was primed to predict observations from the beginning of incremental learning, loss can compute the MAD on each incoming chunk of data.
Mdl — Incremental learning model
incrementalClassificationLinear model object | incrementalRegressionLinear model object
Incremental learning model, specified as an incrementalClassificationLinear or incrementalRegressionLinear model object, created directly or by converting a supported traditionally trained machine learning model using incrementalLearner. For more details, see the reference page corresponding to the learning problem.
You must prime Mdl to compute its loss on a batch of observations.
If Mdl is a converted traditionally trained model, you can compute its loss without any modifications.
Otherwise, Mdl must satisfy the following criteria, by your specifications or by being fit to data using fit or updateMetricsAndFit.
If Mdl is an incrementalRegressionLinear model, its model coefficients Mdl.Beta and bias Mdl.Bias must be nonempty arrays.
If Mdl is an incrementalClassificationLinear model, its model coefficients Mdl.Beta and bias Mdl.Bias must be nonempty arrays, the class names set Mdl.ClassNames must contain two classes, and the prior class distribution Mdl.Prior must contain known values.
Regardless of object type, if you configured the model so that functions standardize predictor data, the predictor means Mdl.Mu and standard deviations Mdl.Sigma must be nonempty arrays.
X — Batch of predictor data
Batch of predictor data with which to compute the loss, specified as a floating-point matrix of n observations and Mdl.NumPredictors predictor variables. The value of the 'ObservationsIn' name-value pair argument determines the orientation of the variables and observations.
The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation (row or column) j in X.
Note
loss supports only floating-point input predictor data. If the input model Mdl represents a converted, traditionally trained model and it was fit to categorical data, use dummyvar to convert each categorical variable to a numeric matrix of dummy variables, and concatenate all dummy-variable matrices and any other numeric predictors. For more details, see Dummy Variables.
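For example, a minimal sketch of this conversion; the variables catVar and numVar are hypothetical:

% Hypothetical predictors: one categorical variable and one numeric variable
catVar = categorical(["a";"b";"a";"c"]);
numVar = [1.5; 2.3; 0.7; 4.1];
D = dummyvar(catVar);   % one dummy-variable column per category
Xbatch = [D numVar];    % concatenate dummy variables with the numeric predictors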
Data Types: single | double
Y — Batch of labels
Batch of labels with which to compute the loss, specified as a categorical, character, or string array, logical or floating-point vector, or cell array of character vectors for classification problems, and a floating-point vector for regression problems.
The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation (row or column) j in X.
For classification problems:
loss supports binary classification only.
If the ClassNames property of the input model Mdl is nonempty, the following conditions apply.
If Y contains a label that is not a member of Mdl.ClassNames, loss issues an error.
The data type of Y and Mdl.ClassNames must be the same.
Data Types: char | string | cell | categorical | logical | single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'ObservationsIn','columns','Weights',W specifies that the columns of the predictor matrix correspond to observations, and the vector W contains observation weights to apply during incremental learning.

'LossFun' — Loss function
Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or a function handle.
Classification Problems: The following table lists the available loss functions when Mdl is an incrementalClassificationLinear model. Specify one using its corresponding character vector or string scalar.
Name | Description |
---|---|
"binodeviance" | Binomial deviance |
"classiferror" (default) | Classification error |
"exponential" | Exponential |
"hinge" | Hinge |
"logit" | Logistic |
"quadratic" | Quadratic |
For more details, see Classification Loss.
Logistic regression learners return posterior probabilities as classification scores, but SVM learners do not (see predict).
To specify a custom loss function, use function handle notation. The function must have this form:
lossval = lossfcn(C,S,W)
where:
The output argument lossval is a floating-point scalar.
You choose the function name (lossfcn).
C is an n-by-2 logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in the ClassNames property. Construct C by setting C(p,q) = 1, if observation p is in class q, for each observation in the specified data. Set the other element in row p to 0.
S is an n-by-2 numeric matrix of predicted classification scores. S is similar to the score output of predict, where rows correspond to observations in the data and the column order corresponds to the class order in the ClassNames property. S(p,q) is the classification score of observation p being classified in class q.
W is an n-by-1 numeric vector of observation weights.
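For example, a minimal sketch of a custom loss under these conventions, implementing a weighted misclassification rate. The function name myloss is arbitrary:

function lossval = myloss(C,S,W)
% Weighted misclassification rate: an observation is misclassified when
% its highest score is not in the column of its true class.
[~,predCol] = max(S,[],2);   % predicted class (column index of the maximal score)
[~,trueCol] = max(C,[],2);   % true class (column of the 1 in each row of C)
lossval = sum(W.*(predCol ~= trueCol))/sum(W);
end

Pass the handle to loss, for example, loss(Mdl,X,Y,'LossFun',@myloss).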
Regression Problems: The following table lists the available loss functions when Mdl is an incrementalRegressionLinear model. Specify one using its corresponding character vector or string scalar.
Name | Description | Learners Supporting Metric |
---|---|---|
"epsiloninsensitive" | Epsilon-insensitive loss | 'svm' |
"mse" (default) | Weighted mean squared error | 'svm' and 'leastsquares' |
For more details, see Regression Loss.
To specify a custom loss function, use function handle notation. The function must have this form:
lossval = lossfcn(Y,YFit,W)
where:
The output argument lossval is a floating-point scalar.
You choose the function name (lossfcn).
Y is a length-n numeric vector of observed responses.
YFit is a length-n numeric vector of corresponding predicted responses.
W is an n-by-1 numeric vector of observation weights.
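Analogously, a minimal sketch of a custom weighted mean absolute error; the name maeloss is arbitrary:

function lossval = maeloss(Y,YFit,W)
% Weighted mean absolute error over the whole batch
lossval = sum(W.*abs(Y - YFit))/sum(W);
end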
Example: 'LossFun',"mse"
Example: 'LossFun',@lossfcn
Data Types: char | string | function_handle
'ObservationsIn' — Predictor data observation dimension
'rows' (default) | 'columns'
Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'.
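For example, a sketch of passing a chunk with observations in columns, assuming the matrix X stores observations in rows (so transposing puts them in columns):

Xcols = X(idx,:)';   % predictors in rows, observations in columns
L = loss(Mdl,Xcols,Y(idx),'ObservationsIn','columns');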
'Weights' — Batch of observation weights
Batch of observation weights, specified as the comma-separated pair consisting of 'Weights' and a floating-point vector of positive values. loss weighs the observations in the input data with the corresponding values in Weights. The size of Weights must equal n, which is the number of observations in the input data.

By default, Weights is ones(n,1).
For more details, see Observation Weights.
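For example, a sketch that downweights older observations within a chunk; the linearly increasing weight vector is an arbitrary choice for illustration:

w = linspace(0.5,1,numel(idx))';   % positive weights, one per observation in the chunk
L = loss(Mdl,X(idx,:),Y(idx),'Weights',w);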
Data Types: double | single
Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.
Consider the following scenario.
L is the weighted average classification loss.
n is the sample size.
For binary classification:
yj is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
f(Xj) is the raw classification score for observation (row) j of the predictor data X.
mj = yjf(Xj) is the classification score for classifying observation j into the class corresponding to yj. Positive values of mj indicate correct classification and do not contribute much to the average loss. Negative values of mj indicate incorrect classification and contribute significantly to the average loss.
The weight for observation j is wj.
Given this scenario, the following table describes the supported loss functions that you can specify by using the 'LossFun' name-value pair argument.
Loss Function | Value of LossFun | Equation |
---|---|---|
Binomial deviance | "binodeviance" | $L = \sum_{j=1}^{n} w_j \log\{1 + \exp[-2m_j]\}$ |
Exponential loss | "exponential" | $L = \sum_{j=1}^{n} w_j \exp(-m_j)$ |
Classification error | "classiferror" | $L = \sum_{j=1}^{n} w_j I\{\hat{y}_j \ne y_j\}$, the weighted fraction of misclassified observations, where $\hat{y}_j$ is the class label corresponding to the class with the maximal posterior probability and $I\{x\}$ is the indicator function. |
Hinge loss | "hinge" | $L = \sum_{j=1}^{n} w_j \max\{0, 1 - m_j\}$ |
Logit loss | "logit" | $L = \sum_{j=1}^{n} w_j \log\{1 + \exp(-m_j)\}$ |
Quadratic loss | "quadratic" | $L = \sum_{j=1}^{n} w_j (1 - m_j)^2$ |
This figure compares the loss functions for one observation over m. Some functions are normalized to pass through the point (0,1).
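As a concrete check of these formulas, the hinge loss can be recomputed by hand from the raw scores returned by predict. This sketch assumes a configured binary model Mdl whose second score column corresponds to the positive class, and uniform weights; because loss normalizes weights as described in Observation Weights, the two values agree only up to that normalization:

[~,scores] = predict(Mdl,X);          % raw classification scores, one column per class
y = 2*(Y == Mdl.ClassNames(2)) - 1;   % code the labels as -1 or 1
m = y.*scores(:,2);                   % margins m_j = y_j*f(X_j)
w = ones(numel(y),1)/numel(y);        % uniform weights summing to 1
Lhinge = sum(w.*max(0,1 - m));        % compare with loss(Mdl,X,Y,'LossFun','hinge')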
Regression loss functions measure the predictive inaccuracy of regression models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.
Consider the following scenario.
L is the weighted average regression loss.
n is the sample size.
yj is the observed response of observation j.
f(Xj) = β0 + xjβ is the predicted value of observation j of the predictor data X, where β0 is the bias and β is the vector of coefficients.
The weight for observation j is wj.
Given this scenario, the following table describes the supported loss functions that you can specify by using the 'LossFun' name-value pair argument.
Loss Function | Value of LossFun | Equation |
---|---|---|
Epsilon-insensitive loss | "epsiloninsensitive" | $L = \sum_{j=1}^{n} w_j \max\{0, \lvert y_j - f(X_j)\rvert - \varepsilon\}$ |
Mean squared error | "mse" | $L = \sum_{j=1}^{n} w_j (y_j - f(X_j))^2$ |
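Similarly, the default "mse" loss can be recomputed from predicted responses. A sketch, assuming a configured incrementalRegressionLinear model Mdl and uniform weights:

yfit = predict(Mdl,Xtrain(idx,:));        % predicted responses for the chunk
w = ones(numel(idx),1)/numel(idx);        % uniform weights summing to 1
Lmse = sum(w.*(ytrain(idx) - yfit).^2);   % compare with loss(Mdl,Xtrain(idx,:),ytrain(idx))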
For classification problems, if the prior class probability distribution is known (Mdl.Prior is not composed of NaN values), loss normalizes observation weights to sum to the prior class probabilities in the respective classes. This action implies that observation weights are the respective prior class probabilities by default.
For regression problems or if the prior class probability distribution is unknown, the software normalizes the specified observation weights to sum to 1 each time you call loss.
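A sketch of the classification case, assuming two classes with known priors; within each class, the specified weights w are rescaled so that they sum to that class's prior probability:

priors = Mdl.Prior;   % known prior class probabilities
wn = w;               % w is the specified weight vector
for k = 1:2
    inClass = (Y == Mdl.ClassNames(k));
    wn(inClass) = w(inClass)/sum(w(inClass))*priors(k);
end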