Predict responses of generalized linear regression model
Create a generalized linear regression model, and predict its response to new data.
Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2).
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
Create a generalized linear regression model of Poisson data.
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');
Create data points for prediction.
[Xtest1,Xtest2] = meshgrid(-1:.5:3,-2:.5:2);
Xnew = [Xtest1(:),Xtest2(:)];
Predict responses at the data points.
ypred = predict(mdl,Xnew);
Plot the predictions.
surf(Xtest1,Xtest2,reshape(ypred,9,9))
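For a Poisson model fitted with the default log link, each prediction is the exponential of the linear predictor. The following sketch reproduces the output of predict from the estimated coefficients. It assumes the default link (fitglm is called without a 'Link' argument); the names b, ymanual, and maxRelDiff are illustrative only.

```matlab
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');

[Xtest1,Xtest2] = meshgrid(-1:.5:3,-2:.5:2);
Xnew = [Xtest1(:),Xtest2(:)];

% Coefficient estimates in the order (Intercept), x1, x2
b = mdl.Coefficients.Estimate;

% Inverse of the log link applied to the linear predictor
ymanual = exp([ones(size(Xnew,1),1),Xnew]*b);

% Compare with predict; the relative difference should be at round-off level
ypred = predict(mdl,Xnew);
maxRelDiff = max(abs(ypred - ymanual)./ypred)
```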
Fit a generalized linear regression model, and then save the model by using saveLearnerForCoder. Define an entry-point function that loads the model by using loadLearnerForCoder and calls the predict function of the fitted model. Then use codegen (MATLAB Coder) to generate C/C++ code. Note that generating C/C++ code requires MATLAB® Coder™.
This example briefly explains the code generation workflow for the prediction of generalized linear regression models at the command line. For more details, see Code Generation for Prediction of Machine Learning Model at Command Line. You can also generate code using the MATLAB Coder app. For details, see Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App.
Train Model
Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2).
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
Create a generalized linear regression model. Specify the Poisson distribution for the response.
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');
Save Model
Save the fitted generalized linear regression model to the file GLMMdl.mat by using saveLearnerForCoder.
saveLearnerForCoder(mdl,'GLMMdl');
Define Entry-Point Function
In your current folder, define an entry-point function named mypredictGLM.m that does the following:
Accept new predictor input and valid name-value pair arguments.
Load the fitted generalized linear regression model in GLMMdl.mat by using loadLearnerForCoder.
Return predictions and confidence interval bounds.
function [yhat,ci] = mypredictGLM(x,varargin) %#codegen
%MYPREDICTGLM Predict responses using GLM model
%   MYPREDICTGLM predicts responses for the n observations in the rows of
%   the predictor matrix x using the GLM model stored in the MAT-file
%   GLMMdl.mat, and then returns the predictions in the n-by-1 vector yhat.
%   MYPREDICTGLM also returns confidence interval bounds for the
%   predictions in the n-by-2 matrix ci.
CompactMdl = loadLearnerForCoder('GLMMdl');
narginchk(1,Inf);
[yhat,ci] = predict(CompactMdl,x,varargin{:});
end
Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would result in errors during code generation.
Generate Code
Generate code for the entry-point function using codegen (MATLAB Coder). Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. To specify the data type and exact input array size, pass a MATLAB® expression that represents the set of values with a certain data type and array size. Use coder.Constant (MATLAB Coder) for the names of name-value pair arguments.
Create points for prediction.
[Xtest1,Xtest2] = meshgrid(-1:.5:3,-2:.5:2);
Xnew = [Xtest1(:),Xtest2(:)];
Generate code and specify returning 90% simultaneous confidence intervals on the predictions.
codegen mypredictGLM -args {Xnew,coder.Constant('Alpha'),0.1,coder.Constant('Simultaneous'),true}
codegen generates the MEX function mypredictGLM_mex with a platform-dependent extension.
If the number of observations is unknown at compile time, you can also specify the input as variable-size by using coder.typeof (MATLAB Coder). For details, see Specify Variable-Size Arguments for Code Generation and Specify Properties of Entry-Point Function Inputs (MATLAB Coder).
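As a minimal sketch of the variable-size case, the call below declares the predictor input as a double matrix with any number of rows and exactly two columns. It assumes that mypredictGLM.m and GLMMdl.mat from the previous steps are in the current folder; the variable name xType is illustrative.

```matlab
% Type description: double matrix, upper bounds [Inf,2]. The third argument
% marks which dimensions are variable-size (rows: yes, columns: no).
xType = coder.typeof(0,[Inf,2],[1,0]);

% Assumes mypredictGLM.m and GLMMdl.mat exist in the current folder.
codegen mypredictGLM -args {xType,coder.Constant('Alpha'),0.1,coder.Constant('Simultaneous'),true}
```

With this specification, the generated MEX function should accept predictor matrices with different numbers of rows, as long as each has two columns.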
Verify Generated Code
Compare predictions and confidence intervals using predict and mypredictGLM_mex. Specify name-value pair arguments in the same order as in the -args argument in the call to codegen.
[yhat1,ci1] = predict(mdl,Xnew,'Alpha',0.1,'Simultaneous',true);
[yhat2,ci2] = mypredictGLM_mex(Xnew,'Alpha',0.1,'Simultaneous',true);
The returned values from mypredictGLM_mex might include round-off differences compared to the values from predict. In this case, compare the values allowing a small tolerance.
find(abs(yhat1-yhat2) > 1e-6)
ans =
  0x1 empty double column vector
find(abs(ci1-ci2) > 1e-6)
ans =
  0x1 empty double column vector
The comparison confirms that the returned values are equal within the tolerance 1e-6.
mdl - Generalized linear regression model
GeneralizedLinearModel object | CompactGeneralizedLinearModel object
Generalized linear regression model, specified as a GeneralizedLinearModel object created using fitglm or stepwiseglm, or a CompactGeneralizedLinearModel object created using compact.
Xnew - New predictor input values
New predictor input values, specified as a table, dataset array, or matrix. Each row of Xnew corresponds to one observation, and each column corresponds to one variable.
If Xnew is a table or dataset array, it must contain predictors that have the same predictor names as in the PredictorNames property of mdl.
If Xnew is a matrix, it must have the same number of variables (columns) in the same order as the predictor input used to create mdl. Note that Xnew must also contain any predictor variables that are not used as predictors in the fitted model. Also, all variables used in creating mdl must be numeric. To treat numerical predictors as categorical, identify the predictors using the 'CategoricalVars' name-value pair argument when you create mdl.
Data Types: single | double | table
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: [ypred,yci] = predict(mdl,Xnew,'Alpha',0.01,'Simultaneous',true) returns the confidence interval yci with a 99% confidence level, computed simultaneously for all predictor values.
'Alpha' - Significance level
Significance level for the confidence interval, specified as the comma-separated pair consisting of 'Alpha' and a numeric value in the range [0,1]. The confidence level of yci is equal to 100(1 - Alpha)%. Alpha is the probability that the confidence interval does not contain the true value.
Example: 'Alpha',0.01
Data Types: single | double
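The effect of 'Alpha' can be checked numerically. The sketch below refits the Poisson model from the first example and compares interval widths at the default significance level (0.05) and at 0.01; the query points in Xnew and the names width95 and width99 are illustrative.

```matlab
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');

Xnew = [1 0; 2 0.5; 3 -1];                  % illustrative query points

[~,ci95] = predict(mdl,Xnew);               % default Alpha = 0.05
[~,ci99] = predict(mdl,Xnew,'Alpha',0.01);  % 99% confidence level

% Column 1 holds the lower bounds and column 2 the upper bounds
width95 = diff(ci95,1,2);
width99 = diff(ci99,1,2);

% A smaller Alpha gives a higher confidence level and wider intervals
isWider = all(width99 > width95)
```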
'BinomialSize' - Number of trials for binomial distribution
Number of trials for the binomial distribution, specified as the comma-separated pair consisting of 'BinomialSize' and a scalar or vector of the same length as the response. predict expands the scalar input into a constant array of the same size as the response. The scalar input means that all observations have the same number of trials.
The meaning of the output values in ypred depends on the value of 'BinomialSize'.
If 'BinomialSize' is 1 (default), then each value in the output ypred is the probability of success.
If 'BinomialSize' is not 1, then each value in the output ypred is the predicted number of successes in the trials.
Data Types: single | double
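The two interpretations of ypred can be compared directly. The following sketch fits a binomial model to simulated Bernoulli data (the data-generating coefficients 0.5 and 1 are arbitrary choices for illustration) and checks that the predicted number of successes in 10 trials is 10 times the predicted probability of success.

```matlab
rng('default') % For reproducibility
x = randn(200,1);
p = 1./(1 + exp(-(0.5 + x)));      % true success probabilities (illustrative)
y = binornd(1,p);                  % one Bernoulli trial per observation
mdl = fitglm(x,y,'Distribution','binomial');

xnew = (-2:1:2)';
prob = predict(mdl,xnew);                       % 'BinomialSize' is 1 (default)
count = predict(mdl,xnew,'BinomialSize',10);    % 10 trials for every row

% Expected number of successes is the number of trials times the probability
maxDiff = max(abs(count - 10*prob))
```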
'Offset' - Offset value
zeros(size(Xnew,1)) (default) | scalar | vector
Offset value for each row in Xnew, specified as the comma-separated pair consisting of 'Offset' and a scalar or vector with the same length as the response. predict expands the scalar input into a constant array of the same size as the response.
Note that the default value of this argument is a vector of zeros even if you specify the 'Offset' name-value pair argument when fitting a model. If you specify 'Offset' for fitting, the software treats the offset as an additional predictor with a coefficient value fixed at 1. In other words, the formula for fitting is
f(μ) = Offset + X*b,
where f is the link function, μ is the mean response, and X*b is the linear combination of predictors X. The Offset predictor has coefficient 1.
Data Types: single | double
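Because the fitting offset is not carried over to prediction, it must be passed again when it is needed. The sketch below assumes a Poisson model with the default log link and a hypothetical exposure variable whose logarithm serves as the offset. Under the log link, f(μ) = Offset + X*b implies μ = exp(Offset).*exp(X*b), so supplying log(exposure) as the offset scales the zero-offset prediction by the exposure.

```matlab
rng('default') % For reproducibility
n = 200;
x = randn(n,1);
exposure = 1 + 4*rand(n,1);                 % hypothetical exposure per row
y = poissrnd(exposure.*exp(0.5 + 0.8*x));   % counts proportional to exposure

% Fit with log(exposure) as the offset (coefficient fixed at 1)
mdl = fitglm(x,y,'Distribution','poisson','Offset',log(exposure));

xnew = [-1; 0; 1];
newExposure = [2; 2; 5];                    % illustrative exposures for new rows

rate = predict(mdl,xnew);                               % offset defaults to 0
counts = predict(mdl,xnew,'Offset',log(newExposure));   % offset supplied

% With the log link, the offset multiplies the zero-offset prediction
maxRelDiff = max(abs(counts - newExposure.*rate)./counts)
```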
'Simultaneous' - Flag to compute simultaneous confidence bounds
false (default) | true
Flag to compute simultaneous confidence bounds, specified as the comma-separated pair consisting of 'Simultaneous' and either true or false.
true: predict computes confidence bounds for the curve of response values corresponding to all predictor values in Xnew, using Scheffe's method. The range between the upper and lower bounds contains the curve consisting of true response values with 100(1 - α)% confidence.
false: predict computes confidence bounds for the response value at each observation in Xnew. The confidence interval for a response value at a specific predictor value contains the true response value with 100(1 - α)% confidence.
Simultaneous bounds are wider than separate bounds, because requiring the entire curve of response values to be within the bounds is stricter than requiring the response value at a single predictor value to be within the bounds.
Example: 'Simultaneous',true
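The width difference between the two kinds of bounds can be seen on the Poisson model from the first example. This is a sketch only; the query points in Xnew and the names ciSep and ciSim are illustrative.

```matlab
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');

Xnew = [1 0; 2 0.5; 3 -1];                  % illustrative query points

% Separate (pointwise) 90% bounds
[~,ciSep] = predict(mdl,Xnew,'Alpha',0.1);

% Simultaneous 90% bounds for all rows of Xnew
[~,ciSim] = predict(mdl,Xnew,'Alpha',0.1,'Simultaneous',true);

% Simultaneous bounds should be wider at every query point
isWider = all(diff(ciSim,1,2) > diff(ciSep,1,2))
```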
ypred - Predicted response values
Predicted response values at Xnew, returned as a numeric vector.
For a binomial model, the meaning of the output values in ypred depends on the value of the 'BinomialSize' name-value pair argument.
If 'BinomialSize' is 1 (default), then each value in the output ypred is the probability of success.
If 'BinomialSize' is not 1, then each value in the output ypred is the predicted number of successes in the trials.
For a model with an offset, specify the offset value by using the 'Offset' name-value pair argument. Otherwise, predict uses 0 as the offset value.
yci - Confidence intervals for responses
Confidence intervals for the responses, returned as a two-column matrix with each row providing one interval. The meaning of the confidence interval depends on the settings of the name-value pair arguments 'Alpha' and 'Simultaneous'.
feval returns the same predictions as predict. The feval function does not support the 'Offset' and 'BinomialSize' name-value pair arguments. feval uses 0 as the offset value, and the output values in ypred are predicted probabilities. The feval function can take multiple input arguments for new predictor input values, with one input for each predictor variable, which is simpler to use with a model created from a table or dataset array. Note that the feval function does not give confidence intervals on its predictions.
random returns predictions with added noise.
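The relationship between the three functions can be sketched on the Poisson model from the first example. The query points and variable names are illustrative. For this model, the noise added by random is a draw from the fitted Poisson distribution, so its output consists of nonnegative integer counts.

```matlab
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');

Xnew = [1 0; 2 0.5; 3 -1];                  % illustrative query points

% predict takes one matrix; feval takes one input per predictor variable
ypred = predict(mdl,Xnew);
yfeval = feval(mdl,Xnew(:,1),Xnew(:,2));
maxDiff = max(abs(ypred - yfeval))

% random draws noisy responses around the predicted means
ysim = random(mdl,Xnew)
```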
Usage notes and limitations:
Use saveLearnerForCoder, loadLearnerForCoder, and codegen (MATLAB Coder) to generate code for the predict function. Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the predict function. Then use codegen to generate code for the entry-point function.
This table contains notes about the arguments of predict. Arguments not included in this table are fully supported.
Argument | Notes and Limitations |
---|---|
mdl | For the usage notes and limitations of the model object, see Code Generation of the |
Xnew | |
Name-value pair arguments | Names in name-value pair arguments must be compile-time constants. For example, to allow a user-defined significance level in the generated code, include |
For more information, see Introduction to Code Generation.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
This function supports model objects fitted with GPU array input arguments.
CompactGeneralizedLinearModel | feval | fitglm | GeneralizedLinearModel | random