glmfit

Generalized linear model regression

Syntax

b = glmfit(X,y,distr)
b = glmfit(X,y,distr,param1,val1,param2,val2,...)
[b,dev] = glmfit(...)
[b,dev,stats] = glmfit(...)

Description

b = glmfit(X,y,distr) returns a (p + 1)-by-1 vector b of coefficient estimates for a generalized linear regression of the responses in y on the predictors in X, using the distribution distr. X is an n-by-p matrix of p predictors at each of n observations. distr can be any of the following: 'binomial', 'gamma', 'inverse gaussian', 'normal' (the default), and 'poisson'.

In most cases, y is an n-by-1 vector of observed responses. For the binomial distribution, y can be either a binary vector indicating success or failure at each observation, or a two-column matrix whose first column gives the number of successes for each observation and whose second column gives the number of trials for each observation.
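
As a minimal sketch of the two binomial response formats described above, the following fits a model first to a binary vector and then to a two-column [successes trials] matrix. The data values and the variable names (dose, outcome, successes, trials) are made up for illustration.

```matlab
% Hypothetical data: one predictor, ten observations (illustrative only).
dose = (1:10)';

% Form 1: binary response, one success/failure flag per observation.
outcome = [0 0 1 0 0 1 1 0 1 1]';
bBinary = glmfit(dose, outcome, 'binomial');

% Form 2: grouped response, [number of successes, number of trials].
successes = [1 2 2 4 5 5 7 8 8 9]';
trials    = 10 * ones(10,1);
bGrouped  = glmfit(dose, [successes trials], 'binomial');

% Both calls return a 2-by-1 vector: constant term, then the dose coefficient.
disp(size(bBinary))
disp(size(bGrouped))
```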

This syntax uses the canonical link (see below) to relate the distribution to the predictors.

Note

By default, glmfit adds a first column of 1s to X, corresponding to a constant term in the model. Do not enter a column of 1s directly into X. You can change the default behavior of glmfit using the 'constant' parameter, below.

glmfit treats NaNs in either X or y as missing values, and ignores them.
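
The two points in this note can be checked with a short sketch. The data below are invented for illustration. The comparison at the end assumes that ignoring NaNs amounts to dropping every observation with a missing value in X or y, which is how the statement above is read here.

```matlab
% Hypothetical data: eight observations of two predictors (illustrative only).
X = [1 2; 2 1; 3 4; 4 3; 5 7; 6 5; 7 9; 8 6];
y = [3.1 2.9 7.2 6.8 12.1 10.9 16.2 13.8]';

% glmfit prepends the column of ones itself, so b has p + 1 = 3 elements.
b = glmfit(X, y, 'normal');
disp(size(b))

% Mark one predictor value and one response value as missing.
Xmiss = X;  Xmiss(2,1) = NaN;
ymiss = y;  ymiss(5)   = NaN;
bMiss = glmfit(Xmiss, ymiss, 'normal');

% Fitting only the complete rows by hand should give the same coefficients.
keep   = ~any(isnan(Xmiss), 2) & ~isnan(ymiss);
bClean = glmfit(X(keep,:), y(keep), 'normal');
disp(max(abs(bMiss - bClean)))
```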

b = glmfit(X,y,distr,param1,val1,param2,val2,...) additionally allows you to specify optional parameter name/value pairs to control the model fit. Acceptable parameters are as follows.

Parameter	Value	Description
`'link'`	`'identity'`, default for the distribution `'normal'`	µ = Xb
	`'log'`, default for the distribution `'poisson'`	log(µ) = Xb
	`'logit'`, default for the distribution `'binomial'`	log(µ/(1 – µ)) = Xb
	`'probit'`	`norminv`(µ) = Xb
	`'comploglog'`	log( -log(1 – µ)) = Xb
	`'reciprocal'`, default for the distribution `'gamma'`	1/µ = Xb
	`'loglog'`	log( -log(µ)) = Xb
	`p` (a number), default for the distribution `'inverse gaussian'`(with p = -2)	µ^p = Xb
	cell array of the form `{FL FD FI}`, containing three function handles, created using `@`, that define the link (`FL`), the derivative of the link (`FD`), and the inverse link (`FI`).	Custom-defined link function. You must provide `FL(mu)`, `FD = dFL(mu)/dmu`, and `FI = FL^(-1)`.
	structure array having these fields: `'Link'` (link function), `'Derivative'` (derivative of the link function), and `'Inverse'` (inverse of the link function). The value of each field is a character vector corresponding to a function that is on the path or a function handle (created using `@`).	Custom-defined link function, its derivative, and its inverse.
`'estdisp'`	`'on'`	`glmfit` estimates a dispersion parameter for the binomial or Poisson distribution.
`'estdisp'`	`'off'` (Default for binomial or Poisson distribution)	`glmfit` uses the theoretical value of 1.0 for those distributions.
`'offset'`	Vector	`glmfit` uses `offset` as an additional predictor variable, but with a coefficient value fixed at 1.0.
`'weights'`	Vector of prior weights, such as the inverses of the relative variance of each observation
`'constant'`	`'on'` (default)	`glmfit` includes a constant term in the model and returns a (p + 1)-by-1 vector of coefficient estimates `b`. The coefficient of the constant term is the first element of `b`.
`'constant'`	`'off'`	`glmfit` omits the constant term and returns a p-by-1 vector of coefficient estimates `b`.
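
The following sketch shows how a few of these name-value pairs might be combined for a Poisson model of counts with unequal exposure. The data and the variable names (exposure, counts) are hypothetical. Passing log(exposure) as the offset is a common modeling choice assumed here, not something this page prescribes.

```matlab
% Hypothetical count data with unequal exposure (illustrative only).
x        = (1:8)';
exposure = [10 12 9 15 11 14 10 13]';
counts   = [2 3 3 6 5 8 7 10]';

% Model rates rather than raw counts: log(mu) = log(exposure) + [1 x]*b.
% The offset enters the linear predictor with its coefficient fixed at 1.
bRate = glmfit(x, counts, 'poisson', 'offset', log(exposure));

% Estimate the dispersion parameter instead of fixing it at 1.
[bDisp, ~, statsDisp] = glmfit(x, counts, 'poisson', ...
    'offset', log(exposure), 'estdisp', 'on');

% Omit the constant term; b then has one element per column of x.
bNoConst = glmfit(x, counts, 'poisson', 'constant', 'off');

disp(size(bRate))        % (p + 1)-by-1
disp(size(bNoConst))     % p-by-1
disp(statsDisp.estdisp)  % 1 because 'estdisp' is 'on'
```

Estimating the dispersion changes the reported standard errors but not the coefficient estimates, so bDisp and bRate should agree.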

[b,dev] = glmfit(...) returns dev, the deviance of the fit at the solution vector. The deviance is a generalization of the residual sum of squares. You can perform an analysis of deviance to compare a sequence of nested models, each a subset of the next, and to test whether the model with more terms fits significantly better than the model with fewer terms.
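
An analysis of deviance for two nested binomial models might look like the sketch below. It reuses the sample data from the probit example on this page with the default logit link. Comparing the drop in deviance with a chi-square distribution is the usual large-sample approximation when the dispersion is fixed at 1. It is an assumption of this sketch, not an output of glmfit.

```matlab
% Data from the probit example on this page: y successes out of n trials at x.
x = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

% Smaller model: constant term only (supplied by hand, so 'constant' is 'off').
[~, devSmall] = glmfit(ones(size(x)), [y n], 'binomial', 'constant', 'off');

% Larger model: constant term plus x.
[~, devLarge] = glmfit(x, [y n], 'binomial');

% The drop in deviance is compared with a chi-square distribution whose
% degrees of freedom equal the number of added terms (here, 1).
devDrop = devSmall - devLarge;
pValue  = 1 - chi2cdf(devDrop, 1);
fprintf('Deviance drop = %.2f, p = %.3g\n', devDrop, pValue)
```

A small p-value suggests that adding x improves the fit beyond what chance alone would explain.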

[b,dev,stats] = glmfit(...) returns dev and stats.

stats is a structure with the following fields:

beta — Coefficient estimates b
dfe — Degrees of freedom for error
sfit — Estimated dispersion parameter
s — Theoretical or estimated dispersion parameter
estdisp — 0 when the 'estdisp' name-value pair argument value is 'off' and 1 when the 'estdisp' name-value pair argument value is 'on'.
covb — Estimated covariance matrix for b
se — Vector of standard errors of the coefficient estimates b
coeffcorr — Correlation matrix for b
t — t statistics for b
p — p-values for b
resid — Vector of residuals
residp — Vector of Pearson residuals
residd — Vector of deviance residuals
resida — Vector of Anscombe residuals

If you estimate a dispersion parameter for the binomial or Poisson distribution, then stats.s is set equal to stats.sfit. Also, the elements of stats.se differ by the factor stats.s from their theoretical values.
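
As a rough illustration of how the stats fields relate to each other, the sketch below fits the same binomial model with and without an estimated dispersion parameter, again using the sample data from the probit example on this page. The 1.96 multiplier gives approximate 95% Wald intervals and is an assumption of this sketch. glmfit does not return these intervals itself.

```matlab
% Data from the probit example on this page.
x = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

% Dispersion fixed at its theoretical value (the default for binomial).
[b, ~, stats] = glmfit(x, [y n], 'binomial');

% Approximate 95% Wald intervals: estimate +/- 1.96 standard errors.
ci = [b - 1.96*stats.se, b + 1.96*stats.se];
disp([b ci])

% Refit with an estimated dispersion parameter.
[bEst, ~, statsEst] = glmfit(x, [y n], 'binomial', 'estdisp', 'on');

% The coefficients are unchanged; the standard errors are rescaled.
seRatio = statsEst.se ./ stats.se;
disp([statsEst.s statsEst.sfit])   % equal when the dispersion is estimated
disp(seRatio')                     % compare with statsEst.s
```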

Examples

Fit Generalized Linear Model with Probit Link

Enter the sample data.

x = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

Each y value is the number of successes in the corresponding number of trials in n, and x contains the predictor variable values.

Fit a probit regression model for y on x.

b = glmfit(x,[y n],'binomial','link','probit');

Compute the estimated number of successes. Plot the observed and estimated proportions of successes against the x values.

yfit = glmval(b,x,'probit','size',n);
plot(x, y./n,'o',x,yfit./n,'-','LineWidth',2)

Use Custom-Defined Link Function

Load the sample data.

load fisheriris

The column vector species contains the species name of each iris flower: setosa, versicolor, or virginica. The double matrix meas contains four measurements for each flower: the length and width of the sepals and of the petals, in centimeters.

Define the response and predictor variables, using only the versicolor and virginica flowers (rows 51 to 150).

X = meas(51:end,:);
y = strcmp('versicolor',species(51:end));

Define three function handles, created using @, that define the link, the derivative of the link, and the inverse link for a logit link function. Store them in a cell array.

link = @(mu) log(mu ./ (1-mu));
derlink = @(mu) 1 ./ (mu .* (1-mu));
invlink = @(resp) 1 ./ (1 + exp(-resp));
F = {link, derlink, invlink};

Fit a logistic regression using glmfit with the link function that you defined.

b = glmfit(X,y,'binomial','link',F)

Fit a generalized linear model by using the logit link function and compare the results.

b = glmfit(X,y,'binomial','link','logit')
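
The parameter table also lists a structure form for custom links. The sketch below passes the same logit link as a structure with the fields Link, Derivative, and Inverse, then compares the result with the built-in 'logit' fit. The variable names S, bStruct, and bLogit are chosen here for illustration. The difference is expected to be small rather than exactly zero, because the two fits go through different code paths.

```matlab
load fisheriris
X = meas(51:end,:);
y = strcmp('versicolor',species(51:end));

% Logit link, its derivative, and its inverse, stored in a structure.
S.Link       = @(mu) log(mu ./ (1-mu));
S.Derivative = @(mu) 1 ./ (mu .* (1-mu));
S.Inverse    = @(eta) 1 ./ (1 + exp(-eta));

bStruct = glmfit(X, y, 'binomial', 'link', S);
bLogit  = glmfit(X, y, 'binomial', 'link', 'logit');

% Largest absolute difference between the two coefficient vectors.
disp(max(abs(bStruct - bLogit)))
```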

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
