glmfit

Generalized linear model regression

Syntax

b = glmfit(X,y,distr)
b = glmfit(X,y,distr,param1,val1,param2,val2,...)
[b,dev] = glmfit(...)
[b,dev,stats] = glmfit(...)

Description

b = glmfit(X,y,distr) returns a (p + 1)-by-1 vector b of coefficient estimates for a generalized linear regression of the responses in y on the predictors in X, using the distribution distr. X is an n-by-p matrix of p predictors at each of n observations. distr can be any of the following: 'binomial', 'gamma', 'inverse gaussian', 'normal' (the default), and 'poisson'.

In most cases, y is an n-by-1 vector of observed responses. For the binomial distribution, y can be either a binary vector indicating success or failure at each observation, or a two-column matrix whose first column contains the number of successes for each observation and whose second column contains the number of trials for each observation.

This syntax uses the canonical link (see below) to relate the distribution to the predictors.
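The two binomial response forms described above can be compared with a minimal sketch. The data below are hypothetical toy values, not taken from this page. Because each row represents a single trial, the binary vector and the two-column [successes trials] matrix describe the same data, so the two fits are expected to agree up to numerical tolerance.

```matlab
% Hypothetical toy data: one predictor and a binary outcome per observation.
x    = [1 2 3 4 5 6 7 8]';
ybin = [0 0 1 0 1 1 0 1]';          % 1 = success, 0 = failure

% Form 1: binary response vector (canonical logit link by default).
b1 = glmfit(x, ybin, 'binomial');

% Form 2: two-column matrix [successes trials], with one trial per row.
b2 = glmfit(x, [ybin ones(size(ybin))], 'binomial');

% Largest absolute difference between the two coefficient vectors.
disp(max(abs(b1 - b2)))
```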

Note

By default, glmfit adds a first column of 1s to X, corresponding to a constant term in the model. Do not enter a column of 1s directly into X. You can change the default behavior of glmfit using the 'constant' parameter, below.

glmfit treats NaNs in either X or y as missing values, and ignores them.
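As a rough illustration of the two notes above, the following sketch uses hypothetical toy data. It first compares the default fit with a fit that supplies its own column of 1s and sets 'constant' to 'off', and then checks that a NaN response is treated like a deleted observation. The variable names are illustrative only.

```matlab
% Hypothetical toy data for a normal (identity-link) fit.
x = (1:8)';
y = [1.1 1.9 3.2 3.8 5.1 6.2 6.8 8.1]';

% Default: glmfit adds the constant term itself, so b has 2 elements.
bDefault = glmfit(x, y, 'normal');

% Same model with an explicit column of 1s and the automatic constant off.
bManual = glmfit([ones(size(x)) x], y, 'normal', 'constant', 'off');

% A NaN in y marks that observation as missing. The fit should match
% the one obtained after deleting the row by hand.
yMissing    = y;
yMissing(3) = NaN;
keep        = [1:2 4:8];
bNaN        = glmfit(x, yMissing, 'normal');
bDropped    = glmfit(x(keep), y(keep), 'normal');

disp([bDefault bManual bNaN bDropped])
```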

b = glmfit(X,y,distr,param1,val1,param2,val2,...) additionally allows you to specify optional parameter name/value pairs to control the model fit. Acceptable parameters are as follows.

'link'

  • 'identity' (default for the distribution 'normal'): µ = Xb

  • 'log' (default for the distribution 'poisson'): log(µ) = Xb

  • 'logit' (default for the distribution 'binomial'): log(µ/(1 - µ)) = Xb

  • 'probit': norminv(µ) = Xb

  • 'comploglog': log(-log(1 - µ)) = Xb

  • 'reciprocal' (default for the distribution 'gamma'): 1/µ = Xb

  • 'loglog': log(-log(µ)) = Xb

  • p (a number; default for the distribution 'inverse gaussian', with p = -2): µ^p = Xb

  • Cell array of the form {FL FD FI}, containing three function handles, created using @, that define the link (FL), the derivative of the link (FD), and the inverse link (FI): custom-defined link function. You must provide FL(mu), FD = dFL(mu)/dmu, and FI = FL^(-1).

  • Structure array with the fields 'Link' (link function), 'Derivative' (derivative of the link function), and 'Inverse' (inverse of the link function): custom-defined link function, its derivative, and its inverse. The value of each field is a character vector corresponding to a function that is on the path, or a function handle (created using @).

'estdisp'

  • 'on': glmfit estimates a dispersion parameter for the binomial or Poisson distribution.

  • 'off' (default for the binomial or Poisson distribution): glmfit uses the theoretical value of 1.0 for those distributions.

'offset'

  • Vector: glmfit uses offset as an additional predictor variable, but with a coefficient value fixed at 1.0.

'weights'

  • Vector of prior weights, such as the inverses of the relative variance of each observation.

'constant'

  • 'on' (default): glmfit includes a constant term in the model and returns a (p + 1)-by-1 vector of coefficient estimates b. The coefficient of the constant term is the first element of b.

  • 'off': glmfit omits the constant term and returns a p-by-1 vector of coefficient estimates b.
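The 'offset' parameter is often used for rate models, where counts are observed over different exposures. The following is a minimal sketch under that assumption. The data and the variable name exposure are hypothetical. With the log link the model is log(µ) = log(exposure) + b(1) + b(2)*x, so log(exposure) enters with its coefficient fixed at 1.

```matlab
% Hypothetical data: event counts observed over different exposure times.
x        = [0.5 1.0 1.5 2.0 2.5 3.0]';
exposure = [10 12 8 15 9 11]';
counts   = [2 4 3 9 7 12]';

% Poisson fit with log(exposure) as an offset.
b = glmfit(x, counts, 'poisson', 'link', 'log', ...
           'offset', log(exposure));

% glmval takes the same 'offset' parameter when computing fitted means.
mu = glmval(b, x, 'log', 'offset', log(exposure));

% Fitted event rate per unit of exposure.
rate = mu ./ exposure;
disp([counts mu rate])
```

Because the offset is not estimated, b still has two elements: the constant term and the coefficient of x.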

[b,dev] = glmfit(...) returns dev, the deviance of the fit at the solution vector. The deviance is a generalization of the residual sum of squares. You can perform an analysis of deviance to compare a sequence of nested models, each a subset of the next, and to test whether a model with more terms is significantly better than a model with fewer terms.
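One common way to carry out such a comparison is sketched below, assuming the usual large-sample approximation in which the difference in deviance between two nested models with a fixed dispersion is compared with a chi-square distribution whose degrees of freedom equal the number of added terms. The choice of predictors is illustrative only, and chi2cdf is used for the tail probability.

```matlab
load fisheriris
X = meas(51:end,:);                          % versicolor and virginica rows
y = strcmp('versicolor', species(51:end));

% Smaller model: first two predictors only.
[~, devSmall] = glmfit(X(:,1:2), y, 'binomial');

% Larger model: all four predictors (two additional terms).
[~, devLarge] = glmfit(X, y, 'binomial');

% Drop in deviance and its approximate chi-square p-value (2 added terms).
D      = devSmall - devLarge;
pValue = 1 - chi2cdf(D, 2);
fprintf('Deviance difference = %.4f, p = %.4g\n', D, pValue)
```

A small p-value suggests that the additional terms improve the fit by more than chance alone would explain.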

[b,dev,stats] = glmfit(...) returns dev and stats.

stats is a structure with the following fields:

  • beta — Coefficient estimates b

  • dfe — Degrees of freedom for error

  • sfit — Estimated dispersion parameter

  • s — Theoretical or estimated dispersion parameter

  • estdisp — 0 when the 'estdisp' name-value pair argument value is 'off' and 1 when the 'estdisp' name-value pair argument value is 'on'.

  • covb — Estimated covariance matrix for b

  • se — Vector of standard errors of the coefficient estimates b

  • coeffcorr — Correlation matrix for b

  • t — t statistics for b

  • p — p-values for b

  • resid — Vector of residuals

  • residp — Vector of Pearson residuals

  • residd — Vector of deviance residuals

  • resida — Vector of Anscombe residuals

If you estimate a dispersion parameter for the binomial or Poisson distribution, then stats.s is set equal to stats.sfit. Also, the elements of stats.se differ by the factor stats.s from their theoretical values.
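The sketch below shows one way to read the stats fields, reusing the binomial data from the Examples section. The approximate 95% intervals use a normal quantile, which is an assumption of this sketch and not something glmfit returns. The second fit sets 'estdisp' to 'on' to illustrate the relationship between stats.s, stats.sfit, and stats.se described above.

```matlab
x = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

% Probit fit with the theoretical dispersion (estdisp is 'off' by default).
[b, dev, stats] = glmfit(x, [y n], 'binomial', 'link', 'probit');

% Approximate 95% Wald intervals (normal quantile is an assumption here).
z  = norminv(0.975);
ci = [b - z*stats.se, b + z*stats.se];
disp([b stats.se stats.t stats.p ci])

% Same model with an estimated dispersion parameter.
[~, ~, statsEst] = glmfit(x, [y n], 'binomial', 'link', 'probit', ...
                          'estdisp', 'on');

% stats.s equals stats.sfit, and the standard errors scale by stats.s.
disp([statsEst.s statsEst.sfit])
disp(statsEst.se ./ stats.se)
```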

Examples

Enter the sample data.

x = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

Each y value is the number of successes in the corresponding number of trials in n, and x contains the predictor variable values.

Fit a probit regression model for y on x.

b = glmfit(x,[y n],'binomial','link','probit');

Compute the estimated number of successes. Plot the observed and estimated proportions of successes against the x values.

yfit = glmval(b,x,'probit','size',n);
plot(x, y./n,'o',x,yfit./n,'-','LineWidth',2)

Load the sample data.

load fisheriris

The column vector species contains the species name of each iris flower: setosa, versicolor, or virginica. The double matrix meas contains four measurements for each flower: sepal length, sepal width, petal length, and petal width, all in centimeters.

Define the response and predictor variables.

X = meas(51:end,:);
y = strcmp('versicolor',species(51:end));

Define three function handles, created using @, that define the link, the derivative of the link, and the inverse link for a logit link function. Store them in a cell array.

link = @(mu) log(mu ./ (1-mu));
derlink = @(mu) 1 ./ (mu .* (1-mu));
invlink = @(resp) 1 ./ (1 + exp(-resp));
F = {link, derlink, invlink};

Fit a logistic regression using glmfit with the link function that you defined.

b = glmfit(X,y,'binomial','link',F)
b = 5×1

   42.6378
    2.4652
    6.6809
   -9.4294
  -18.2861

Fit a generalized linear model by using the logit link function and compare the results.

b = glmfit(X,y,'binomial','link','logit')
b = 5×1

   42.6378
    2.4652
    6.6809
   -9.4294
  -18.2861
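The same custom logit link can also be supplied in the structure form listed for the 'link' parameter, with the fields 'Link', 'Derivative', and 'Inverse'. The following is a minimal sketch of that variant. The variable names S, bStruct, and bLogit are illustrative, and the estimates are expected to match the built-in 'logit' fit up to numerical tolerance.

```matlab
load fisheriris
X = meas(51:end,:);
y = strcmp('versicolor', species(51:end));

% Custom logit link stored in a structure with the documented field names.
S.Link       = @(mu) log(mu ./ (1-mu));
S.Derivative = @(mu) 1 ./ (mu .* (1-mu));
S.Inverse    = @(resp) 1 ./ (1 + exp(-resp));

bStruct = glmfit(X, y, 'binomial', 'link', S);
bLogit  = glmfit(X, y, 'binomial', 'link', 'logit');

% Largest absolute difference between the two coefficient vectors.
disp(max(abs(bStruct - bLogit)))
```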

Introduced before R2006a