Generalized linear mixed-effects model class
A GeneralizedLinearMixedModel
object represents a regression model
of a response variable that contains both fixed and random effects. The object comprises
data, a model description, fitted coefficients, covariance parameters, design matrices,
residuals, residual plots, and other diagnostic information for a generalized linear
mixed-effects (GLME) model. You can predict model responses with the
predict
function and generate random data at new design points
using the random
function.
You can fit a generalized linear mixed-effects (GLME) model to sample data using fitglme(tbl,formula). For more information, see fitglme.
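The workflow described above (fit with fitglme, then call predict and random on the resulting object) might look like the following minimal sketch. It assumes the mfr sample data set that ships with Statistics and Machine Learning Toolbox is available; the variable names mufit and ysim are illustrative only.

```matlab
% Sketch: fit a GLME model, then predict and simulate responses.
load mfr   % sample manufacturing data (assumed to be on the path)

glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace');

mufit = predict(glme);   % predicted responses at the original design points
ysim  = random(glme);    % one simulated response per observation
```

Both calls here use the original design points; predict and random also accept new input data as a second argument.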
tbl
— Input data
Input data, which includes the response variable, predictor variables,
and grouping variables, specified as a table or dataset array. The
predictor variables can be continuous or grouping variables (see Grouping Variables). You must specify
the model for the variables using formula
.
Data Types: table
formula
— Formula for model specification
'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'
Formula for model specification, specified as a character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. For a full description, see Formula.
Example: 'y ~ treatment + (1|block)'
Coefficients
— Estimates of fixed-effects coefficients
Estimates of fixed-effects coefficients and related statistics, stored as a dataset array that has one row for each coefficient and the following columns:
Name
— Name of the coefficient
Estimate
— Estimated coefficient
value
SE
— Standard error of the
estimate
tStat
— t-statistic
for a test that the coefficient is equal to 0
DF
— Degrees of freedom associated with
the t statistic
pValue
— p-value for
the t-statistic
Lower
— Lower confidence limit
Upper
— Upper confidence limit
To obtain any of these columns as a vector, index into the property using dot notation.
Use the coefTest
method to perform other
tests on the coefficients.
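As a hedged illustration of the dot-notation access and the coefTest call mentioned above, the sketch below assumes the mfr sample data set is available. The contrast vector H is built from CoefficientNames for this example and is not part of the original text.

```matlab
% Sketch: extract columns of Coefficients and run a custom coefficient test.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

est = glme.Coefficients.Estimate;   % vector of fixed-effects estimates
se  = glme.Coefficients.SE;         % vector of standard errors

% Contrast row that selects only the newprocess coefficient
H = double(strcmp(glme.CoefficientNames(:)', 'newprocess'));
pVal = coefTest(glme, H);           % p-value for the test H*beta = 0
```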
CoefficientCovariance
— Covariance of estimated fixed-effects vector
Covariance of estimated fixed-effects vector, stored as a matrix.
Data Types: single
| double
CoefficientNames
— Names of fixed-effects coefficients
Names of fixed-effects coefficients, stored as a cell array of character
vectors. The label for the coefficient of the constant term is
(Intercept)
. The labels for other coefficients
indicate the terms that they multiply. When the term includes a categorical
predictor, the label also indicates the level of that predictor.
Data Types: cell
DFE
— Degrees of freedom for error
Degrees of freedom for error, stored as a positive integer value.
DFE
is the number of observations minus the number of
estimated coefficients.
DFE
contains the degrees of freedom corresponding to
the 'Residual'
method of calculating denominator degrees
of freedom for hypothesis tests on fixed-effects coefficients. If
n is the number of observations and
p is the number of fixed-effects coefficients, then
DFE
is equal to n – p.
Data Types: double
Dispersion
— Model dispersion parameter
Model dispersion parameter, stored as a scalar value. The dispersion parameter defines the conditional variance of the response.
For observation $i$, the conditional variance of the response $y_i$, given the conditional mean $\mu_i$ and the dispersion parameter $\sigma^2$, in a generalized linear mixed-effects model is
$$\operatorname{var}(y_i \mid \mu_i, \sigma^2) = \frac{\sigma^2}{w_i}\, v(\mu_i),$$
where $w_i$ is the ith observation weight and $v$ is the variance function for the specified conditional distribution of the response. The Dispersion property contains an estimate of $\sigma^2$ for the specified GLME model. The value of Dispersion depends on the specified conditional distribution of the response. For binomial and Poisson distributions, the theoretical value of Dispersion is equal to $\sigma^2 = 1.0$.
If FitMethod
is MPL
or
REMPL
and the
'DispersionFlag'
name-value pair argument in
fitglme
is
true
, then a dispersion parameter is
estimated from data for all distributions, including binomial and
Poisson distributions.
If FitMethod
is
ApproximateLaplace
or
Laplace
, then the
'DispersionFlag'
name-value pair argument in
fitglme
does not apply,
and the dispersion parameter is fixed at 1.0 for binomial and
Poisson distributions. For all other distributions,
Dispersion
is estimated from data.
Data Types: double
DispersionEstimated
— Flag indicating if dispersion parameter was estimated
true | false
Flag indicating whether the dispersion parameter was estimated, stored as a logical value.
If FitMethod
is
ApproximateLaplace
or
Laplace
, then the dispersion parameter is
fixed at its theoretical value of 1.0 for binomial and Poisson
distributions, and DispersionEstimated
is
false
. For other distributions, the
dispersion parameter is estimated from the data, and
DispersionEstimated
is
true
.
If FitMethod
is MPL
or
REMPL
, and the
'DispersionFlag'
name-value pair argument in
fitglme
is specified as
true
, then the dispersion parameter is
estimated for all distributions, including binomial and Poisson
distributions, and DispersionEstimated
is
true
.
If FitMethod
is MPL
or
REMPL
, and the
'DispersionFlag'
name-value pair argument in
fitglme
is specified as
false
, then the dispersion parameter is fixed
at its theoretical value for binomial and Poisson distributions, and
DispersionEstimated
is
false
. For distributions other than binomial
and Poisson, the dispersion parameter is estimated from the data,
and DispersionEstimated
is
true
.
Data Types: logical
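The rules above for Dispersion and DispersionEstimated can be checked with a short sketch. It assumes the mfr sample data set is available; the names glmeFixed and glmeFree are hypothetical and used only for this illustration.

```matlab
% Sketch: compare a fit with fixed dispersion to one with estimated dispersion.
load mfr
formula = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)';

% Laplace fit: dispersion is fixed at 1 for a Poisson response
glmeFixed = fitglme(mfr, formula, 'Distribution','Poisson', ...
    'FitMethod','Laplace');

% MPL fit with DispersionFlag set: dispersion is estimated from the data
glmeFree = fitglme(mfr, formula, 'Distribution','Poisson', ...
    'FitMethod','MPL', 'DispersionFlag',true);

fixedFlag = glmeFixed.DispersionEstimated;   % expected to be false
freeFlag  = glmeFree.DispersionEstimated;    % expected to be true
```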
Distribution
— Response distribution name
'Normal' | 'Binomial' | 'Poisson' | 'Gamma' | 'InverseGaussian'
Response distribution name, stored as one of the following:
'Normal'
— Normal distribution
'Binomial'
— Binomial
distribution
'Poisson'
— Poisson distribution
'Gamma'
— Gamma distribution
'InverseGaussian'
— Inverse Gaussian
distribution
FitMethod
— Method used to fit the model
'MPL' | 'REMPL' | 'ApproximateLaplace' | 'Laplace'
Method used to fit the model, stored as one of the following:
'MPL'
— Maximum pseudo likelihood
'REMPL'
— Restricted maximum pseudo
likelihood
'ApproximateLaplace'
— Maximum
likelihood using the approximate Laplace method, with fixed effects
profiled out
'Laplace'
— Maximum likelihood using the
Laplace method
Formula
— Model specification formula
Model specification formula, stored as an object. The model specification formula uses Wilkinson’s notation to describe the relationship between the fixed-effects terms, random-effects terms, and grouping variables in the GLME model. For more information, see Formula.
Link
— Link function characteristics
Link function characteristics, stored as a structure containing the following fields. The link is a function G that links the distribution parameter MU to the linear predictor ETA as follows: G(MU) = ETA.
Field | Description |
---|---|
Name | Name of the link function |
Link | Function that defines G |
Derivative | Derivative of G |
SecondDerivative | Second derivative of G |
Inverse | Inverse of G |
Data Types: struct
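As a rough illustration of how the Link fields relate to each other, the following sketch applies the stored link and its inverse to a few values of MU. It assumes the mfr sample data set is available; the test values in mu are arbitrary.

```matlab
% Sketch: round-trip values through the stored link function and its inverse.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace');

G    = glme.Link.Link;      % function that defines G
Ginv = glme.Link.Inverse;   % inverse of G

mu     = [0.5 1 2 10];      % arbitrary positive means
eta    = G(mu);             % linear predictor values, ETA = G(MU)
muBack = Ginv(eta);         % should recover mu up to rounding error
maxErr = max(abs(muBack - mu));
```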
LogLikelihood
— Log of likelihood function
Log of likelihood function evaluated at the estimated coefficient values,
stored as a scalar value. LogLikelihood
depends on the
method used to fit the model.
If you use 'Laplace'
or
'ApproximateLaplace'
, then
LogLikelihood
is the maximized log
likelihood.
If you use 'MPL'
, then
LogLikelihood
is the maximized log likelihood
of the pseudo data from the final pseudo likelihood
iteration.
If you use 'REMPL'
, then
LogLikelihood
is the maximized restricted log
likelihood of the pseudo data from the final pseudo likelihood
iteration.
Data Types: double
ModelCriterion
— Model criterion
Model criterion to compare fitted generalized linear mixed-effects models, stored as a table with the following fields.
Field | Description |
---|---|
AIC | Akaike information criterion |
BIC | Bayesian information criterion |
LogLikelihood | Log likelihood value. Its meaning depends on the fit method, as described for the LogLikelihood property. |
Deviance | –2 times LogLikelihood |
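The relationship between Deviance and LogLikelihood stated in the table can be verified numerically. This is a minimal sketch that assumes the mfr sample data set is available; devGap is an illustrative name.

```matlab
% Sketch: read the model criteria and check Deviance = -2*LogLikelihood.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace');

crit = glme.ModelCriterion;   % AIC, BIC, LogLikelihood, Deviance
aic  = crit.AIC;
bic  = crit.BIC;

% Absolute difference between the stored deviance and -2 times the log likelihood
devGap = abs(crit.Deviance - (-2*glme.LogLikelihood));
```

When comparing two candidate models fitted to the same data, the model with the smaller AIC or BIC is generally preferred.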
NumCoefficients
— Number of fixed-effects coefficients
Number of fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumEstimatedCoefficients
— Number of estimated fixed-effects coefficients
Number of estimated fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumObservations
— Number of observations
Number of observations used in the fit, stored as a positive integer
value. NumObservations
is the number of rows in the table
or dataset array tbl
, minus rows excluded using the
'Exclude'
name-value pair of fitglme
or rows containing
NaN
values.
Data Types: double
NumPredictors
— Number of predictors
Number of variables used as predictors in the generalized linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumVariables
— Total number of variables
Total number of variables, including the response and predictors, stored
as a positive integer value. If the sample data is in a table or dataset
array tbl
, then NumVariables
is the
total number of variables in tbl
, including the response
variable. NumVariables
includes variables, if any, that
are not used as predictors or as the response.
Data Types: double
ObservationInfo
— Information about the observations
Information about the observations used in the fit, stored as a table.
ObservationInfo
has one row for each observation and
the following columns.
Name | Description |
---|---|
Weights | The weight value for the observation. The default value is 1. |
Excluded | If the observation was excluded from the fit using the 'Exclude' name-value pair argument in fitglme, then Excluded is true, or 1. Otherwise, Excluded is false, or 0. |
Missing | If the observation was excluded from the fit because any response or predictor value is missing, then Missing is true. Otherwise, Missing is false. Missing values include NaN for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the <undefined> value for categorical arrays. |
Subset | If the observation was used in the fit, then Subset is true. If the observation was not used in the fit because it is missing or excluded, then Subset is false. |
BinomSize | Binomial size for each observation. This column only applies when fitting a binomial distribution. |
Data Types: table
ObservationNames
— Names of observations
Names of observations used in the fit, stored as a cell array of character vectors.
If the data is in a table or dataset array tbl
that contains observation names, then
ObservationNames
uses those names.
If the data is provided in matrices, or in a table or dataset
array without observation names, then
ObservationNames
is an empty cell
array.
Data Types: cell
PredictorNames
— Names of predictors
Names of the variables used as predictors in the fit, stored as a cell
array of character vectors that has the same length as
NumPredictors
.
Data Types: cell
ResponseName
— Name of response variable
Name of the variable used as the response variable in the fit, stored as a character vector.
Data Types: char
Rsquared
— Proportion of variability in the response explained by the fitted model
Proportion of variability in the response explained by the fitted model, stored as a structure. Rsquared contains the R-squared value of the fitted model, also known as the coefficient of determination. Rsquared contains the following fields.
Field | Description |
---|---|
Ordinary | R-squared value, stored as a scalar value in a structure. Rsquared.Ordinary = 1 - SSE./SST |
Adjusted | R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure. Rsquared.Adjusted = 1 - (SSE./SST)*(DFT./DFE), where DFE = n - p, DFT = n - 1, n is the total number of observations, and p is the number of fixed-effects coefficients. |
Data Types: struct
SSE
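— Error sum of squares
Error sum of squares, stored as a positive scalar value.
SSE is the weighted sum of the squared conditional residuals, and is calculated as
$$SSE = \sum_{i=1}^{n} w_i^{\mathrm{eff}} \left(y_i - f_i\right)^2,$$
where $n$ is the number of observations, $w_i^{\mathrm{eff}}$ is the ith effective weight, $y_i$ is the ith response, and $f_i$ is the ith fitted value.
The ith effective weight is calculated as
$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\left(\mu_i(\hat{\beta}, \hat{b})\right)},$$
where $v_i$ is the variance term for the ith observation, and $\hat{\beta}$ and $\hat{b}$ are estimated values of $\beta$ and $b$, respectively.
The ith fitted value is calculated as
$$f_i = g^{-1}\left(x_i^T \hat{\beta} + z_i^T \hat{b} + \delta_i\right),$$
where $g^{-1}$ is the inverse of the link function, $x_i^T$ is the ith row of the fixed-effects design matrix $X$, and $z_i^T$ is the ith row of the random-effects design matrix $Z$. $\delta_i$ is the ith offset value.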
Data Types: double
SSR
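— Regression sum of squares
Regression sum of squares, stored as a positive scalar value.
SSR is the sum of squares explained by the generalized linear mixed-effects regression, or equivalently the weighted sum of the squared deviations of the conditional fitted values from their weighted mean. SSR is calculated as
$$SSR = \sum_{i=1}^{n} w_i^{\mathrm{eff}} \left(f_i - \bar{f}\right)^2,$$
where $n$ is the number of observations, $w_i^{\mathrm{eff}}$ is the ith effective weight, $f_i$ is the ith fitted value, and $\bar{f}$ is a weighted average of the fitted values.
The ith effective weight is calculated as
$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\left(\mu_i(\hat{\beta}, \hat{b})\right)},$$
where $\hat{\beta}$ and $\hat{b}$ are estimated values of $\beta$ and $b$, respectively.
The ith fitted value is calculated as
$$f_i = g^{-1}\left(x_i^T \hat{\beta} + z_i^T \hat{b} + \delta_i\right),$$
where $g^{-1}$ is the inverse of the link function, $x_i^T$ is the ith row of the fixed-effects design matrix $X$, and $z_i^T$ is the ith row of the random-effects design matrix $Z$. $\delta_i$ is the ith offset value.
The weighted average of fitted values is calculated as
$$\bar{f} = \frac{\sum_{i=1}^{n} w_i^{\mathrm{eff}} f_i}{\sum_{i=1}^{n} w_i^{\mathrm{eff}}}.$$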
Data Types: double
SST
— Total sum of squares
Total sum of squares, stored as a positive scalar value. For a GLME model, SST is defined as SST = SSE + SSR.
Data Types: double
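The stated identities SST = SSE + SSR and Rsquared.Ordinary = 1 - SSE./SST can be checked directly on a fitted model. The sketch below assumes the mfr sample data set is available; sstGap, r2, and r2Gap are illustrative names.

```matlab
% Sketch: confirm the sum-of-squares decomposition and the ordinary R-squared.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace');

% SST is defined as the sum of SSE and SSR
sstGap = abs(glme.SST - (glme.SSE + glme.SSR));

% Ordinary R-squared recomputed from the stored sums of squares
r2    = 1 - glme.SSE/glme.SST;
r2Gap = abs(r2 - glme.Rsquared.Ordinary);
```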
VariableInfo
— Information about the variables
Information about the variables used in the fit, stored as a table.
VariableInfo
has one row for each variable and
contains the following columns.
Column Name | Description |
---|---|
Class | Class of the variable ('double', 'cell', 'nominal', and so on). |
Range | Value range of the variable. |
InModel | If the variable is a predictor in the fitted model, InModel is true. If the variable is not in the fitted model, InModel is false. |
IsCategorical | If the variable type is treated as a categorical predictor (such as cell, logical, or categorical), then IsCategorical is true. If the variable is a continuous predictor, then IsCategorical is false. |
Data Types: table
VariableNames
— Names of the variables
Names of all the variables contained in the table or dataset array
tbl
, stored as a cell array of character
vectors.
Data Types: cell
Variables
— Variables
Variables, stored as a table. If the fit is based on a table or dataset
array tbl
, then Variables
is identical
to tbl
.
Data Types: table
anova | Analysis of variance for generalized linear mixed-effects model |
coefCI | Confidence intervals for coefficients of generalized linear mixed-effects model |
coefTest | Hypothesis test on fixed and random effects of generalized linear mixed-effects model |
compare | Compare generalized linear mixed-effects models |
covarianceParameters | Extract covariance parameters of generalized linear mixed-effects model |
designMatrix | Fixed- and random-effects design matrices |
fitted | Fitted responses from generalized linear mixed-effects model |
fixedEffects | Estimates of fixed effects and related statistics |
partialDependence | Compute partial dependence |
plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
plotResiduals | Plot residuals of generalized linear mixed-effects model |
predict | Predict response of generalized linear mixed-effects model |
random | Generate random responses from fitted generalized linear mixed-effects model |
randomEffects | Estimates of random effects and related statistics |
refit | Refit generalized linear mixed-effects model |
residuals | Residuals of fitted generalized linear mixed-effects model |
response | Response vector of generalized linear mixed-effects model |
Load the sample data.
load mfr
This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:
Flag to indicate whether the batch used the new process (newprocess
)
Processing time for each batch, in hours (time
)
Temperature of the batch, in degrees Celsius (temp
)
Categorical variable indicating the supplier (A
, B
, or C
) of the chemical used in the batch (supplier
)
Number of defects in the batch (defects
)
The data also includes time_dev
and temp_dev
, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.
Fit a generalized linear mixed-effects model using newprocess
, time_dev
, temp_dev
, and supplier
as fixed-effects predictors. Include a random-effects term for intercept grouped by factory
, to account for quality differences that might exist due to factory-specific variations. The response variable defects
has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects'
, so the dummy variable coefficients sum to 0.
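The number of defects can be modeled using a Poisson distribution:
$$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij})$$
This corresponds to the generalized linear mixed-effects model
$$\log(\mu_{ij}) = \beta_0 + \beta_1\,\text{newprocess}_{ij} + \beta_2\,\text{time\_dev}_{ij} + \beta_3\,\text{temp\_dev}_{ij} + \beta_4\,\text{supplier\_C}_{ij} + \beta_5\,\text{supplier\_B}_{ij} + b_i,$$
where
$\text{defects}_{ij}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$.
$\mu_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, \ldots, 20$) during batch $j$ (where $j = 1, 2, \ldots, 5$).
$\text{newprocess}_{ij}$, $\text{time\_dev}_{ij}$, and $\text{temp\_dev}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$. For example, $\text{newprocess}_{ij}$ indicates whether the batch produced by factory $i$ during batch $j$ used the new process.
$\text{supplier\_C}_{ij}$ and $\text{supplier\_B}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$.
$b_i \sim N(0, \sigma_b^2)$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.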
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');
Display the model.
disp(glme)
    Generalized linear mixed-effects model fit by ML

    Model information:
        Number of observations             100
        Fixed effects coefficients           6
        Random effects coefficients         20
        Covariance parameters                1
        Distribution                    Poisson
        Link                            Log
        FitMethod                       Laplace

    Formula:
        defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

    Model fit statistics:
        AIC       BIC       LogLikelihood    Deviance
        416.35    434.58    -201.17          402.35

    Fixed effects coefficients (95% CIs):
        Name                 Estimate     SE          tStat       DF    pValue
        {'(Intercept)'}      1.4689       0.15988     9.1875      94    9.8194e-15
        {'newprocess' }      -0.36766     0.17755     -2.0708     94    0.041122
        {'time_dev'   }      -0.094521    0.82849     -0.11409    94    0.90941
        {'temp_dev'   }      -0.28317     0.9617      -0.29444    94    0.76907
        {'supplier_C' }      -0.071868    0.078024    -0.9211     94    0.35936
        {'supplier_B' }      0.071072     0.07739     0.91836     94    0.36078

        Lower        Upper
        1.1515       1.7864
        -0.72019     -0.015134
        -1.7395      1.5505
        -2.1926      1.6263
        -0.22679     0.083051
        -0.082588    0.22473

    Random effects covariance parameters:
    Group: factory (20 Levels)
        Name1                Name2                Type         Estimate
        {'(Intercept)'}      {'(Intercept)'}      {'std'}      0.31381

    Group: Error
        Name                        Estimate
        {'sqrt(Dispersion)'}        1
The Model information
table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson
distribution, the link function is Log
, and the fit method is Laplace
.
Formula
indicates the model specification using Wilkinson’s notation.
The Model fit statistics
table displays statistics used to assess the goodness of fit of the model. This includes the Akaike information criterion (AIC
), Bayesian information criterion (BIC
) values, log likelihood (LogLikelihood
), and deviance (Deviance
) values.
The Fixed effects coefficients
table indicates that fitglme
returned 95% confidence intervals. It contains one row for each fixed-effects predictor, and each column contains statistics corresponding to that predictor. Column 1 (Name
) contains the name of each fixed-effects coefficient, column 2 (Estimate
) contains its estimated value, and column 3 (SE
) contains the standard error of the coefficient. Column 4 (tStat
) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF
) and column 6 (pValue
) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (Lower
and Upper
) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.
Random effects covariance parameters
displays a table for each grouping variable (here, only factory
), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std
indicates that fitglme
returns the standard deviation of the random effect associated with the factory grouping variable, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.
The standard display generated by fitglme
does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters
.
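A minimal sketch of that covarianceParameters call follows. It assumes the mfr sample data set is available; factoryStats is an illustrative name for the first element of the returned statistics.

```matlab
% Sketch: obtain confidence limits for the random-effects covariance parameters.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

% psi: estimated covariance parameters, one cell per grouping variable
% dispersion: dispersion parameter
% stats: cell array of tables with estimates and confidence limits
[psi, dispersion, stats] = covarianceParameters(glme);

factoryStats = stats{1};        % statistics for the factory grouping variable
lowerLimit = factoryStats.Lower;
upperLimit = factoryStats.Upper;
```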
In general, a formula for model specification is a character
vector or string scalar of the form 'y ~ terms'
. For generalized
linear mixed-effects models, this formula is in the form 'y ~ fixed +
(random1|grouping1) + ... + (randomR|groupingR)'
, where
fixed
and random
contain the fixed-effects
and the random-effects terms, respectively, and R is the number
of grouping variables in the model.
Suppose a table tbl contains the following:
A response variable, y
Predictor variables, Xj, which can be continuous or grouping variables
Grouping variables, g1, g2, ..., gR,
where the grouping variables in Xj and gr can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.
Then, in a formula of the form 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation.
Wilkinson notation describes the factors present in models. The notation relates to factors present in models, not to the multipliers (coefficients) of those factors.
Wilkinson Notation | Factors in Standard Notation |
---|---|
1 | Constant (intercept) term |
X^k, where k is a positive integer | X, X^2, ..., X^k |
X1 + X2 | X1, X2 |
X1*X2 | X1, X2, X1.*X2 (elementwise multiplication of X1 and X2) |
X1:X2 | X1.*X2 only |
- X2 | Do not include X2 |
X1*X2 + X3 | X1, X2, X3, X1*X2 |
X1 + X2 + X3 + X1:X2 | X1, X2, X3, X1*X2 |
X1*X2*X3 - X1:X2:X3 | X1, X2, X3, X1*X2, X1*X3, X2*X3 |
X1*(X2 + X3) | X1, X2, X3, X1*X2, X1*X3 |
Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples of generalized linear mixed-effects model specification.
Examples:
Formula | Description |
---|---|
'y ~ X1 + X2' | Fixed effects for the intercept, X1 and X2. This is equivalent to 'y ~ 1 + X1 + X2'. |
'y ~ -1 + X1 + X2' | No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1. |
'y ~ 1 + (1 | g1)' | Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1. |
'y ~ X1 + (1 | g1)' | Random intercept model with a fixed slope. |
'y ~ X1 + (X1 | g1)' | Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1|g1)'. |
'y ~ X1 + (1 | g1) + (-1 + X1 | g1)' | Independent random effects terms for intercept and slope. |
'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)' | Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect. |
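The intercept rules in the first two rows of the table can be illustrated with a short sketch. It assumes the mfr sample data set is available and uses its variables in place of the generic y, X1, and g1; the names glmeA, glmeB, and glmeC are hypothetical.

```matlab
% Sketch: implicit intercept, explicit intercept, and suppressed intercept.
load mfr

% Implicit intercept
glmeA = fitglme(mfr, 'defects ~ newprocess + (1|factory)', ...
    'Distribution','Poisson');

% Explicit intercept; specifies the same fixed-effects terms as glmeA
glmeB = fitglme(mfr, 'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson');

% Intercept removed with -1
glmeC = fitglme(mfr, 'defects ~ -1 + newprocess + (1|factory)', ...
    'Distribution','Poisson');

sameTerms    = isequal(glmeA.CoefficientNames, glmeB.CoefficientNames);
hasIntercept = any(strcmp(glmeC.CoefficientNames, '(Intercept)'));
```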