In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For linear mixed-effects models, this formula is in the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms.
Suppose a table tbl contains the following:

- A response variable, y
- Predictor variables, Xj, which can be continuous or grouping variables
- Grouping variables, g1, g2, ..., gR,

where the grouping variables in Xj and gr can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.
Then, in a formula of the form 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR.

You can express the fixed and random terms using Wilkinson notation.
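As a minimal sketch of how such a formula is passed to fitlme, the following fits a random-intercept model to synthetic data. The data, the effect sizes, and the variable names y, X1, and g1 are assumptions invented for illustration; only the formula form comes from the text above.

```matlab
% Synthetic data, invented purely for illustration.
rng(0);                              % make the random numbers reproducible
n   = 120;                           % number of observations
grp = randi(6, n, 1);                % group index (1..6) for each observation
b   = randn(6, 1);                   % simulated random intercept per group
X1  = randn(n, 1);                   % continuous predictor
y   = 2 + 0.5*X1 + b(grp) + 0.3*randn(n, 1);

% Store the variables in a table; g1 is a categorical grouping variable.
tbl = table(y, X1, categorical(grp), 'VariableNames', {'y', 'X1', 'g1'});

% Fixed effects: intercept and X1. Random effect: intercept per level of g1.
lme = fitlme(tbl, 'y ~ X1 + (1|g1)');

disp(lme.CoefficientNames)           % names of the fixed-effects coefficients
disp(fixedEffects(lme))              % fixed-effects estimates
```

Here fixed is X1 (plus the implicit intercept), random1 is 1, and the grouping variable is g1.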
Wilkinson notation describes the factors present in a model, not the multipliers (coefficients) of those factors.
Wilkinson Notation | Factors in Standard Notation |
---|---|
1 | Constant (intercept) term |
X^k, where k is a positive integer | X, X^2, ..., X^k |
X1 + X2 | X1, X2 |
X1*X2 | X1, X2, X1.*X2 (elementwise multiplication of X1 and X2) |
X1:X2 | X1.*X2 only |
- X2 | Do not include X2 |
X1*X2 + X3 | X1, X2, X3, X1*X2 |
X1 + X2 + X3 + X1:X2 | X1, X2, X3, X1*X2 |
X1*X2*X3 - X1:X2:X3 | X1, X2, X3, X1*X2, X1*X3, X2*X3 |
X1*(X2 + X3) | X1, X2, X3, X1*X2, X1*X3 |
Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1.
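The expansion rules and the implicit constant term can be checked on a fitted model. This is a sketch on synthetic data: the data and the variable names y, X1, X2, and g1 are assumptions made up for illustration. A random intercept is included in each formula only to keep the models in mixed-effects form.

```matlab
% Synthetic data, invented purely for illustration.
rng(1);
n   = 100;
grp = randi(5, n, 1);                % group index for each observation
b   = randn(5, 1);                   % simulated random intercept per group
X1  = randn(n, 1);
X2  = randn(n, 1);
y   = 1 + X1 - X2 + 0.5*X1.*X2 + b(grp) + 0.3*randn(n, 1);
tbl = table(y, X1, X2, categorical(grp), ...
    'VariableNames', {'y', 'X1', 'X2', 'g1'});

% X1*X2 should expand to X1 + X2 + X1:X2, so both fits carry the same terms.
lmeStar = fitlme(tbl, 'y ~ X1*X2 + (1|g1)');
lmeLong = fitlme(tbl, 'y ~ X1 + X2 + X1:X2 + (1|g1)');
sameTerms = isequal(lmeStar.CoefficientNames, lmeLong.CoefficientNames);
disp(sameTerms)

% Including -1 removes the constant term from the fixed effects.
lmeNoInt = fitlme(tbl, 'y ~ -1 + X1 + X2 + (1|g1)');
disp(lmeNoInt.CoefficientNames)
```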
Here are some examples for linear mixed-effects model specification.

Formula | Description |
---|---|
'y ~ X1 + X2' | Fixed effects for the intercept, X1, and X2. This is equivalent to 'y ~ 1 + X1 + X2'. |
'y ~ -1 + X1 + X2' | No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1. |
'y ~ 1 + (1 \| g1)' | Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1. |
'y ~ X1 + (1 \| g1)' | Random intercept model with a fixed slope. |
'y ~ X1 + (X1 \| g1)' | Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1\|g1)'. |
'y ~ X1 + (1 \| g1) + (-1 + X1 \| g1)' | Independent random effects terms for intercept and slope. |
'y ~ 1 + (1 \| g1) + (1 \| g2) + (1 \| g1:g2)' | Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect. |
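To illustrate the difference between the correlated and independent random-effects specifications in the table, the following sketch fits both to synthetic data and inspects the estimated covariance parameters. The data and variable names are assumptions invented for illustration, and the comments on the shape of the output reflect how covarianceParameters is expected to group its results (one cell per random-effects term).

```matlab
% Synthetic data with a random intercept and a random slope per group.
rng(2);
nGroups = 20;                              % number of levels of g1
nPer    = 10;                              % observations per level
grp = repelem((1:nGroups)', nPer);         % group index for each observation
X1  = randn(nGroups*nPer, 1);
b0  = randn(nGroups, 1);                   % simulated random intercepts
b1  = 0.5*randn(nGroups, 1);               % simulated random slopes
y   = 1 + 2*X1 + b0(grp) + b1(grp).*X1 + 0.3*randn(nGroups*nPer, 1);
tbl = table(y, X1, categorical(grp), 'VariableNames', {'y', 'X1', 'g1'});

% Random intercept and slope in one term: they may be correlated.
lmeCorr  = fitlme(tbl, 'y ~ X1 + (X1|g1)');

% Random intercept and slope in separate terms: modeled as independent.
lmeIndep = fitlme(tbl, 'y ~ X1 + (1|g1) + (-1 + X1|g1)');

% One cell per random-effects term: a single 2-by-2 covariance matrix for
% the correlated model, two scalar variances for the independent model.
psiCorr  = covarianceParameters(lmeCorr);
psiIndep = covarianceParameters(lmeIndep);
disp(size(psiCorr{1}))
disp(numel(psiIndep))
```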
fitlme converts the expressions in the fixed and random parts (not grouping variables) of a formula into design matrices as follows:
Each term in a formula adds one or more columns to the corresponding design matrix.
A term containing a single continuous variable adds one column to the design matrix.
A fixed term containing a categorical variable X with k levels adds (k – 1) dummy variables to the design matrix.
For example, suppose the variable Supplier represents three different suppliers that a manufacturer receives parts from, that is, a categorical variable with three levels. Out of six batches of parts, the first two batches come from supplier 1 (level 1), the second two batches come from supplier 2 (level 2), and the last two batches come from supplier 3 (level 3), so that

    Supplier =
        1
        1
        2
        2
        3
        3

Then, adding Supplier to the formula as a fixed-effects or random-effects term adds the following two dummy variables to the corresponding design matrix, using the 'reference' contrast:

    0  0
    0  0
    1  0
    1  0
    0  1
    0  1

To use a different coding scheme, see the 'DummyVarCoding' name-value pair argument of fitlme.

If X1 and X2 are continuous variables, the product term X1:X2 adds one column obtained by elementwise multiplication of X1 and X2 to the design matrix.
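The Supplier dummy variables and a continuous product column can be reproduced by hand. This is a sketch of the resulting columns using dummyvar, not a description of how fitlme builds them internally. The numeric values of X1 and X2 are made up for illustration.

```matlab
% Supplier: categorical variable with three levels over six batches.
Supplier = categorical([1; 1; 2; 2; 3; 3]);

% dummyvar returns one indicator column per level (full coding).
fullDummies = dummyvar(Supplier);

% Dropping the first column leaves indicators for levels 2 and 3, with
% level 1 as the reference level.
refDummies = fullDummies(:, 2:end);
disp(refDummies)

% Product of two continuous variables: one elementwise-product column.
X1 = [1; 2; 3];                      % hypothetical values
X2 = [4; 5; 6];                      % hypothetical values
X1X2 = X1 .* X2;
disp(X1X2)
```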
If X1 is continuous and X2 is categorical with k levels, the product term X1:X2 multiplies elementwise X1 with the (k – 1) dummy variables representing X2, and adds these (k – 1) columns to the design matrix.
For example, suppose Drug is the amount of a drug given to patients, a continuous treatment, and Time is three distinct points in time when the health measures are taken, a categorical variable with three levels. Out of six observations, the first two are observed at time point 1, the second two are observed at time point 2, and the last two are observed at time point 3, so that

    [Drug Time] =
        0.1000    1.0000
        0.2000    1.0000
        0.5000    2.0000
        0.6000    2.0000
        0.3000    3.0000
        0.8000    3.0000

Then, the product term Drug:Time adds the following two variables to the design matrix:

         0         0
         0         0
    0.5000         0
    0.6000         0
         0    0.3000
         0    0.8000
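The Drug:Time columns can be rebuilt by hand from the six observations listed above. This is a sketch that assumes reference coding with time point 1 as the reference level, and MATLAB R2016b or later for implicit expansion.

```matlab
Drug = [0.1; 0.2; 0.5; 0.6; 0.3; 0.8];    % continuous treatment
Time = categorical([1; 1; 2; 2; 3; 3]);   % three time points

% Reference-coded dummy variables for Time: drop the first level.
timeDummies = dummyvar(Time);
timeDummies = timeDummies(:, 2:end);      % 6-by-2, time points 2 and 3

% Multiply Drug elementwise into each dummy column (implicit expansion).
DrugTime = Drug .* timeDummies;
disp(DrugTime)
```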
If X1 and X2 are categorical variables with k and m levels respectively, the product term X1:X2 adds (k – 1)*(m – 1) dummy variables to the design matrix formed by taking the elementwise product of each dummy variable representing X1 with each dummy variable representing X2.
For example, in an experiment to determine the impact of the type of corn and the popping method on the yield, suppose there are three types of Corn and two types of Method as follows:

    1  oil
    1  oil
    1  air
    1  air
    2  oil
    2  oil
    2  air
    2  air
    3  oil
    3  oil
    3  air
    3  air

Then, the product term Corn:Method adds the following to the design matrix:

    0  0
    0  0
    0  0
    0  0
    1  0
    1  0
    0  0
    0  0
    0  1
    0  1
    0  0
    0  0
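The Corn:Method columns can likewise be rebuilt by hand. This sketch assumes reference coding with the first category of each variable as the reference level. Because categorical sorts text categories alphabetically, 'air' is the reference level of Method here.

```matlab
Corn   = categorical([1; 1; 1; 1; 2; 2; 2; 2; 3; 3; 3; 3]);
Method = categorical({'oil'; 'oil'; 'air'; 'air'; 'oil'; 'oil'; ...
                      'air'; 'air'; 'oil'; 'oil'; 'air'; 'air'});

% Reference-coded dummy variables: drop the first category of each variable.
cornD = dummyvar(Corn);
cornD = cornD(:, 2:end);             % 12-by-2, indicators for corn types 2 and 3
methD = dummyvar(Method);
methD = methD(:, 2:end);             % 12-by-1, indicator for 'oil'

% (3 - 1)*(2 - 1) = 2 product columns (implicit expansion, R2016b or later).
CornMethod = cornD .* methD;
disp(CornMethod)
```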
The term X1*X2 adds the necessary number of columns for X1, X2, and X1:X2 to the design matrix.

The term X1^2 adds the necessary number of columns for X1 and X1:X1 to the design matrix.

The symbol 1 (one) in the formula stands for a column of all 1s. By default, a column of 1s is included in the design matrix. To exclude a column of 1s from the design matrix, you must explicitly specify -1 as a term in the expression.
fitlme handles the grouping variables in the (.|group) part of a formula as follows:
If a grouping variable has k levels, then k dummy variables represent this grouping.
For example, suppose District is a categorical grouping variable with three levels, showing the three types of districts, and out of six schools, the first two are in district 1, the second two are in district 2, and the last two are in district 3, so that

    District =
        1
        1
        2
        2
        3
        3

Then the dummy variables that represent this grouping are:

    1  0  0
    1  0  0
    0  1  0
    0  1  0
    0  0  1
    0  0  1
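For a grouping variable, the full set of k indicator columns can be reproduced with dummyvar. This is a hand-built sketch of the District example, not a view into fitlme itself.

```matlab
District = categorical([1; 1; 2; 2; 3; 3]);   % three districts, six schools

% Full dummy coding: one indicator column per level, no reference level dropped.
districtDummies = dummyvar(District);
disp(districtDummies)
```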
If X1 is a continuous random-effects variable and X2 is a grouping variable with k levels, then the random term (X1 - 1|X2) multiplies elementwise X1 with the k dummy variables representing X2 and adds these k columns to the random-effects design matrix.
For example, suppose Score is a continuous variable showing the scores of students from a math exam in a school, and Class is a categorical variable with three levels, showing the three different classes in a school. Also, suppose that out of six observations, the first two correspond to the scores of students in the first class, the second two correspond to the scores of students in the second class, and the last two correspond to the scores of students in the third class, so that

    [Score Class] =
        78.0000    1.0000
        68.0000    1.0000
        81.0000    2.0000
        53.0000    2.0000
        85.0000    3.0000
        72.0000    3.0000

Then, the random term (Score - 1|Class) adds the following three columns to the random-effects design matrix:

    78.0000          0          0
    68.0000          0          0
          0    81.0000          0
          0    53.0000          0
          0          0    85.0000
          0          0    72.0000
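The three random-effects columns for (Score - 1|Class) can be rebuilt by hand from the six observations above. This sketch uses full dummy coding for the grouping variable and assumes MATLAB R2016b or later for implicit expansion.

```matlab
Score = [78; 68; 81; 53; 85; 72];         % continuous random-effects variable
Class = categorical([1; 1; 2; 2; 3; 3]);  % grouping variable with three levels

% Multiply Score elementwise into each of the k = 3 indicator columns.
Zscore = Score .* dummyvar(Class);
disp(Zscore)
```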
If X1 is a continuous predictor variable and X2 and X3 are grouping variables with k and m levels respectively, the term (X1|X2:X3) represents this grouping of X1 with k*m dummy variables formed by taking the elementwise product of each dummy variable representing X2 with each dummy variable representing X3.
For example, suppose Treatment is a continuous predictor variable, and there are three levels of Block and two levels of Plot nested within the block as follows:

    0.1000    1    a
    0.2000    1    b
    0.5000    2    a
    0.6000    2    b
    0.3000    3    a
    0.8000    3    b

Then, the random term (Treatment - 1|Block:Plot) adds the following to the random-effects design matrix:

    0.1000         0         0         0         0         0
         0    0.2000         0         0         0         0
         0         0    0.5000         0         0         0
         0         0         0    0.6000         0         0
         0         0         0         0    0.3000         0
         0         0         0         0         0    0.8000
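The k*m = 6 columns for (Treatment - 1|Block:Plot) can be rebuilt by hand by following the description above: form the product of each Block dummy variable with each Plot dummy variable, then multiply Treatment into the result. The column ordering used here (Plot varying fastest within Block) is an assumption made for this sketch.

```matlab
Treatment = [0.1; 0.2; 0.5; 0.6; 0.3; 0.8];
Block = categorical([1; 1; 2; 2; 3; 3]);
Plot  = categorical({'a'; 'b'; 'a'; 'b'; 'a'; 'b'});

blockD = dummyvar(Block);            % 6-by-3 indicators for Block
plotD  = dummyvar(Plot);             % 6-by-2 indicators for Plot
k = size(blockD, 2);
m = size(plotD, 2);

% Indicator for each Block:Plot combination.
groupD = zeros(numel(Treatment), k*m);
for i = 1:k
    for j = 1:m
        groupD(:, (i-1)*m + j) = blockD(:, i) .* plotD(:, j);
    end
end

% Multiply Treatment into each combination column (implicit expansion).
Ztreat = Treatment .* groupD;
disp(Ztreat)
```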
See also: fitlme | fitlmematrix | LinearMixedModel