To fit a linear-mixed effects model, you must store your data in a table or dataset array. In
your table or dataset array, you must have a column for each variable including the
response variable. More specifically, the table or dataset array, say
tbl
, must contain the following:
A response variable y
Predictive variables
Xj
which
can be continuous or grouping variables
Grouping variables g1
,
g2
, ...,
gR
,
where the grouping variables in
Xj
and
gr
can be
categorical, logical, a character array, a string array, or a cell array of
character vectors, r = 1, 2, ..., R.
You must organize your data so that each row represents an observation. And each row should contain the value of variables and the levels of grouping variables corresponding to that observation. For example, if you have data from an experiment with four treatment options, on five different types of individuals chosen randomly from a population of individuals (blocks), the table or dataset array must look like this.
Block | Treatment | Response |
---|---|---|
1 | 1 | y11 |
1 | 2 | y12 |
1 | 3 | y13 |
1 | 4 | y14 |
... | ... | ... |
5 | 1 | y51 |
5 | 2 | y52 |
5 | 3 | y53 |
5 | 4 | y54 |
Now, consider a split-plot experiment, where the effect of four different types of fertilizers on the yield of tomato plants is studied. The soil where the tomato plants are planted is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five types of tomato plants, (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. Then, the tomato plants in the plots are divided into subplots, where each subplot is treated by one of the four fertilizers. The data from this experiment looks like:
Soil | Tomato | Fertilizer | Yield |
---|---|---|---|
'Sandy' | 'Plum' | 1 | 104 |
'Sandy' | 'Plum' | 2 | 136 |
'Sandy' | 'Plum' | 3 | 158 |
'Sandy' | 'Plum' | 4 | 174 |
'Sandy' | 'Cherry' | 1 | 57 |
'Sandy' | 'Cherry' | 2 | 86 |
... | ... | ... | ... |
'Sandy' | 'Vine' | 3 | 99 |
'Sandy' | 'Vine' | 4 | 117 |
'Silty' | 'Plum' | 1 | 120 |
'Silty' | 'Plum' | 2 | 115 |
... | ... | ... | ... |
'Loamy' | 'Vine' | 3 | 111 |
'Loamy' | 'Vine' | 4 | 105 |
You must specify the model you want to fit using the formula
input
argument to fitlme
.
In general, a formula for model specification is a character vector or string scalar of the
form 'y ~ terms'
. For linear mixed-effects models, this formula
is in the form 'y ~ fixed + (random1|grouping1) + ... +
(randomR|groupingR)'
, where fixed
contains the
fixed-effects terms and random1, ..., randomR
contain the
random-effects terms. For example, for the previous fertilizer experiment, consider
the following mixed-effects model
where i = 1, 2, ..., 60, the index m corresponds to the fertilizer types, j corresponds to the tomato types, and k = 1, 2, 3 corresponds to the blocks (soil). Sk represents the kth soil type, and I[F]im is the dummy variable representing level m of the fertilizer. Similarly, I[T]ij is the dummy variable representing the level j of the tomato type.
You can fit this model using the formula 'Yield ~ 1
+ Fertilizer + Tomato + (1|Soil)+(1|Soil:Tomato)'
.
For detailed information on how to specify your model using formula, see Relationship Between Formula and Design Matrices.
If you cannot easily describe your model using a formula, you
can create design matrices to define the fixed and random effects,
and fit the model using fitlmematrix(X,y,Z,G)
.
You must create your design matrices as follows.
Fixed-effects and random-effects design matrices X
and Z
:
Enter a column of 1s for the intercept using ones(n,1)
,
where n is the total number of observations.
If X1
is a continuous variable,
then enter X1
as it is in a separate column.
If X1
is a categorical variable
with m levels, then there must be m –
1 dummy variables for m – 1 levels of X1
in X
.
For example, consider an experiment where you want to study the impact of quality of raw materials from four different providers on the productivity of a production line. If you fit a linear mixed-effects model with intercept and provider as the fixed-effects terms, intercept is the random-effects term, and you use reference contrasts coding, then you must construct your fixed- and random-effects design matrices as follows.
D = dummyvar(provider); % Create dummy variables
X = [ones(n,1) D(:,2) D(:,3) D(:,4)];
Z = [ones(n,1)];
Because reference contrast coding uses the first provider as the reference, and the model has an intercept, you must use the dummy variables for only the last three providers.
If there is an interaction term of predictor variables X1
and X2
,
then you must enter a column that you form by elementwise product
of the vectors X1
and X2
.
For example, if you want to fit a model, where there is an intercept, a continuous treatment factor, a continuous time factor, and their interaction as the fixed-effects in a longitudinal study, and time is the random-effects term, then your fixed- and random-effects design matrices should look like
X = [ones(n,1),treatment,time,treatment.*time]; y = response; Z = [time];
Grouping variables G
:
There is one column for each grouping variable and a column of elementwise product of the grouping variables in case of a nesting.
For example, if you want to group plots (plot
)
within blocks (block
), then you must add a column
of elementwise product of plot
by block
.
More specifically, if you want to fit a model where there is intercept
and a continuous treatment factor as the fixed-effects in a split-block
experiment, and the intercept and treatment are grouped by the plots
nested within blocks, then the design matrices should look like this.
X = [ones(n,1),treatment]; y = response; Z = [ones(n,1),treatment]; G = [block.*plot];
Suppose in the earlier quality of raw materials example, the raw materials arrive in bulks, and the bulks are nested within providers. If you want to fit a linear mixed-effects model, where intercept is grouped by the bulks within providers, then your design matrices should look like this.
D = dummyvar(provider); X = [ones(n,1) D(:,2) D(:,3) D(:,4)]; y = response; Z = ones(n,1); G = [provider.*bulks];
In the earlier longitudinal study example, if you want to add random effects for intercept and time grouped by subjects that participated in the study, then your design matrices should look like
X = [ones(n,1),treatment,time, treatment.*time]; y = response; Z = [ones(n,1),time]; G = subject;
fitlme(tbl,formula)
and fitlmematrix(X,y,Z,G)
are
equivalent in functionality, such that
y
is the n-by-1
response vector.
X
is an n-by-p fixed-effects
design matrix. fitlme
constructs this from the
expression fixed
in formula
.
Z
is an R-by-1
cell array with Z{r}
being an n-by-q(r)
random-effects design matrix constructed from the rth
expression in random
in formula
,
r = 1, 2, ..., R.
G
is an R-by-1
cell array with G{r}
being an n-by-1
grouping variable, g
r,
in formula
with M(r)
levels or groups.
For example, if tbl
is a table or dataset
array containing the response variable y
, the continuous
variables X1
and X2
, and the
grouping variable g
, then to fit a linear mixed-effects
model that corresponds to the formula expression 'y ~ X1+
X2+ (X1*X2|g)'
using fitlmematrix(X,y,Z,G)
the
input arguments must correspond to the following:
y = tbl.y X = [ones(n,1), tbl.X1, tbl.X2] Z = [ones(n,1), tbl.X1, tbl.X2, tbl.X1.*tbl.X2] G = tbl.g
fitlme
| fitlmematrix
| LinearMixedModel