Fit linear regression model using stepwise regression
b = stepwisefit(X,y) returns a vector b of coefficient estimates from stepwise regression of the response vector y on the predictor variables in matrix X. stepwisefit begins with an initial constant model and takes forward or backward steps to add or remove variables until a stopping criterion is satisfied.
b = stepwisefit(X,y,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify a nonconstant initial model or a maximum number of steps that stepwisefit can take.
[b,se,pval,finalmodel,stats] = stepwisefit(___) also returns the standard errors se and p-values pval of the coefficient estimates, a specification of the variables in the final regression model finalmodel, and statistics stats about the final model.
Stepwise regression is a method for adding terms to and removing terms from a multilinear model based on their statistical significance. This method begins with an initial model and then takes successive steps to modify the model by adding or removing terms. At each step, the p-value of an F-statistic is computed to test models with and without a potential term. If a term is not currently in the model, the null hypothesis is that the term would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, the term is added to the model. Conversely, if a term is currently in the model, the null hypothesis is that the term has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the term is removed from the model. The method proceeds as follows:
1. Fit the initial model.
2. If any terms not in the model have p-values less than an entry tolerance, add the one with the smallest p-value and repeat this step. For example, assume the initial model is the default constant model and the entry tolerance is the default 0.05. The algorithm first fits all models consisting of the constant plus one other term and identifies the term with the smallest p-value, for example term 4. If the p-value of term 4 is less than 0.05, term 4 is added to the model. Next, the algorithm searches among all models consisting of the constant, term 4, and one other term. If any term not in the model has a p-value less than 0.05, the term with the smallest p-value is added to the model and the process is repeated. When no remaining term has a p-value below the entry tolerance, the algorithm proceeds to step 3.
3. If any terms in the model have p-values greater than an exit tolerance, remove the one with the largest p-value and go to step 2; otherwise, end.
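The control flow of steps 2 and 3 can be sketched in Python. This is an illustrative sketch, not the stepwisefit implementation. The statistical test is abstracted into a caller-supplied function pvalue(model, term), a hypothetical name. It returns the p-value of term given the current set of model terms: the p-value the term would have if added when it is outside the model, or its p-value in the model when it is already included. The constant is treated as always present. The names stepwise_search, penter, premove, and max_steps are assumptions made for the sketch, and the exit tolerance of 0.10 is only an illustrative value. The toy p-values at the end mirror the worked example: term 4 enters first, and it is later dropped once another term makes it redundant.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical callback: pvalue(model_terms, term) -> p-value of `term`.
# For a term outside the model it is the p-value the term would have if added;
# for a term inside the model it is the term's p-value in the current model.
PValueFn = Callable[[FrozenSet[int], int], float]


def stepwise_search(
    n_terms: int,
    pvalue: PValueFn,
    initial: Iterable[int] = (),
    penter: float = 0.05,   # entry tolerance
    premove: float = 0.10,  # exit tolerance (illustrative value)
    max_steps: int = 100,   # safeguard so the loop always terminates
) -> Tuple[Set[int], List[Tuple[str, int]]]:
    """Forward/backward stepwise search over terms 0..n_terms-1.

    The constant is implicit and never removed. Returns the final set of
    terms and the history of ("add", t) / ("remove", t) steps.
    """
    model: Set[int] = set(initial)
    history: List[Tuple[str, int]] = []
    for _ in range(max_steps):
        current = frozenset(model)
        # Step 2: add the outside term with the smallest p-value if it is
        # below the entry tolerance, then repeat step 2.
        outside = [t for t in range(n_terms) if t not in model]
        if outside:
            best = min(outside, key=lambda t: pvalue(current, t))
            if pvalue(current, best) < penter:
                model.add(best)
                history.append(("add", best))
                continue
        # Step 3: remove the inside term with the largest p-value if it is
        # above the exit tolerance, then go back to step 2.
        if model:
            worst = max(sorted(model), key=lambda t: pvalue(current, t))
            if pvalue(current, worst) > premove:
                model.remove(worst)
                history.append(("remove", worst))
                continue
        break  # no term can be added or removed: stop
    return model, history


# Toy p-values (made up for illustration): term 4 enters first, term 2 enters
# next, and term 4 becomes redundant once term 2 is in the model.
def toy_pvalue(model: FrozenSet[int], term: int) -> float:
    if term == 4:
        return 0.5 if 2 in model else 0.01
    if term == 2:
        return 0.03 if 4 in model else 0.04
    return 0.6


final, steps = stepwise_search(5, toy_pvalue)
print(sorted(final), steps)  # prints: [2] [('add', 4), ('add', 2), ('remove', 4)]
```

Keeping the test behind a callback separates the search order, which is what the numbered steps describe, from the choice of test statistic. The max_steps guard is a precaution in this sketch so the loop cannot revisit the same models indefinitely.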
In each step of the algorithm, stepwisefit uses the method of least squares to estimate the model coefficients. A term added at an earlier stage might later be dropped if it is no longer helpful in combination with terms added afterward. The method terminates when no single step improves the model, that is, when no term outside the model meets the entry tolerance and no term in the model exceeds the exit tolerance. However, the final model is not guaranteed to be optimal, that is, to have the best fit to the data. A different initial model or a different sequence of steps might lead to a better fit. In this sense, stepwise models are locally optimal but not necessarily globally optimal.
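Each p-value that drives the search compares two least-squares fits that differ by a single term. The following standard-library Python sketch first computes the residual sum of squares (RSS) of a fit that has a constant plus a chosen subset of columns of X. It then computes the p-value of the partial F-statistic F = (RSS_reduced - RSS_full) / (RSS_full / (n - p_full)), which has 1 and n - p_full degrees of freedom, where p_full counts the coefficients of the larger model, including the constant. For a single term, this F-test is equivalent to a two-sided t-test on the added coefficient.

The sketch rests on several assumptions. The function names are hypothetical. The normal equations are solved by Gaussian elimination, which is adequate for small, well-conditioned examples but less numerically stable than the QR-based solvers production code typically uses. The F-distribution tail probability comes from the standard continued-fraction evaluation of the regularized incomplete beta function.

```python
import math


def rss_for_terms(X, y, terms):
    """Least-squares fit of y on a constant plus the columns of X in `terms`.

    Returns (residual sum of squares, coefficients with the constant first).
    """
    A = [[1.0] + [float(row[j]) for j in terms] for row in X]
    n, p = len(A), len(A[0])
    # Normal equations (A'A) b = A'y, solved by Gaussian elimination.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         for r in range(p)]
    v = [sum(A[i][r] * y[i] for i in range(n)) for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))  # partial pivoting
        if abs(M[piv][k]) < 1e-12:  # crude absolute threshold for this sketch
            raise ValueError("design matrix is rank deficient")
        M[k], M[piv] = M[piv], M[k]
        v[k], v[piv] = v[piv], v[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            for c in range(k, p):
                M[r][c] -= f * M[k][c]
            v[r] -= f * v[k]
    b = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        b[r] = (v[r] - sum(M[r][c] * b[c] for c in range(r + 1, p))) / M[r][r]
    rss = sum((y[i] - sum(A[i][c] * b[c] for c in range(p))) ** 2
              for i in range(n))
    return rss, b


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny, eps = 1e-300, 3e-14
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def partial_f_pvalue(X, y, terms, candidate):
    """p-value of the F-test for adding `candidate` to constant + `terms`."""
    rss_reduced, _ = rss_for_terms(X, y, list(terms))
    full = list(terms) + [candidate]
    rss_full, _ = rss_for_terms(X, y, full)
    df_resid = len(y) - (len(full) + 1)  # +1 for the constant term
    if rss_full == 0.0:
        return 0.0  # perfect fit: the F-statistic is infinite
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    return f_sf(f_stat, 1.0, float(df_resid))
```

For example, partial_f_pvalue(X, y, [], 0) gives the p-value for adding the first column of X to the constant model. This is the quantity compared against the entry tolerance in step 2. Running a real search means calling it once for every candidate term at every step.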
You can create a model using fitlm, and then manually adjust the model using step, addTerms, and removeTerms.
Use stepwiselm if you have data in a table, you have a mix of continuous and categorical predictors, or you want to specify model formulas that can potentially include higher-order and interaction terms.
Use stepwiseglm to create stepwise generalized linear models (for example, if you have a binary response variable and want to fit a classification model).
See also: addedvarplot | regress | stepwise | stepwiseglm | stepwiselm