collintest

Belsley collinearity diagnostics

Syntax

collintest(X)

collintest(X,Name,Value)

sValue = collintest(___)

[sValue,condIdx,VarDecomp] = collintest(___)

collintest(ax,___)

[sValue,condIdx,VarDecomp,h] = collintest(___)

Description


collintest(X) displays, at the command line, Belsley collinearity diagnostics for assessing the strength and sources of collinearity among variables in the matrix or table X.


collintest(X,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, collintest(X,'plot','on') plots the results to a figure.


sValue = collintest(___) returns the singular values in decreasing order using any of the input argument combinations in the previous syntaxes.

[sValue,condIdx,VarDecomp] = collintest(___) additionally returns the condition indices and variance decomposition proportions.

collintest(ax,___) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes.

[sValue,condIdx,VarDecomp,h] = collintest(___) additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it.

Examples


Display Belsley Collinearity Diagnostics


Display collinearity diagnostics for multiple time series.

Load data of Canadian inflation and interest rates.

load Data_Canada

Display the Belsley collinearity diagnostics, using all default options.

collintest(DataTable)

Variance Decomposition

 sValue  condIdx   INF_C   INF_G   INT_S   INT_M   INT_L 
---------------------------------------------------------
 2.1748    1      0.0012  0.0018  0.0003  0.0000  0.0001 
 0.4789   4.5413  0.0261  0.0806  0.0035  0.0006  0.0012 
 0.1602  13.5795  0.3386  0.3802  0.0811  0.0011  0.0137 
 0.1211  17.9617  0.6138  0.5276  0.1918  0.0004  0.0193 
 0.0248  87.8245  0.0202  0.0099  0.7233  0.9979  0.9658

Only the last row in the display has a condition index larger than the default tolerance, 30. In this row, the last three variables (in the last three columns) have variance-decomposition proportions exceeding the default tolerance, 0.5. This suggests that the variables INT_S, INT_M, and INT_L exhibit multicollinearity.

Plot Belsley Collinearity Diagnostics


Plot collinearity diagnostics for multiple time series.

Load data of Canadian inflation and interest rates.

load Data_Canada

Plot the Belsley collinearity diagnostics using the plot option.

collintest(DataTable,'plot','on')

Variance Decomposition

 sValue  condIdx   INF_C   INF_G   INT_S   INT_M   INT_L 
---------------------------------------------------------
 2.1748    1      0.0012  0.0018  0.0003  0.0000  0.0001 
 0.4789   4.5413  0.0261  0.0806  0.0035  0.0006  0.0012 
 0.1602  13.5795  0.3386  0.3802  0.0811  0.0011  0.0137 
 0.1211  17.9617  0.6138  0.5276  0.1918  0.0004  0.0193 
 0.0248  87.8245  0.0202  0.0099  0.7233  0.9979  0.9658

The plot corresponds to the values in the last row of variance-decomposition proportions, which is the only one with a condition index larger than the default tolerance, 30. The last three variables in this row have variance-decomposition proportions exceeding the default tolerance, 0.5, indicated by red markers in the plot.

Return Belsley Collinearity Diagnostics


Compute collinearity diagnostics for multiple time series and return the singular values, condition indices, and variance-decomposition proportions.

Load data of Canadian inflation and interest rates.

load Data_Canada

Compute the Belsley collinearity diagnostics. Turn off the results display using the display option.

[sv,conIdx,varDecomp] = collintest(DataTable,'display',...
    'off');

There is no display of the results.

Display the contents of varDecomp.

varDecomp

varDecomp = 5×5

    0.0012    0.0018    0.0003    0.0000    0.0001
    0.0261    0.0806    0.0035    0.0006    0.0012
    0.3386    0.3802    0.0811    0.0011    0.0137
    0.6138    0.5276    0.1918    0.0004    0.0193
    0.0202    0.0099    0.7233    0.9979    0.9658

The output argument varDecomp is a matrix of the variance-decomposition proportions. sv is a vector of singular values in descending order, and conIdx is a vector of condition indices in ascending order.

Input Arguments


`X` — Input regression variables
numeric matrix | tabular array

Input regression variables, specified as a numObs-by-numVars numeric matrix or tabular array. Each column of X corresponds to a variable, and each row corresponds to an observation. For models with an intercept, X should contain a column of ones.

collintest scales the columns of X to unit length before processing. Data in X should not be centered.

If X is a tabular array, then the variables must be numeric.

Data Types: double | table
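The column scaling described above can be sketched with core MATLAB functions. This sketch assumes that "unit length" means unit Euclidean norm. The matrix X is made-up data, and collintest's internal implementation may differ.

```matlab
% Hypothetical data: an intercept column plus two regressors
X = [ones(4,1), [2; 4; 6; 9], [1; 0; 3; 5]];

% Scale each column to unit Euclidean length; the data are NOT centered
colNorms = sqrt(sum(X.^2, 1));   % 1-by-numVars row vector of column lengths
Xs = X ./ colNorms;              % implicit expansion (R2016b or later) divides each column

disp(sqrt(sum(Xs.^2, 1)))        % every column now has length 1, up to rounding
```

The intercept column stays a constant column (here, every entry becomes 0.5) because scaling does not subtract the mean.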

`ax` — Axes on which to plot
`Axes` object

Axes on which to plot, specified as an Axes object.

By default, collintest plots to the current axes (gca).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'plot','on','tolIdx',35 plots the results using a condition index tolerance of 35.

`'varNames'` — Variable names
string vector | cell vector of strings

Variable names used in displays and plots of the results, specified as the comma-separated pair consisting of 'varNames' and a cell vector of strings or a string vector. varNames must have length numVars, and each element corresponds to a variable name. If an intercept term is present, then varNames must include a name for the intercept term (for example, 'Const'). The software truncates all variable names to the first five characters.

If X is a matrix, then the default value of varNames is the cell vector of strings {'var1','var2',...}.
If X is a tabular array, then the default value of varNames is the property X.Properties.VariableNames.

Example: 'varNames',{'Const','AGE','BBD'}

Data Types: cell | string

`'display'` — Display results indicator
`'on'` (default) | `'off'`

Display results indicator, specified as the comma-separated pair consisting of 'display' and either 'on' or 'off'. If you specify 'on', then collintest displays all outputs in tabular form in the Command Window.

Example: 'display','off'

Data Types: char | string

`'plot'` — Plot results indicator
`'off'` (default) | `'on'`

Plot results indicator for whether or not to plot results, specified as the comma-separated pair consisting of 'plot' and one of 'on' or 'off'.

If you specify the value 'on', then the plot shows the critical rows of the output VarDecomp, that is, those rows with condition indices above the input tolerance tolIdx.
If a group of at least two variables in a critical row has variance-decomposition proportions above the input tolerance tolProp, then the plot identifies the group with red markers.

Example: 'plot','on'

Data Types: char | string

`'tolIdx'` — Condition index tolerance
`30` (default) | scalar value of at least 1

Condition index tolerance, specified as the comma-separated pair consisting of 'tolIdx' and a scalar value of at least one. collintest uses the tolerance to decide which indices are large enough to infer a near dependency in the data. The tolIdx value is only used when plot has the value 'on'.

Example: 'tolIdx',25

Data Types: double

`'tolProp'` — Variance-decomposition proportion tolerance
`0.5` (default) | scalar between 0 and 1

Variance-decomposition proportion tolerance, specified as the comma-separated pair consisting of 'tolProp' and a scalar value between zero and one. collintest uses the tolerance to decide which variables are involved in any near dependency. The tolProp value is only used when plot has the value 'on'.

Example: 'tolProp',0.4

Data Types: double
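Together, the plot, tolIdx, and tolProp descriptions define a flagging rule. The sketch below applies that rule by hand to the values printed in the Data_Canada example. It assumes strict inequalities ("above" each tolerance) and is an illustration, not the code collintest runs.

```matlab
% Values copied from the Data_Canada example output
condIdx = [1; 4.5413; 13.5795; 17.9617; 87.8245];
VarDecomp = [0.0012 0.0018 0.0003 0.0000 0.0001
             0.0261 0.0806 0.0035 0.0006 0.0012
             0.3386 0.3802 0.0811 0.0011 0.0137
             0.6138 0.5276 0.1918 0.0004 0.0193
             0.0202 0.0099 0.7233 0.9979 0.9658];
varNames = {'INF_C','INF_G','INT_S','INT_M','INT_L'};

tolIdx = 30;     % default condition index tolerance
tolProp = 0.5;   % default variance-decomposition proportion tolerance

criticalRows = find(condIdx > tolIdx);    % rows that indicate a near dependency
for r = criticalRows'
    involved = VarDecomp(r,:) > tolProp;  % variables with large proportions in row r
    if nnz(involved) >= 2                 % a group needs at least two variables
        fprintf('Row %d: %s\n', r, strjoin(varNames(involved), ', '));
    end
end
% prints: Row 5: INT_S, INT_M, INT_L
```

Row 4 also has two proportions above 0.5 (INF_C and INF_G), but its condition index of about 18 is below tolIdx, so the rule does not flag it.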

Output Arguments


`sValue` — Singular values
vector in descending order

Singular values of scaled X, returned as a vector. The elements of sValue are in descending order.

`condIdx` — Condition indices
vector in ascending order

Condition indices, returned as a vector with elements in ascending order. All condition indices have values between one and the condition number of scaled X. Large indices identify near dependencies among the variables in X. The size of an index measures how close the corresponding near dependency is to exact collinearity.

`VarDecomp` — Variance-decomposition proportions
matrix

Variance-decomposition proportions, returned as a numVars-by-numVars matrix. Large proportions, combined with a large condition index, identify groups of variables involved in near dependencies. The size of the proportions is a measure of how badly the regression is degraded by the dependency.

`h` — Handles to plotted graphics objects
graphics array

Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot.

collintest plots only if you set 'plot','on'.

More About


Belsley Collinearity Diagnostics

Belsley collinearity diagnostics assess the strength and sources of collinearity among variables in a multiple linear regression model.

To assess collinearity, the software computes singular values of the scaled variable matrix, X, and then converts them to condition indices. The condition indices identify the number and strength of any near dependencies between variables in the variable matrix. The software decomposes the variance of the ordinary least squares (OLS) estimates of the regression coefficients in terms of the singular values to identify the variables involved in each near dependency and the extent to which the dependencies degrade the regression.

Condition Indices

The condition indices for a scaled matrix X identify the number and strength of any near dependencies in X.

For scaled matrix X with p columns and singular values $S_{1} \geq S_{2} \geq \dots \geq S_{p}$, the condition indices of the columns of X are $S_{1} / S_{j}$, $j = 1, \dots, p$.

All condition indices are bounded between one and the condition number.
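The definition can be checked numerically with a small, made-up design matrix in which x3 is almost x1 + x2. The sketch scales the columns to unit length, takes the singular values from svd (which returns them in descending order), and forms $S_1/S_j$. Assuming collintest scales the same way, s and condIdx should match its sValue and condIdx outputs up to rounding.

```matlab
% Hypothetical data with a near dependency: x3 is almost x1 + x2
x1 = [1; 2; 3; 4; 5];
x2 = [2; 1; 4; 3; 6];
x3 = x1 + x2 + [0.01; -0.02; 0.00; 0.02; -0.01];
X = [ones(5,1), x1, x2, x3];

Xs = X ./ sqrt(sum(X.^2, 1));   % scale columns to unit length (implicit expansion)
s = svd(Xs);                    % singular values S_1 >= ... >= S_p
condIdx = s(1) ./ s;            % condition indices S_1/S_j, in ascending order

disp([s condIdx])               % the last condition index is far above 30
```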

Condition Number

The condition number of a scaled matrix X is an overall diagnostic for detecting collinearity.

For scaled matrix X with p columns and singular values $S_{1} \geq S_{2} \geq \dots \geq S_{p}$, the condition number is $S_{1} / S_{p}$.

The condition number achieves its lower bound of one when the columns of scaled X are orthonormal. The condition number rises as variates exhibit greater dependency.

A limitation of the condition number as a diagnostic is that it fails to provide specifics on the strength and sources of any near dependencies.
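The two bounds described above can be seen in a short sketch with hypothetical matrices. Orthonormal columns give a condition number of one, and two nearly identical columns give a large one. The built-in cond function returns the same 2-norm ratio $S_1/S_p$.

```matlab
% Orthonormal columns: the condition number is 1
Q = [1 0; 0 1; 0 0];
sQ = svd(Q);
kappaQ = sQ(1) / sQ(end);            % S_1/S_p

% Nearly dependent columns: the condition number is large
A = [1 1; 1 1.001; 1 0.999];
As = A ./ sqrt(sum(A.^2, 1));        % scale columns to unit length
sA = svd(As);
kappaA = sA(1) / sA(end);

fprintf('orthonormal: %g, nearly dependent: %g\n', kappaQ, kappaA);
fprintf('cond(As) = %g\n', cond(As)); % built-in 2-norm condition number
```

As the text notes, kappaA signals a problem but does not say which columns cause it. The condition indices and variance-decomposition proportions provide that detail.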

Multiple Linear Regression Model

A multiple linear regression model is a model of the form $Y = X\beta + \varepsilon$, where $X$ is a design matrix of regression variables, $\beta$ is a vector of regression coefficients, and $\varepsilon$ is a vector of error terms.

Singular Values

The singular values of a scaled matrix X are the diagonal elements of the matrix S in the singular-value decomposition $X = U S V^{'}$.

In descending order, the singular values of the scaled matrix X with p columns are $S_{1} \geq S_{2} \geq \dots \geq S_{p}$ .
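The following sketch, on made-up data, checks the facts stated in this section and the next. The economy-size svd reconstructs the scaled matrix, its diagonal is in descending order, and the squared singular values equal the eigenvalues of $X^{'} X$ for the scaled matrix.

```matlab
% Hypothetical data, scaled to unit column length
X = [ones(4,1), [2; 4; 6; 9], [1; 0; 3; 5]];
Xs = X ./ sqrt(sum(X.^2, 1));

[U, S, V] = svd(Xs, 'econ');     % economy-size decomposition Xs = U*S*V'
s = diag(S);                     % S_1 >= S_2 >= ... >= S_p

reconErr = norm(Xs - U*S*V');    % close to machine precision
% eig returns the eigenvalues of the symmetric matrix Xs'*Xs; sort them to match s
eigErr = norm(sort(eig(Xs'*Xs), 'descend') - s.^2);
```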

Variance-Decomposition Proportions

Variance-decomposition proportions identify groups of variates involved in near dependencies, and the extent to which the dependencies degrade the regression.

From the singular value decomposition $U S V^{'}$ of scaled design matrix X (with p columns), let:

V be the matrix of orthonormal eigenvectors of $X^{'} X$
$S_{1} \geq S_{2} \geq \dots \geq S_{p}$ be the ordered diagonal elements of the matrix S

The variance of the OLS estimate of multiple linear regression coefficient i, $\beta_{i}$, is proportional to the sum

$V(i,1)^{2} / S_{1}^{2} + V(i,2)^{2} / S_{2}^{2} + \dots + V(i,p)^{2} / S_{p}^{2},$

where $V (i, j)$ denotes element (i,j) of V.

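Variance-decomposition proportion (i,j) is the proportion of term j in the sum relative to the entire sum, j = 1,...,p. In the VarDecomp output, this proportion appears in row j (the row for singular value $S_{j}$) and column i (the column for variable i), so each column of VarDecomp sums to one.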

The terms $S_{j}^{2}$ are the eigenvalues of scaled $X^{'} X$ . Thus, large variance-decomposition proportions correspond to small eigenvalues of $X^{'} X$ , a common diagnostic for collinearity. The singular-value decomposition provides a more direct, numerically stable view of the eigensystem of scaled $X^{'} X$ .
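The decomposition can be written out directly from V and the singular values. The sketch below uses hypothetical data in which x3 is almost x1 + x2. It transposes the proportions so that rows correspond to singular values, matching the layout of the example output. The row sums of phi equal the diagonal of $(X^{'} X)^{-1}$ for the scaled matrix, which is the variance of each coefficient estimate divided by the error variance.

```matlab
% Hypothetical data with a near dependency: x3 is almost x1 + x2
x1 = [1; 2; 3; 4; 5];
x2 = [2; 1; 4; 3; 6];
x3 = x1 + x2 + [0.01; -0.02; 0.00; 0.02; -0.01];
X = [ones(5,1), x1, x2, x3];
Xs = X ./ sqrt(sum(X.^2, 1));      % scale columns to unit length

[~, S, V] = svd(Xs, 'econ');
s = diag(S);

phi = V.^2 ./ (s.^2)';             % phi(i,j) = V(i,j)^2 / S_j^2 (implicit expansion)
varSum = sum(phi, 2);              % sum for coefficient i, proportional to var(beta_i)
props = phi ./ varSum;             % proportion of term j in sum i

VarDecomp = props';                % rows: singular values, columns: variables
disp(VarDecomp(end,:))             % x1, x2, and x3 have large proportions in the last row
```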

Tips

For purposes of collinearity diagnostics, Belsley [1] shows that column scaling of the design matrix, X, is always desirable. However, he also shows that centering the data in X is undesirable. For models with an intercept, centering the data in X hides the role of the constant term in any near dependency and yields misleading diagnostics.
Tolerances for identifying large condition indices and variance-decomposition proportions are comparable to critical values in standard hypothesis tests. Experience determines the most useful tolerance, but experiments suggest the collintest defaults are good starting points [1].
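The centering tip can be illustrated with a hypothetical regressor that varies little around a large mean. Uncentered, the constant column and x are nearly dependent, and the condition number is large. After centering, the constant column becomes zero and is dropped, so the same dependency leaves no trace in the diagnostics. This is only a toy demonstration, not a reproduction of Belsley's analysis.

```matlab
% Hypothetical regressor with little variation relative to its mean
n = 10;
x = 100 + 0.1*(1:n)';

% Uncentered model with intercept: the constant and x are nearly dependent
X = [ones(n,1), x];
Xs = X ./ sqrt(sum(X.^2, 1));     % scale columns to unit length
s = svd(Xs);
condUncentered = s(1) / s(end);   % large: the near dependency is visible

% Centered data: the constant column becomes zero and is dropped,
% so only the centered regressor remains
xc = x - mean(x);
xcs = xc / norm(xc);
sc = svd(xcs);
condCentered = sc(1) / sc(end);   % a single column always gives 1

fprintf('uncentered: %g, centered: %g\n', condUncentered, condCentered);
```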

References

[1] Belsley, D. A., E. Kuh, and R. E. Welsch. Regression Diagnostics. New York, NY: John Wiley & Sons, Inc., 1980.

[2] Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T. C. Lee. The Theory and Practice of Econometrics. New York, NY: John Wiley & Sons, Inc., 1985.

Econometrics Toolbox Documentation
