mvregress

Multivariate linear regression

Syntax

beta = mvregress(X,Y)

beta = mvregress(X,Y,Name,Value)

[beta,Sigma]
= mvregress(___)

[beta,Sigma,E,CovB,logL]
= mvregress(___)

Description

example

beta = mvregress(X,Y) returns the estimated coefficients for a multivariate normal regression of the d-dimensional responses in Y on the design matrices in X.

example

beta = mvregress(X,Y,Name,Value) returns the estimated coefficients using additional options specified by one or more name-value pair arguments. For example, you can specify the estimation algorithm, initial estimate values, or maximum number of iterations for the regression.

example

[beta,Sigma] = mvregress(___) also returns the estimated d-by-d variance-covariance matrix of Y, using any of the input arguments from the previous syntaxes.

example

[beta,Sigma,E,CovB,logL] = mvregress(___) also returns a matrix of residuals E, estimated variance-covariance matrix of the regression coefficients CovB, and the value of the log likelihood objective function after the last iteration logL.

Examples

collapse all

Multivariate Regression Model for Panel Data with Different Intercepts

Open Live Script

Fit a multivariate regression model to panel data, assuming different intercepts and common slopes.

Load the sample data.

load('flu')

The dataset array flu contains national CDC flu estimates, and nine separate regional estimates based on Google® query data.

Extract the response and predictor data.

Y = double(flu(:,2:end-1));
[n,d] = size(Y);
x = flu.WtdILI;

The responses in Y are the nine regional flu estimates. Observations exist for every week over a one-year period, so $n$ = 52. The dimension of the responses corresponds to the regions, so $d$ = 9. The predictors in x are the weekly national flu estimates.

Plot the flu data, grouped by region.

figure;
regions = flu.Properties.VarNames(2:end-1);
plot(x,Y,'x')
legend(regions,'Location','NorthWest')

Fit the multivariate regression model $y_{i j} = α_{j} + β x_{i j} + ϵ_{i j}$ , where $i = 1, \dots, n$ and $j = 1, \dots, d$ , with between-region concurrent correlation $C O V (ϵ_{i j}, ϵ_{i j}) = σ_{j j}$ .

There are $K$ = 10 regression coefficients to estimate: nine intercept terms and a common slope. The input argument X should be an $n$ -element cell array of $d$ -by- $K$ design matrices.

X = cell(n,1);
for i = 1:n
	X{i} = [eye(d) repmat(x(i),d,1)];
end
[beta,Sigma] = mvregress(X,Y);

beta contains estimates of the $K$ -dimensional coefficient vector $(α_{1}, α_{2}, \dots, α_{9}, β)^{'}$ .

Sigma contains estimates of the $d$ -by- $d$ variance-covariance matrix ${(σ_{i j})}_{d \times d}$ , $i, j = 1, \dots, d$ for the between-region concurrent correlations.

Plot the fitted regression model.

B = [beta(1:d)';repmat(beta(end),1,d)];
xx = linspace(.5,3.5)';
fits = [ones(size(xx)),xx]*B;

figure;
h = plot(x,Y,'x',xx,fits,'-');
for i = 1:d
	set(h(d+i),'color',get(h(i),'color'));
end
legend(regions,'Location','NorthWest');

The plot shows that each regression line has a different intercept but the same slope. Upon visual inspection, some regression lines appear to fit the data better than others.

Multivariate Regression for Panel Data with Different Slopes

Open Live Script

Fit a multivariate regression model to panel data using least squares, assuming different intercepts and slopes.

Load the sample data.

load('flu');

The dataset array flu contains national CDC flu estimates, and nine separate regional estimates based on Google® queries.

Extract the response and predictor data.

Y = double(flu(:,2:end-1));
[n,d] = size(Y);
x = flu.WtdILI;

Fit the multivariate regression model $y_{i j} = α_{j} + β_{j} x_{i j} + ϵ_{i j}$ , where $i = 1, \dots, n$ and $j = 1, \dots, d$ , with between-region concurrent correlation $C O V (ϵ_{i j}, ϵ_{i j}) = σ_{j j}$ .

There are $K$ = 18 regression coefficients to estimate: nine intercept terms, and nine slope terms. X is an $n$ -element cell array of $d$ -by- $K$ design matrices.

X = cell(n,1);
for i = 1:n
    X{i} = [eye(d) x(i)*eye(d)];
end
[beta,Sigma] = mvregress(X,Y,'algorithm','cwls');

beta contains estimates of the $K$ -dimensional coefficient vector $(α_{1}, α_{2}, \dots, α_{9}, β_{1}, β_{2}, \dots, β_{9})^{'}$ .

Plot the fitted regression model.

B = [beta(1:d)';beta(d+1:end)'];
xx = linspace(.5,3.5)';
fits = [ones(size(xx)),xx]*B;

figure;
h = plot(x,Y,'x',xx,fits,'-');
for i = 1:d
	set(h(d+i),'color',get(h(i),'color'));
end

regions = flu.Properties.VarNames(2:end-1);
legend(regions,'Location','NorthWest');

The plot shows that each regression line has a different intercept and slope.

Multivariate Regression With a Single Design Matrix

Open Live Script

Fit a multivariate regression model using a single $n$ -by- $P$ design matrix for all response dimensions.

Load the sample data.

load('flu')

The dataset array flu contains national CDC flu estimates, and nine separate regional estimates based on Google® queries.

Extract the response and predictor data.

Y = double(flu(:,2:end-1));
[n,d] = size(Y);
x = flu.WtdILI;

Create an $n$ -by- $P$ design matrix X. Add a column of ones to include a constant term in the regression.

X = [ones(size(x)),x];

Fit the multivariate regression model

$y_{i j} = α_{j} + β_{j} x_{i j} + ϵ_{i j},$

where $i = 1, \dots, n$ and $j = 1, \dots, d$ , with between-region concurrent correlation

$C O V (ϵ_{i j}, ϵ_{i j}) = σ_{j j} .$

There are 18 regression coefficients to estimate: nine intercept terms, and nine slope terms.

[beta,Sigma,E,CovB,logL] = mvregress(X,Y);

beta contains estimates of the $P$ -by- $d$ coefficient matrix. Sigma contains estimates of the $d$ -by- $d$ variance-covariance matrix for the between-region concurrent correlations. E is a matrix of the residuals. CovB is the estimated variance-covariance matrix of the regression coefficients. logL is the value of the log likelihood objective function after the last iteration.

Plot the fitted regression model.

B = beta;
xx = linspace(.5,3.5)';
fits = [ones(size(xx)),xx]*B;

figure
h = plot(x,Y,'x', xx,fits,'-');
for i = 1:d
    set(h(d+i),'color',get(h(i),'color'))
end

regions = flu.Properties.VarNames(2:end-1);
legend(regions,'Location','NorthWest')

The plot shows that each regression line has a different intercept and slope.

Input Arguments

collapse all

`X` — Design matrices
matrix | cell array of matrices

Design matrices for the multivariate regression, specified as a matrix or cell array of matrices. n is the number of observations in the data, K is the number of regression coefficients to estimate, p is the number of predictor variables, and d is the number of dimensions in the response variable matrix Y.

If d = 1, then specify X as a single n-by-K design matrix.
If d > 1 and all d dimensions have the same design matrix, then you can specify X as a single n-by-p design matrix (not in a cell array).
If d > 1 and all n observations have the same design matrix, then you can specify X as a cell array containing a single d-by-K design matrix.
If d > 1 and all n observations do not have the same design matrix, then specify X as a cell array of length n containing d-by-K design matrices.

To include a constant term in the regression model, each design matrix should contain a column of ones.

mvregress treats NaN values in X as missing values, and ignores rows in X with missing values.

Data Types: single | double | cell

`Y` — Response variables
matrix

Response variables, specified as an n-by-d matrix. n is the number of observations in the data, and d is the number of dimensions in the response. When d = 1, mvregress treats the values in Y like n independent response values.

mvregress treats NaN values in Y as missing values, and handles them according to the estimation algorithm specified using the name-value pair argument algorithm.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'algorithm','cwls','covar0',C specifies covariance-weighted least squares estimation using the covariance matrix C.

`'algorithm'` — Estimation algorithm
`'mvn'` | `'ecm'` | `'cwls'`

Estimation algorithm, specified as the comma-separated pair consisting of 'algorithm' and one of the following.

`'mvn'`	Ordinary multivariate normal maximum likelihood estimation.
`'ecm'`	Maximum likelihood estimation via the ECM algorithm.
`'cwls'`	Covariance-weighted least squares estimation.

The default algorithm depends on the presence of missing data.

For complete data, the default is 'mvn'.
If there are any missing responses (indicated by NaN), the default is 'ecm', provided the sample size is sufficient to estimate all parameters. Otherwise, the default algorithm is 'cwls'.

Note

If algorithm has the value 'mvn', then mvregress removes observations with missing response values before estimation.

Example: 'algorithm','ecm'

`'beta0'` — Initial estimates for regression coefficients
vector

Initial estimates for the regression coefficients, specified as the comma-separated pair consisting of 'beta0' and a vector with K elements. The default value is a vector of 0s.

The beta0 argument is not used if the estimation algorithm is 'mvn'.

`'covar0'` — Initial estimate for variance-covariance matrix
matrix

Initial estimate for the variance-covariance matrix, Sigma, specified as the comma-separated pair consisting of 'covar0' and a symmetric, positive definite, d-by-d matrix. The default value is the identity matrix.

If the estimation algorithm is 'cwls', then mvregress uses covar0 as the weighting matrix at each iteration, without changing it.

`'covtype'` — Type of variance-covariance matrix
`'full'` (default) | `'diagonal'`

Type of variance-covariance matrix to estimate for Y, specified as the comma-separated pair consisting of 'covtype' and one of the following.

`'full'`	Estimate all d(d + 1)/2 variance-covariance elements.
`'diagonal'`	Estimate only the d diagonal elements of the variance-covariance matrix.

Example: 'covtype','diagonal'

`'maxiter'` — Maximum number of iterations
`100` (default) | positive integer

Maximum number of iterations for the estimation algorithm, specified as the comma-separated pair consisting of 'maxiter' and a positive integer.

Iterations continue until estimates are within the convergence tolerances tolbeta and tolobj, or the maximum number of iterations specified by maxiter is reached. If both tolbeta and tolobj are 0, then mvregress performs maxiter iterations with no convergence tests.

Example: 'maxiter',50

`'outputfcn'` — Function to evaluate each iteration
function handle

Function to evaluate at each iteration, specified as the comma-separated pair consisting of 'outputfcn' and a function handle. The function must return a logical true or false. At each iteration, mvregress evaluates the function. If the result is true, iterations stop. Otherwise, iterations continue. For example, you could specify a function that plots or displays current iteration results, and returns true if you close the figure.

The function must accept three input arguments, in this order:

Vector of current coefficient estimates
Structure containing these three fields:

Covar Current value of the variance-covariance matrix
iteration Current iteration number
fval Current value of the loglikelihood objective function
Text that takes these three values:

'init' When the function is called during initialization
'iter' When the function is called after an iteration
'done' When the function is called after completion

`'tolbeta'` — Convergence tolerance for regression coefficients
`sqrt(eps)` (default) | positive scalar value

Convergence tolerance for regression coefficients, specified as the comma-separated pair consisting of 'tolbeta' and a positive scalar value.

Let $b^{t}$ denote the estimate of the coefficient vector at iteration t, and $τ_{β}$ be the tolerance specified by tolbeta. The convergence criterion for regression coefficient estimation is

$‖ b^{t} - b^{t - 1} ‖ < τ_{β} \sqrt{K} (1 + ‖ b^{t} ‖),$

where K is the length of $b^{t}$ and $‖ v ‖$ is the norm of a vector $v .$

Example: 'tolbeta',1e-5

`'tolobj'` — Convergence tolerance for loglikelihood objective function
`eps^(3/4)` (default) | positive scalar value

Convergence tolerance for the loglikelihood objective function, specified as the comma-separated pair consisting of 'tolobj' and a positive scalar value.

Let $L^{t}$ denote the value of the loglikelihood objective function at iteration t, and $τ_{ℓ}$ be the tolerance specified by tolobj. The convergence criterion for the objective function is

$| L^{t} - L^{t - 1} | < τ_{ℓ} (1 + | L^{t} |) .$

Example: 'tolobj',1e-5

`'varformat'` — Format for parameter estimate variance-covariance matrix
`'beta'` (default) | `'full'`

Format for the parameter estimate variance-covariance matrix, CovB, specified as the comma-separated pair consisting of 'varformat' and one of the following.

`'beta'`	Return the variance-covariance matrix for only the regression coefficient estimates, `beta`.
`'full'`	Return the variance-covariance matrix for both the regression coefficient estimates, `beta`, and the variance-covariance matrix estimate, `Sigma`.

Example: 'varformat','full'

`'vartype'` — Type of variance-covariance matrix for parameter estimates
`'hessian'` (default) | `'fisher'`

Type of variance-covariance matrix for parameter estimates, specified as the comma-separated pair consisting of 'vartype' and either 'hessian' or 'fisher'.

If the value is 'hessian', then mvregress uses the Hessian, or observed information, matrix to compute CovB.
If the value is 'fisher', then mvregress uses the complete-data Fisher, or expected information, matrix to compute CovB.

The 'hessian' method takes into account the increase uncertainties due to missing data, while the 'fisher' method does not.

Example: 'vartype','fisher'

Output Arguments

collapse all

`beta` — Estimated regression coefficients
column vector | matrix

Estimated regression coefficients, returned as a column vector or matrix.

If you specify X as a single n-by-K design matrix, then mvregress returns beta as a column vector of length K. For example, if X is a 20-by-5 design matrix, then beta is a 5-by-1 column vector.
If you specify X as a cell array containing one or more d-by-K design matrices, then mvregress returns beta as a column vector of length K. For example, if X is a cell array containing 2-by-10 design matrices, then beta is a 10-by-1 column vector.
If you specify X as a single n-by-p design matrix (not in a cell array), and Y has dimension d > 1, then mvregress returns beta as a p-by-d matrix. For example, if X is a 20-by-5 design matrix, and Y has two dimensions such that d = 2, then beta is a 5-by-2 matrix, and the fitted Y values are X × beta.

`Sigma` — Estimated variance-covariance matrix
square matrix

Estimated variance-covariance matrix for the responses in Y, returned as a d-by-d square matrix.

Note

The estimated variance-covariance matrix, Sigma, is not the sample covariance matrix of the residual matrix, E.

`E` — Residuals
matrix

Residuals for the fitted regression model, returned as an n-by-d matrix.

If algorithm has the value 'ecm' or 'cwls', then mvregress computes the residual values corresponding to missing values in Y as the difference between the conditionally imputed values and the fitted values.

Note

If algorithm has the value 'mvn', then mvregress removes observations with missing response values before estimation.

`CovB` — Parameter estimate variance-covariance matrix
square matrix

Parameter estimate variance-covariance matrix, returned as a square matrix.

If varformat has the value 'beta' (default), then CovB is the estimated variance-covariance matrix of the coefficient estimates in beta.
If varformat has the value 'full', then CovB is the estimated variance-covariance matrix of the combined estimates in beta and Sigma.

`logL` — Loglikelihood objective function value
scalar value

Loglikelihood objective function value after the last iteration, returned as a scalar value.

More About

collapse all

Multivariate Normal Regression

Multivariate normal regression is the regression of a d-dimensional response on a design matrix of predictor variables, with normally distributed errors. The errors can be heteroscedastic and correlated.

The model is

$y_{i} = X_{i} β + e_{i}, i = 1, \dots, n,$

where

$y_{i}$ is a d-dimensional vector of responses.
$X_{i}$ is a design matrix of predictor variables.
$β$ is vector or matrix of regression coefficients.
$e_{i}$ is a d-dimensional vector of error terms, with multivariate normal distribution

$e_{i} ~ M V N_{d} (0, Σ) .$

Conditionally Imputed Values

The expectation/conditional maximization ('ecm') and covariance-weighted least squares ('cwls') estimation algorithms include imputation of missing response values.

Let $\tilde{y}$ denote missing observations. The conditionally imputed values are the expected value of the missing observation given the observed data, $Ε (\tilde{y} | y) .$

The joint distribution of the missing and observed responses is a multivariate normal distribution,

$(\begin{array}{l} \tilde{y} \\ y \end{array}) ~ M V N {(\begin{array}{l} \tilde{X} β \\ X β \end{array}), (\begin{matrix} Σ_{\tilde{y}} & Σ_{\tilde{y} y} \\ Σ_{y \tilde{y}} & Σ_{y} \end{matrix})} .$

Using properties of the multivariate normal distribution, the imputed conditional expectation is given by

$Ε (\tilde{y} | y) = \tilde{X} β + Σ_{\tilde{y} y} Σ_{y}^{- 1} (y - X β) .$

Note

mvregress only imputes missing response values. Observations with missing values in the design matrix are removed.

References

[1] Little, Roderick J. A., and Donald B. Rubin. Statistical Analysis with Missing Data. 2nd ed., Hoboken, NJ: John Wiley & Sons, Inc., 2002.

[2] Meng, Xiao-Li, and Donald B. Rubin. “Maximum Likelihood Estimation via the ECM Algorithm.” Biometrika. Vol. 80, No. 2, 1993, pp. 267–278.

[3] Sexton, Joe, and A. R. Swensen. “ECM Algorithms that Converge at the Rate of EM.” Biometrika. Vol. 87, No. 3, 2000, pp. 651–662.

[4] Dempster, A. P., N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B, Vol. 39, No. 1, 1977, pp. 1–37.

Documentation

mvregress

Syntax

Description

Examples

Multivariate Regression Model for Panel Data with Different Intercepts

Multivariate Regression for Panel Data with Different Slopes

Multivariate Regression With a Single Design Matrix

Input Arguments

`X` — Design matrices
matrix | cell array of matrices

`Y` — Response variables
matrix

Name-Value Pair Arguments

`'algorithm'` — Estimation algorithm
`'mvn'` | `'ecm'` | `'cwls'`

`'beta0'` — Initial estimates for regression coefficients
vector

`'covar0'` — Initial estimate for variance-covariance matrix
matrix

`'covtype'` — Type of variance-covariance matrix
`'full'` (default) | `'diagonal'`

`'maxiter'` — Maximum number of iterations
`100` (default) | positive integer

`'outputfcn'` — Function to evaluate each iteration
function handle

`'tolbeta'` — Convergence tolerance for regression coefficients
`sqrt(eps)` (default) | positive scalar value

`'tolobj'` — Convergence tolerance for loglikelihood objective function
`eps^(3/4)` (default) | positive scalar value

`'varformat'` — Format for parameter estimate variance-covariance matrix
`'beta'` (default) | `'full'`

`'vartype'` — Type of variance-covariance matrix for parameter estimates
`'hessian'` (default) | `'fisher'`

Output Arguments

`beta` — Estimated regression coefficients
column vector | matrix

`Sigma` — Estimated variance-covariance matrix
square matrix

`E` — Residuals
matrix

`CovB` — Parameter estimate variance-covariance matrix
square matrix

`logL` — Loglikelihood objective function value
scalar value

More About

Multivariate Normal Regression

Conditionally Imputed Values

References

See Also

Topics

Statistics and Machine Learning Toolbox Documentation

Support

`Covar`	Current value of the variance-covariance matrix
`iteration`	Current iteration number
`fval`	Current value of the loglikelihood objective function

`'init'`	When the function is called during initialization
`'iter'`	When the function is called after an iteration
`'done'`	When the function is called after completion

Documentation

mvregress

Syntax

Description

Examples

Multivariate Regression Model for Panel Data with Different Intercepts

Multivariate Regression for Panel Data with Different Slopes

Multivariate Regression With a Single Design Matrix

Input Arguments

X — Design matrices matrix | cell array of matrices

Y — Response variables matrix

Name-Value Pair Arguments

'algorithm' — Estimation algorithm 'mvn' | 'ecm' | 'cwls'

'beta0' — Initial estimates for regression coefficients vector

'covar0' — Initial estimate for variance-covariance matrix matrix

'covtype' — Type of variance-covariance matrix 'full' (default) | 'diagonal'

'maxiter' — Maximum number of iterations 100 (default) | positive integer

'outputfcn' — Function to evaluate each iteration function handle

'tolbeta' — Convergence tolerance for regression coefficients sqrt(eps) (default) | positive scalar value

'tolobj' — Convergence tolerance for loglikelihood objective function eps^(3/4) (default) | positive scalar value

'varformat' — Format for parameter estimate variance-covariance matrix 'beta' (default) | 'full'

'vartype' — Type of variance-covariance matrix for parameter estimates 'hessian' (default) | 'fisher'

Output Arguments

beta — Estimated regression coefficients column vector | matrix

Sigma — Estimated variance-covariance matrix square matrix

E — Residuals matrix

CovB — Parameter estimate variance-covariance matrix square matrix

logL — Loglikelihood objective function value scalar value

More About

Multivariate Normal Regression

Conditionally Imputed Values

References

See Also

Topics

Statistics and Machine Learning Toolbox Documentation

Support

`X` — Design matrices
matrix | cell array of matrices

`Y` — Response variables
matrix

`'algorithm'` — Estimation algorithm
`'mvn'` | `'ecm'` | `'cwls'`

`'beta0'` — Initial estimates for regression coefficients
vector

`'covar0'` — Initial estimate for variance-covariance matrix
matrix

`'covtype'` — Type of variance-covariance matrix
`'full'` (default) | `'diagonal'`

`'maxiter'` — Maximum number of iterations
`100` (default) | positive integer

`'outputfcn'` — Function to evaluate each iteration
function handle

`'tolbeta'` — Convergence tolerance for regression coefficients
`sqrt(eps)` (default) | positive scalar value

`'tolobj'` — Convergence tolerance for loglikelihood objective function
`eps^(3/4)` (default) | positive scalar value

`'varformat'` — Format for parameter estimate variance-covariance matrix
`'beta'` (default) | `'full'`

`'vartype'` — Type of variance-covariance matrix for parameter estimates
`'hessian'` (default) | `'fisher'`

`beta` — Estimated regression coefficients
column vector | matrix

`Sigma` — Estimated variance-covariance matrix
square matrix

`E` — Residuals
matrix

`CovB` — Parameter estimate variance-covariance matrix
square matrix

`logL` — Loglikelihood objective function value
scalar value