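A regression model with ARIMA errors has the following general form:

$$
\begin{aligned}
y_t &= c + X_t\beta + u_t\\
a(L)\,A(L)\,(1-L)^D\,(1-L^s)\,u_t &= b(L)\,B(L)\,\varepsilon_t,
\end{aligned}
\qquad t = 1,\dots,T, \tag{1}
$$

where:

$y_t$ is the response series.
$X_t$ is row $t$ of $X$, which is the matrix of concatenated predictor data vectors. That is, $X_t$ is observation $t$ of each predictor series.
$c$ is the regression model intercept.
$\beta$ is the vector of regression coefficients.
$u_t$ is the disturbance series.
$\varepsilon_t$ is the innovations series.
$L$ is the lag operator, so that $L^j u_t = u_{t-j}$.
$a(L) = 1 - a_1L - \dots - a_pL^p$, which is the degree $p$, nonseasonal autoregressive polynomial.
$A(L) = 1 - A_1L - \dots - A_{p_s}L^{p_s}$, which is the degree $p_s$, seasonal autoregressive polynomial.
$(1-L)^D$, which is the degree $D$, nonseasonal integration polynomial.
$(1-L^s)$, which is the degree $s$, seasonal integration polynomial.
$b(L) = 1 + b_1L + \dots + b_qL^q$, which is the degree $q$, nonseasonal moving average polynomial.
$B(L) = 1 + B_1L + \dots + B_{q_s}L^{q_s}$, which is the degree $q_s$, seasonal moving average polynomial.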
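To make the two equations of Equation 1 concrete, here is a minimal base-MATLAB sketch (it does not use Econometrics Toolbox) for the special case $p = 2$ with no seasonal, integration, or moving average terms. All parameter values are hypothetical, and presample disturbances are assumed to be zero.

```matlab
% Simulate y_t = c + X_t*beta + u_t with AR(2) errors a(L)u_t = e_t.
% All parameter values are hypothetical, chosen only for illustration.
rng(0);                          % reproducible draws
T     = 200;                     % number of observations
c     = 1;                       % intercept
beta  = [0.5; -2];               % regression coefficients (two predictors)
X     = randn(T,2);              % predictor matrix; row t is X_t
a     = [0.6 -0.2];              % AR coefficients a_1, a_2 (stationary)
sigma = 0.8;                     % innovation standard deviation

e = sigma*randn(T,1);            % Gaussian innovations
% filter(1,[1 -a_1 -a_2],e) solves u_t = a_1*u_{t-1} + a_2*u_{t-2} + e_t,
% starting from zero presample disturbances.
u = filter(1,[1 -a],e);
y = c + X*beta + u;              % regression equation
```

The disturbance recursion is delegated to filter because the AR polynomial $a(L)$ is exactly the denominator of a recursive filter.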
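Suppose that the unconditional disturbance series ($u_t$) is a stationary stochastic process (so that $D = 0$ and there is no seasonal integration). Then, you can express the second equation in Equation 1 as

$$
u_t = \frac{b(L)\,B(L)}{a(L)\,A(L)}\,\varepsilon_t = \Psi(L)\,\varepsilon_t,
$$

where $\Psi(L) = 1 + \psi_1L + \psi_2L^2 + \dots$ is an infinite-degree lag operator polynomial [2].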
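The first few coefficients of $\Psi(L)$ can be computed as the impulse response of the ratio of lag polynomials. The sketch below assumes a nonseasonal ARMA(2,1) error model with hypothetical coefficients $a_1 = 0.6$, $a_2 = -0.2$, $b_1 = 0.4$, and uses base-MATLAB filter rather than any toolbox function.

```matlab
% psi_j are the coefficients of Psi(L) = b(L)/a(L) for an ARMA(2,1) error
% model with hypothetical coefficients a_1 = 0.6, a_2 = -0.2, b_1 = 0.4.
a = [0.6 -0.2];                    % AR coefficients
b = 0.4;                           % MA coefficient
n = 10;                            % number of weights to compute
impulse = [1 zeros(1,n-1)];        % unit shock at time 1
psi = filter([1 b],[1 -a],impulse) % psi(1) = psi_0, psi(2) = psi_1, ...
```

For $j \ge 2$ the weights satisfy $\psi_j = a_1\psi_{j-1} + a_2\psi_{j-2}$, so with a stationary AR polynomial they decay toward zero.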
The innovation process ($\varepsilon_t$) is an independent and identically distributed (iid), mean 0 process with a known distribution. Econometrics Toolbox™ generalizes the innovation process to $\varepsilon_t = \sigma z_t$, where $z_t$ is a series of iid random variables with mean 0 and variance 1, and $\sigma^2$ is the constant variance of $\varepsilon_t$.
regARIMA models contain two properties that describe the distribution of $\varepsilon_t$:
Variance stores $\sigma^2$.
Distribution stores the parametric form of $z_t$.
The default value of Variance is NaN, meaning that the innovation variance is unknown. You can assign a positive scalar to Variance when you specify the model using the name-value pair argument 'Variance',sigma2 (where sigma2 = $\sigma^2$), or by modifying an existing model using dot notation. Alternatively, you can estimate Variance using estimate.
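As a sketch of the two assignment routes just described (requires Econometrics Toolbox; the value 0.5 is hypothetical), the following creates a model with unknown variance, one with the variance set at creation, and one with the variance set afterward by dot notation.

```matlab
% Ways to handle the innovation variance sigma^2 (requires Econometrics
% Toolbox). The value 0.5 is hypothetical.
Mdl1 = regARIMA('ARLags',1:2);                  % Variance is NaN (to be estimated)
Mdl2 = regARIMA('ARLags',1:2,'Variance',0.5);   % known at creation
Mdl3 = regARIMA('ARLags',1:2);
Mdl3.Variance = 0.5;                            % set afterward via dot notation
```

Only Mdl1 leaves Variance for estimate to fit; in Mdl2 and Mdl3 it is fixed.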
You can specify the following distributions for $z_t$ (using name-value pair arguments or dot notation):
Standard Gaussian
Standardized Student’s t with degrees of freedom $\nu > 2$. Specifically,

$$
z_t = \sqrt{\frac{\nu - 2}{\nu}}\,T_\nu,
$$

where $T_\nu$ is a Student’s t distribution with degrees of freedom $\nu > 2$.
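The scaling factor $\sqrt{(\nu-2)/\nu}$ exists because $\operatorname{Var}(T_\nu) = \nu/(\nu-2)$. A minimal base-MATLAB check follows, assuming a hypothetical $\nu = 6$; $T_\nu$ is built here as $N(0,1)/\sqrt{\chi^2_\nu/\nu}$ so that no toolbox random number generator is needed.

```matlab
% Check that z_t = sqrt((nu-2)/nu)*T_nu has unit variance (nu = 6 is
% hypothetical). T_nu is built from base-MATLAB draws as
% N(0,1)/sqrt(chi2_nu/nu), with chi2_nu a sum of nu squared N(0,1) draws.
rng(0);
nu = 6;
n  = 2e5;
chi2 = sum(randn(nu,n).^2,1)';     % chi-square(nu) draws, n-by-1
Tnu  = randn(n,1)./sqrt(chi2/nu);  % Student's t(nu) draws
z    = sqrt((nu-2)/nu)*Tnu;        % standardized t draws
fprintf('var(Tnu) = %.3f (theory %.3f), var(z) = %.3f (theory 1)\n', ...
    var(Tnu), nu/(nu-2), var(z));
```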
The t distribution is useful for modeling innovations that are more extreme than expected under a Gaussian distribution. Such innovation processes have excess kurtosis: a more peaked (or heavier-tailed) distribution than a Gaussian. Note that for $\nu > 4$, the kurtosis (the fourth central moment divided by the squared variance) of $T_\nu$ is the same as the kurtosis of the standardized Student’s t ($z_t$); that is, the kurtosis of a t random variable is scale invariant.
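Scale invariance follows because multiplying a variable by $k$ multiplies the fourth central moment by $k^4$ and the squared variance by $k^4$ as well. The base-MATLAB sketch below (hypothetical $\nu = 6$; the kurt anonymous function is defined here, not taken from a toolbox) computes the sample kurtosis of $T_\nu$ and of $z_t$; for $\nu > 4$ the population value is $3 + 6/(\nu-4)$.

```matlab
% Sample kurtosis m4/m2^2 is unchanged by rescaling, so T_nu and the
% standardized z_t = sqrt((nu-2)/nu)*T_nu have identical sample kurtosis.
% For nu > 4 the population value is 3 + 6/(nu-4) (nu = 6 gives 6).
kurt = @(x) mean((x-mean(x)).^4) / mean((x-mean(x)).^2)^2;

rng(0);
nu   = 6;                               % hypothetical degrees of freedom
n    = 1e5;
Tnu  = randn(n,1)./sqrt(sum(randn(nu,n).^2,1)'/nu);  % t(nu) draws
z    = sqrt((nu-2)/nu)*Tnu;             % standardized t draws
fprintf('kurtosis: T_nu %.4f, z %.4f, theory %.1f\n', ...
    kurt(Tnu), kurt(z), 3 + 6/(nu-4));
```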
Tip
It is good practice to assess the distributional properties of the residuals to determine if a Gaussian innovation distribution (the default distribution) is appropriate for your model.
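One common check is the Jarque-Bera statistic, $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where $S$ and $K$ are the sample skewness and kurtosis. The sketch below computes it in base MATLAB on simulated stand-in residuals (not output from a fitted regARIMA model); Statistics and Machine Learning Toolbox also provides jbtest for a formal test.

```matlab
% Jarque-Bera statistic JB = n/6*(S^2 + (K-3)^2/4), where S and K are the
% sample skewness and kurtosis. Large JB suggests non-Gaussian residuals.
% The residual series below are simulated stand-ins, not model output.
skew = @(x) mean((x-mean(x)).^3) / mean((x-mean(x)).^2)^1.5;
kurt = @(x) mean((x-mean(x)).^4) / mean((x-mean(x)).^2)^2;
jb   = @(x) numel(x)/6 * (skew(x)^2 + (kurt(x)-3)^2/4);

rng(0);
n = 1e4;
resGauss = randn(n,1);                               % Gaussian stand-in
nu = 3;
resT = randn(n,1)./sqrt(sum(randn(nu,n).^2,1)'/nu);  % heavy-tailed stand-in

% Under Gaussianity, JB is approximately chi-square(2); its 95% point is 5.99.
fprintf('JB Gaussian: %.2f, JB t(3): %.2f\n', jb(resGauss), jb(resT));
```

A JB value far above the chi-square(2) critical value, as for the t(3) stand-in, is evidence that a t innovation distribution may fit better than the default Gaussian.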
regARIMA stores the distribution (and, for the t distribution, the degrees of freedom) in the Distribution property. The data type of Distribution is a struct array with up to two fields: Name and DoF.
If the innovations are Gaussian, then the Name field is Gaussian, and there is no DoF field. regARIMA sets Distribution to Gaussian by default.
If the innovations are t-distributed, then the Name field is t, and the DoF field is NaN by default; alternatively, you can set DoF to a scalar greater than 2.
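The two valid shapes of the Distribution struct can be summarized in a small check. The isValidDist function below is a hypothetical helper written for this illustration, not part of Econometrics Toolbox, and it only encodes the rules stated above.

```matlab
% Hypothetical helper (not part of Econometrics Toolbox): checks that a struct
% has one of the two shapes described above for the Distribution property.
isValidDist = @(d) isstruct(d) && isfield(d,'Name') && ...
    ( (strcmp(d.Name,'Gaussian') && ~isfield(d,'DoF')) || ...
      (strcmp(d.Name,'t') && isfield(d,'DoF') && isscalar(d.DoF) && ...
       (isnan(d.DoF) || d.DoF > 2)) );

gaussDist = struct('Name','Gaussian');      % Gaussian: no DoF field
tDefault  = struct('Name','t','DoF',NaN);   % t with unknown DoF (to be estimated)
tFixed    = struct('Name','t','DoF',10);    % t with DoF fixed at 10
tBad      = struct('Name','t','DoF',2);     % invalid: DoF must exceed 2
```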
To illustrate specifying the distribution, consider this regression model with AR(2) errors:

$$
y_t = c + X_t\beta + u_t, \qquad u_t = a_1u_{t-1} + a_2u_{t-2} + \varepsilon_t.
$$
```matlab
Mdl = regARIMA(2,0,0);
Mdl.Distribution
```

```
ans = struct with fields:

    Name: "Gaussian"
```
By default, the Distribution property of Mdl is a struct array whose Name field has the value Gaussian.
If you want to specify a t innovation distribution, then you can either specify the model using the name-value pair argument 'Distribution','t', or use dot notation to modify an existing model.
Specify the model using the name-value pair argument.
```matlab
Mdl = regARIMA('ARLags',1:2,'Distribution','t');
Mdl.Distribution
```

```
ans = struct with fields:

    Name: "t"
     DoF: NaN
```
If you use the name-value pair argument to specify the t innovation distribution, then the default degrees of freedom is NaN.
You can use dot notation to yield the same result.
```matlab
Mdl = regARIMA(2,0,0);
Mdl.Distribution = 't'
```

```
Mdl = 
  regARIMA with properties:

     Description: "ARMA(2,0) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = NaN
       Intercept: NaN
            Beta: [1×0]
               P: 2
               Q: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
        Variance: NaN
```
If the innovation distribution is t, then you can use dot notation to modify the Distribution property of the existing model Mdl. You cannot modify the fields of Distribution using dot notation; for example, Mdl.Distribution.DoF = 10 is not a value assignment. Instead, assign a complete struct to Distribution. You can, however, display the values of the fields using dot notation.
```matlab
Mdl.Distribution = struct('Name','t','DoF',10)
```

```
Mdl = 
  regARIMA with properties:

     Description: "ARMA(2,0) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = 10
       Intercept: NaN
            Beta: [1×0]
               P: 2
               Q: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
        Variance: NaN
```

```matlab
tDistributionDoF = Mdl.Distribution.DoF
```

```
tDistributionDoF = 10
```
Because the DoF field is not NaN, it is an equality constraint when you estimate Mdl using estimate; that is, the degrees of freedom stay fixed at 10 during estimation.
Alternatively, you can specify the innovation distribution using the name-value pair argument.
```matlab
Mdl = regARIMA('ARLags',1:2,'Intercept',0,...
    'Distribution',struct('Name','t','DoF',10))
```

```
Mdl = 
  regARIMA with properties:

     Description: "ARMA(2,0) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = 10
       Intercept: 0
            Beta: [1×0]
               P: 2
               Q: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
        Variance: NaN
```
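To see the DoF equality constraint during estimation, the following sketch simulates data from a fully specified model and then fits a model whose DoF is fixed at 10. It requires Econometrics Toolbox; the true parameter values, sample size, and predictor are hypothetical and serve only to generate data.

```matlab
% Sketch: DoF = 10 is held fixed while estimate fits the NaN-valued parameters.
% The "true" values below are hypothetical, chosen only to generate data.
rng(1);                                   % reproducible random draws
T = 500;
X = randn(T,1);                           % one hypothetical predictor
TrueMdl = regARIMA('Intercept',1,'Beta',2,'AR',{0.5,-0.2}, ...
    'Variance',1,'Distribution',struct('Name','t','DoF',10));
y = simulate(TrueMdl,T,'X',X);            % simulate responses

% Model to fit: AR(2) errors, one regression coefficient, t innovations
% with DoF fixed at 10 (the other parameters are NaN and get estimated)
Mdl = regARIMA('ARLags',1:2,'Beta',NaN, ...
    'Distribution',struct('Name','t','DoF',10));
EstMdl = estimate(Mdl,y,'X',X);
EstMdl.Distribution                       % DoF is still 10
```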
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[2] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell, 1938.