Class: regARIMA
Forecast responses of regression model with ARIMA errors
[Y,YMSE] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value)
[Y,YMSE] = forecast(Mdl,numperiods) forecasts responses (Y) for a regression model with ARIMA time series errors and generates corresponding mean square errors (YMSE).
[Y,YMSE,U] = forecast(Mdl,numperiods) additionally forecasts unconditional disturbances for a regression model with ARIMA errors.
[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value) forecasts with additional options specified by one or more Name,Value pair arguments.
numperiods — Forecast horizon
Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.
Data Types: double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'E0' — Presample innovations
Presample innovations that initialize the moving average (MA) component of the ARIMA error model, specified as the comma-separated pair consisting of 'E0' and a numeric column vector or numeric matrix. forecast assumes that the presample innovations have a mean of 0.
If E0 is a column vector, then forecast applies it to each forecasted path.
If E0, Y0, and U0 are matrices with multiple paths, then they must have the same number of columns.
E0 requires at least Mdl.Q rows. If E0 contains extra rows, then forecast uses the latest presample innovations. The last row contains the latest presample innovation.
By default, if U0 contains at least Mdl.P + Mdl.Q rows, then forecast infers E0 from U0. If U0 has an insufficient number of rows, and forecast cannot infer sufficient observations of U0 from the presample data (Y0 and X0), then E0 is 0.
Data Types: double
'U0' — Presample unconditional disturbances
Presample unconditional disturbances that initialize the autoregressive (AR) component of the ARIMA error model, specified as the comma-separated pair consisting of 'U0' and a numeric column vector or numeric matrix. If you do not specify presample innovations E0, forecast uses U0 to infer them.
If U0 is a column vector, then forecast applies it to each forecasted path.
If U0, Y0, and E0 are matrices with multiple paths, then they must have the same number of columns.
U0 requires at least Mdl.P rows. If U0 contains extra rows, then forecast uses the latest presample unconditional disturbances. The last row contains the latest presample unconditional disturbance.
By default, if the presample data (Y0 and X0) contains at least Mdl.P rows, then forecast infers U0 from the presample data. If you do not specify presample data, then all required presample unconditional disturbances are 0.
Data Types: double
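As an illustration of the E0 and U0 requirements, the following minimal sketch passes column-vector presample disturbances and innovations to forecast. The coefficient values, the presample values u0 and e0, and the future predictor matrix XF are all hypothetical choices made for this example; only the row counts (Mdl.P rows for U0, Mdl.Q rows for E0) come from the descriptions above.

```matlab
% Fully specified regression model with ARMA(2,1) errors (illustrative values).
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5, ...
    'Beta',[0.1 -0.2],'Variance',0.1);

numperiods = 5;
u0 = [0.3; -0.1];   % hypothetical: Mdl.P = 2 presample unconditional disturbances
e0 = 0.05;          % hypothetical: Mdl.Q = 1 presample innovation

% Hypothetical future predictor data: one column per element of Mdl.Beta.
XF = [ones(numperiods,1), (1:numperiods)'];

% Name-value pairs can appear in any order.
[Y,YMSE,U] = forecast(Mdl,numperiods,'XF',XF,'U0',u0,'E0',e0);
```

Because u0 and e0 are column vectors, the call produces a single forecast path, so Y, YMSE, and U should each have numperiods rows and one column.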
'X0' — Presample predictor data
Presample predictor data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'X0' and a numeric matrix. The columns of X0 are separate time series variables. forecast uses X0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores X0.
If you do not specify U0, then X0 requires at least Mdl.P rows to infer U0. If X0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation of each series.
X0 requires the same number of columns as the length of Mdl.Beta.
If you specify X0, then you must also specify XF.
forecast treats X0 as a fixed (nonstochastic) matrix.
Data Types: double
'XF' — Forecasted or future predictor data
Forecasted or future predictor data, specified as the comma-separated pair consisting of 'XF' and a numeric matrix.
The columns of XF are separate time series, each corresponding to forecasts of the series in X0. Row t of XF contains the t-period-ahead forecasts of X0.
If you specify X0, then you must also specify XF. XF and X0 require the same number of columns. XF must have at least numperiods rows. If XF exceeds numperiods rows, then forecast uses the first numperiods forecasts.
forecast treats XF as a fixed (nonstochastic) matrix.
By default, forecast does not include a regression component in the model, regardless of the presence of regression coefficients in Mdl.
Data Types: double
'Y0' — Presample response data
Presample response data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'Y0' and a numeric column vector or numeric matrix. forecast uses Y0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores Y0.
If Y0 is a column vector, forecast applies it to each forecasted path.
If Y0, E0, and U0 are matrices with multiple paths, then they must have the same number of columns.
If you do not specify U0, then Y0 requires at least Mdl.P rows to infer U0. If Y0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation.
Data Types: double
Notes
NaNs in E0, U0, X0, XF, and Y0 indicate missing values and forecast removes them. The software merges the presample data sets (E0, U0, X0, and Y0), then uses list-wise deletion to remove any NaNs. forecast similarly removes NaNs from XF. Removing NaNs in the data reduces the sample size. Such removal can also create irregular time series.
forecast assumes that you synchronize presample data such that the latest observation of each presample series occurs simultaneously.
Set X0 to the same predictor matrix as X used in the estimation, simulation, or inference of Mdl. This assignment ensures correct inference of the unconditional disturbances, U0.
To include a regression component in the response forecast, you must specify the forecasted predictor data XF. That is, you can specify XF without also specifying X0, but forecast issues an error when you specify X0 without also specifying XF.
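To see what list-wise deletion does to synchronized presample data, consider the following base MATLAB sketch. It is an illustration of the concept under assumed, made-up data, not the internal code of forecast: the presample series are merged column-wise, and any row containing a NaN in any series is dropped from all of them.

```matlab
% Hypothetical presample data with missing values (rows are time points).
Y0 = [1; NaN; 3; 4];
X0 = [10 20; 11 21; NaN 22; 13 23];

% Merge the series so that each row is one time point, then drop every
% row that contains at least one NaN (list-wise deletion).
merged = [Y0 X0];
keep = ~any(isnan(merged),2);

Y0clean = Y0(keep);
X0clean = X0(keep,:);
```

Here only the first and fourth time points survive, so the cleaned series is shorter and no longer regularly spaced in time, which is the effect the note above warns about.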
Y — Minimum mean square error forecasts of response data
Minimum mean square error (MMSE) forecasts of the response data, returned as a numeric matrix. Y has numperiods rows and numPaths columns.
If you do not specify Y0, E0, and U0, then Y is a numperiods-by-1 column vector.
If you specify Y0, E0, and U0, all having numPaths columns, then Y is a numperiods-by-numPaths matrix.
Row i of Y contains the forecasts for the ith period.
Data Types: double
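The relationship between the number of presample paths and the size of the outputs can be checked with a short sketch. The model coefficients and the random presample matrix below are hypothetical; the model has no regression component so that Y0 alone is enough to initialize it.

```matlab
% Fully specified ARMA(2,1) error model without predictors (illustrative values).
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5,'Variance',0.1);

rng(1);                    % For reproducibility
numPaths = 3;
Y0 = randn(10,numPaths);   % hypothetical presample responses, one column per path

[Y,YMSE] = forecast(Mdl,5,'Y0',Y0);
sizeY = size(Y);           % expected to be numperiods-by-numPaths
```

Each column of Y is the forecast path initialized by the corresponding column of Y0.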
YMSE — Mean square errors of forecasted responses
Mean square errors (MSEs) of the forecasted responses, returned as a numeric matrix. YMSE has numperiods rows and numPaths columns.
If you do not specify Y0, E0, and U0, then YMSE is a numperiods-by-1 column vector.
If you specify Y0, E0, and U0, all having numPaths columns, then YMSE is a numperiods-by-numPaths matrix.
Row i of YMSE contains the forecast error variances for the ith period.
The predictor data does not contribute variability to YMSE because forecast treats XF as a nonstochastic matrix.
The square roots of YMSE are the standard errors of the forecasts of Y.
Data Types: double
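Because the square roots of YMSE are forecast standard errors, approximate forecast intervals follow directly from Y and YMSE. The sketch below uses hypothetical values for both outputs and assumes Gaussian forecast errors, which is why it uses the normal critical value 1.96 for an approximate 95% interval.

```matlab
% Hypothetical forecast outputs for a 3-period horizon.
yF = [1.0; 1.2; 1.1];        % stands in for Y
yMSE = [0.04; 0.09; 0.16];   % stands in for YMSE

se = sqrt(yMSE);             % forecast standard errors
lower = yF - 1.96*se;        % approximate 95% lower bounds
upper = yF + 1.96*se;        % approximate 95% upper bounds
```

The intervals widen as yMSE grows with the horizon: the first period spans 0.608 to 1.392, while the third spans 0.316 to 1.884.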
U — Minimum mean square error forecasts of future ARIMA error model unconditional disturbances
Minimum mean square error (MMSE) forecasts of future ARIMA error model unconditional disturbances, returned as a numeric matrix. U has numperiods rows and numPaths columns.
If you do not specify Y0, E0, and U0, then U is a numperiods-by-1 column vector.
If you specify Y0, E0, and U0, all having numPaths columns, then U is a numperiods-by-numPaths matrix.
Row i of U contains the forecasted unconditional disturbances for the ith period.
Data Types: double
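Forecast responses from the following regression model with ARMA(2,1) errors over a 30-period horizon:
$$y_t = X_t \begin{bmatrix} 0.1 \\ -0.2 \end{bmatrix} + u_t$$
$$u_t = 0.5u_{t-1} - 0.8u_{t-2} + \varepsilon_t - 0.5\varepsilon_{t-1},$$
where $\varepsilon_t$ is Gaussian with variance 0.1.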
Specify the model. Simulate responses from the model and two predictor series.
Mdl0 = regARIMA('Intercept',0,'AR',{0.5 -0.8},...
    'MA',-0.5,'Beta',[0.1 -0.2],'Variance',0.1);
rng(1); % For reproducibility
X = randn(130,2);
y = simulate(Mdl0,130,'X',X);
Fit the model to the first 100 observations, and reserve the remaining 30 observations to evaluate forecast performance.
Mdl = regARIMA('ARLags',1:2);
EstMdl = estimate(Mdl,y(1:100),'X',X(1:100,:));
Regression with ARMA(2,0) Error Model (Gaussian Distribution):

                  Value      StandardError    TStatistic      PValue
                 ________    _____________    __________    __________
    Intercept    0.004358      0.021314         0.20446        0.83799
    AR{1}         0.36833      0.067103          5.4891     4.0408e-08
    AR{2}        -0.75063      0.090865         -8.2609     1.4453e-16
    Beta(1)      0.076398      0.023008          3.3205     0.00089863
    Beta(2)       -0.1396      0.023298         -5.9919     2.0741e-09
    Variance     0.079876       0.01342          5.9522     2.6453e-09
EstMdl is a new regARIMA model containing the estimates. The estimates are close to their true values.
Use EstMdl to forecast a 30-period horizon. Visually compare the forecasts to the holdout data using a plot.
[yF,yMSE] = forecast(EstMdl,30,'Y0',y(1:100),...
    'X0',X(1:100,:),'XF',X(101:end,:));

figure
plot(y,'Color',[.7,.7,.7]);
hold on
plot(101:130,yF,'b','LineWidth',2);
plot(101:130,yF+1.96*sqrt(yMSE),'r:',...
    'LineWidth',2);
plot(101:130,yF-1.96*sqrt(yMSE),'r:','LineWidth',2);
h = gca;
ph = patch([repmat(101,1,2) repmat(130,1,2)],...
    [h.YLim fliplr(h.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend('Observed','Forecast',...
    '95% Forecast Interval','Location','Best');
title(['30-Period Forecasts and Approximate 95% '...
    'Forecast Intervals'])
axis tight
hold off
Many observations in the holdout sample fall beyond the 95% forecast intervals. Two reasons for this are:
The predictors are randomly generated in this example. estimate treats the predictors as fixed. The 95% forecast intervals based on the estimates from estimate do not account for the variability in the predictors.
By sheer chance, the estimation period seems less volatile than the forecast period. estimate uses the less volatile estimation period data to estimate the parameters. Therefore, forecast intervals based on the estimates should not be expected to cover observations that have an underlying innovations process with larger variability.
Forecast stationary, log GDP using a regression model with ARMA(1,1) errors, including CPI as a predictor.
Load the U.S. macroeconomic data set and preprocess the data.
load Data_USEconModel;
logGDP = log(DataTable.GDP);
dlogGDP = diff(logGDP);          % For stationarity
dCPI = diff(DataTable.CPIAUCSL); % For stationarity
numObs = length(dlogGDP);
gdp = dlogGDP(1:end-15);   % Estimation sample
cpi = dCPI(1:end-15);
T = length(gdp);           % Effective sample size
frstHzn = T+1:numObs;      % Forecast horizon
hoCPI = dCPI(frstHzn);     % Holdout sample
dts = dates(2:end);        % Date numbers
Fit a regression model with ARMA(1,1) errors.
Mdl = regARIMA('ARLags',1,'MALags',1);
EstMdl = estimate(Mdl,gdp,'X',cpi);
Regression with ARMA(1,1) Error Model (Gaussian Distribution):

                   Value       StandardError    TStatistic      PValue
                 __________    _____________    __________    __________
    Intercept      0.014793      0.0016289        9.0818      1.0684e-19
    AR{1}           0.57601        0.10009        5.7548      8.6754e-09
    MA{1}          -0.15258        0.11978       -1.2738         0.20272
    Beta(1)       0.0028972      0.0013989         2.071        0.038355
    Variance     9.5734e-05     6.5562e-06        14.602       2.723e-48
Forecast the GDP rate over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.
[gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',gdp,...
    'X0',cpi,'XF',hoCPI);
Plot the forecasts and 95% forecast intervals.
figure
h1 = plot(dts(end-65:end),dlogGDP(end-65:end),...
    'Color',[.7,.7,.7]);
datetick
hold on
h2 = plot(dts(frstHzn),gdpF,'b','LineWidth',2);
h3 = plot(dts(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
plot(dts(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:','LineWidth',2);
ha = gca;
title(['{\bf Forecasts and Approximate 95% }'...
    '{\bf Forecast Intervals for GDP rate}']);
ph = patch([repmat(dts(frstHzn(1)),1,2) repmat(dts(frstHzn(end)),1,2)],...
    [ha.YLim fliplr(ha.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2 h3],{'Observed GDP rate','Forecasted GDP rate ',...
    '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
axis tight
hold off
Forecast unit root nonstationary, log GDP using a regression model with ARIMA(1,1,1) errors, including CPI as a predictor and a known intercept.
Load the U.S. macroeconomic data set and preprocess the data.
load Data_USEconModel;
numObs = length(DataTable.GDP);
logGDP = log(DataTable.GDP(1:end-15));
cpi = DataTable.CPIAUCSL(1:end-15);
T = length(logGDP);                    % Effective sample size
frstHzn = T+1:numObs;                  % Forecast horizon
hoCPI = DataTable.CPIAUCSL(frstHzn);   % Holdout sample
Specify the model for the estimation period.
Mdl = regARIMA('ARLags',1,'MALags',1,'D',1);
The intercept is not identifiable in a model with integrated errors, so fix its value before estimation. One way to do this is to estimate the intercept using simple linear regression.
Reg4Int = [ones(T,1), cpi]\logGDP;
intercept = Reg4Int(1);
Consider performing a sensitivity analysis by using a grid of intercepts.
Set the intercept and fit the regression model with ARIMA(1,1,1) errors.
Mdl.Intercept = intercept;
EstMdl = estimate(Mdl,logGDP,'X',cpi,...
    'Display','off')
EstMdl =
  regARIMA with properties:

     Description: "ARIMA(1,1,1) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 5.80142
            Beta: [0.00396704]
               P: 2
               D: 1
               Q: 1
              AR: {0.922717} at lag [1]
             SAR: {}
              MA: {-0.387864} at lag [1]
             SMA: {}
        Variance: 0.000108944

   Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution)
Forecast GDP over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.
[gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',logGDP,...
    'X0',cpi,'XF',hoCPI);
Plot the forecasts and 95% forecast intervals.
figure
h1 = plot(dates(end-65:end),log(DataTable.GDP(end-65:end)),...
    'Color',[.7,.7,.7]);
datetick
hold on
h2 = plot(dates(frstHzn),gdpF,'b','LineWidth',2);
h3 = plot(dates(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
plot(dates(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
ha = gca;
title(['{\bf Forecasts and Approximate 95% }'...
    '{\bf Forecast Intervals for log GDP}']);
ph = patch([repmat(dates(frstHzn(1)),1,2) repmat(dates(frstHzn(end)),1,2)],...
    [ha.YLim fliplr(ha.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2 h3],{'Observed GDP','Forecasted GDP',...
    '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
axis tight
hold off
The unconditional disturbances, $u_t$, are nonstationary; therefore, the widths of the forecast intervals grow with time.
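The growth in interval width can be illustrated with the standard textbook expression for the h-step-ahead forecast error variance, σ² times the sum of the first h squared ψ weights. The sketch below is an illustration under stated assumptions rather than a description of how forecast computes YMSE: it plugs in the rounded estimates from the EstMdl display and uses the base MATLAB functions conv and filter to obtain the ψ weights of the ARIMA(1,1,1) error model.

```matlab
% Rounded estimates copied from the EstMdl display; treat them as illustrative.
phi = 0.922717;        % AR{1}
theta = -0.387864;     % MA{1}
sigma2 = 0.000108944;  % innovation variance
h = 15;                % forecast horizon

% (1 - phi*L)(1 - L)u_t = (1 + theta*L)e_t, so the psi weights are the
% impulse response of a filter with numerator [1 theta] and denominator
% conv([1 -phi],[1 -1]), which includes the unit root.
arPoly = conv([1 -phi],[1 -1]);
psi = filter([1 theta],arPoly,[1 zeros(1,h-1)]);

mse = sigma2*cumsum(psi.^2);   % h-step-ahead forecast error variances
halfWidth = 1.96*sqrt(mse);    % half-widths of approximate 95% intervals
```

Because of the unit root, the ψ weights do not decay to zero, so the cumulative sum, and with it the interval half-width, keeps increasing with the horizon instead of leveling off as it would for a stationary error model.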
Time base partitions for forecasting are two disjoint, contiguous intervals of the time base; each interval contains time series data for forecasting a dynamic model. The forecast period (forecast horizon) is a numperiods length partition at the end of the time base during which forecast generates forecasts Y from the dynamic model Mdl. The presample period is the entire partition occurring before the forecast period. forecast can require observed responses Y0, regression data X0, unconditional disturbances U0, or innovations E0 in the presample period to initialize the dynamic model for forecasting. The model structure determines the types and amounts of required presample observations.
A common practice is to fit a dynamic model to a portion of the data set, then
validate the predictability of the model by comparing its forecasts to observed
responses. During forecasting, the presample period contains the data to which the
model is fit, and the forecast period contains the holdout sample for validation.
Suppose that $y_t$ is an observed response series; $x_{1,t}$, $x_{2,t}$, and $x_{3,t}$ are observed exogenous series; and time t = 1,…,T. Consider forecasting responses from a dynamic model of $y_t$ containing a regression component over a horizon of numperiods = K periods. Suppose that the dynamic model is fit to the data in the interval [1,T – K] (for more details, see estimate). This figure shows the time base partitions for forecasting.
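For example, to generate forecasts Y from a regression model with AR(2) errors, forecast requires presample unconditional disturbances U0 and future predictor data XF.
forecast infers unconditional disturbances given enough readily available presample responses and predictor data. To initialize an AR(2) error model, Y0 = $\begin{bmatrix} y_{T-K-1} \\ y_{T-K} \end{bmatrix}$ and X0 = $\begin{bmatrix} x_{1,T-K-1} & x_{2,T-K-1} & x_{3,T-K-1} \\ x_{1,T-K} & x_{2,T-K} & x_{3,T-K} \end{bmatrix}$.
To model the regression component, forecast requires future exogenous data XF = $\begin{bmatrix} x_{1,(T-K+1):T} & x_{2,(T-K+1):T} & x_{3,(T-K+1):T} \end{bmatrix}$.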
This figure shows the arrays of required observations for the general case, with corresponding input and output arguments.
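The partitioning described above can be sketched with plain index arithmetic in base MATLAB. The sample size T, horizon K, AR order p, and the random data below are all hypothetical stand-ins; the point is only how the presample and forecast-period arrays are sliced from a common time base.

```matlab
T = 130;   % hypothetical total number of time points
K = 30;    % hypothetical forecast horizon (numperiods)
p = 2;     % AR(2) errors need two presample observations

rng(1);             % For reproducibility
y = randn(T,1);     % stand-in response series
X = randn(T,3);     % stand-in exogenous series x1, x2, x3

presampleIdx = 1:(T-K);       % estimation and presample period
forecastIdx = (T-K+1):T;      % forecast period

% The latest p presample observations initialize the error model.
Y0 = y(presampleIdx(end-p+1:end));
X0 = X(presampleIdx(end-p+1:end),:);

% Future predictor data covers the whole forecast period.
XF = X(forecastIdx,:);
```

Passing the entire presample period as Y0 and X0 is also acceptable, because forecast uses only the latest required rows.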
forecast computes the forecasted response MSEs, YMSE, by treating the predictor data matrices (X0 and XF) as nonstochastic and statistically independent of the model innovations. Therefore, YMSE reflects the variance associated with the unconditional disturbances of the ARIMA error model alone.
forecast uses Y0 and X0 to infer U0. Therefore, if you specify U0, forecast ignores Y0 and X0.
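In a regression model with ARIMA errors, the unconditional disturbance is the response minus the intercept and the regression component, that is, u_t = y_t − c − x_t β. The following base MATLAB sketch shows that relationship with hypothetical values; it illustrates the definition only and is not the internal implementation of forecast.

```matlab
% Hypothetical model parameters.
c = 0.5;                 % intercept
beta = [0.1; -0.2];      % regression coefficients

% Hypothetical presample data; rows are synchronized time points.
Y0 = [1.0; 1.5; 0.8];
X0 = [1 2; 3 4; 5 6];

% Presample unconditional disturbances implied by the data.
U0 = Y0 - c - X0*beta;
```

With these values, U0 is approximately [0.8; 1.5; 1.0]. Supplying such a U0 directly makes Y0 and X0 redundant, which is why forecast ignores them when you specify U0.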