forecast

Forecast responses of regression model with ARIMA errors

Syntax

[Y,YMSE] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value)

Description

[Y,YMSE] = forecast(Mdl,numperiods) forecasts responses (Y) for a regression model with ARIMA time series errors and generates corresponding mean square errors (YMSE).

[Y,YMSE,U] = forecast(Mdl,numperiods) additionally forecasts unconditional disturbances for a regression model with ARIMA errors.

[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value) forecasts with additional options specified by one or more Name,Value pair arguments.

Input Arguments

`Mdl` — Regression model with ARIMA errors
`regARIMA` model

Regression model with ARIMA errors, specified as a regARIMA model returned by regARIMA or estimate.

The properties of Mdl cannot contain NaNs.

`numperiods` — Forecast horizon
positive integer

Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.

Data Types: double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

`'E0'` — Presample innovations
numeric column vector | numeric matrix

Presample innovations that initialize the moving average (MA) component of the ARIMA error model, specified as the comma-separated pair consisting of 'E0' and a numeric column vector or numeric matrix. forecast assumes that the presample innovations have a mean of 0.

If E0 is a column vector, then forecast applies it to each forecasted path.
If E0, Y0, and U0 are matrices with multiple paths, then they must have the same number of columns.
E0 requires at least Mdl.Q rows. If E0 contains extra rows, then forecast uses the latest presample innovations. The last row contains the latest presample innovation.

By default, if U0 contains at least Mdl.P + Mdl.Q rows, then forecast infers E0 from U0. If U0 has an insufficient number of rows, and forecast cannot infer sufficient observations of U0 from the presample data (Y0 and X0), then E0 is 0.

Data Types: double

`'U0'` — Presample unconditional disturbances
numeric column vector | numeric matrix

Presample unconditional disturbances that initialize the autoregressive (AR) component of the ARIMA error model, specified as the comma-separated pair consisting of 'U0' and a numeric column vector or numeric matrix. If you do not specify presample innovations E0, forecast uses U0 to infer them.

If U0 is a column vector, then forecast applies it to each forecasted path.
If U0, Y0, and E0 are matrices with multiple paths, then they must have the same number of columns.
U0 requires at least Mdl.P rows. If U0 contains extra rows, then forecast uses the latest presample unconditional disturbances. The last row contains the latest presample unconditional disturbance.

By default, if the presample data (Y0 and X0) contains at least Mdl.P rows, then forecast infers U0 from the presample data. If you do not specify presample data, then all required presample unconditional disturbances are 0.

Data Types: double
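As a rough illustration of supplying E0 and U0 directly instead of letting forecast infer them, the following sketch obtains both series from infer and passes them as presample inputs. The model coefficients and the simulated data are assumptions made only for this illustration.

```matlab
% Sketch only: the model and data are simulated for illustration.
rng(1);                                   % For reproducibility
Mdl = regARIMA('Intercept',0,'AR',{0.5 -0.8},'MA',-0.5,...
    'Beta',[0.1 -0.2],'Variance',0.1);    % Fully specified model
X = randn(130,2);                         % Hypothetical predictor data
y = simulate(Mdl,130,'X',X);

% Infer innovations (E) and unconditional disturbances (U) over the
% presample period (observations 1 through 100).
[E,U] = infer(Mdl,y(1:100),'X',X(1:100,:));

% Pass the inferred series as presample inputs. Because U0 is specified,
% Y0 and X0 are not needed. XF supplies the regression component.
[yF,yMSE] = forecast(Mdl,30,'U0',U,'E0',E,'XF',X(101:end,:));
```

Both E and U contain more rows than the model requires, so forecast uses only the latest rows of each.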

`'X0'` — Presample predictor data
numeric matrix

Presample predictor data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'X0' and a numeric matrix. The columns of X0 are separate time series variables. forecast uses X0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores X0.

If you do not specify U0, then X0 requires at least Mdl.P rows to infer U0. If X0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation of each series.
X0 requires the same number of columns as the length of Mdl.Beta.
If you specify X0, then you must also specify XF.
forecast treats X0 as a fixed (nonstochastic) matrix.

Data Types: double

`'XF'` — Forecasted or future predictor data
numeric matrix

Forecasted or future predictor data, specified as the comma-separated pair consisting of 'XF' and a numeric matrix.

The columns of XF are separate time series, each corresponding to forecasts of the series in X0. Row t of XF contains the t-period-ahead forecasts of X0.

If you specify X0, then you must also specify XF. XF and X0 require the same number of columns. XF must have at least numperiods rows. If XF exceeds numperiods rows, then forecast uses the first numperiods forecasts.

forecast treats XF as a fixed (nonstochastic) matrix.

By default, forecast does not include a regression component in the model, regardless of the presence of regression coefficients in Mdl.

Data Types: double

`'Y0'` — Presample response data
numeric column vector | numeric matrix

Presample response data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'Y0' and a numeric column vector or numeric matrix. forecast uses Y0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores Y0.

If Y0 is a column vector, forecast applies it to each forecasted path.
If Y0, E0, and U0 are matrices with multiple paths, then they must have the same number of columns.
If you do not specify U0, then Y0 requires at least Mdl.P rows to infer U0. If Y0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation.

Data Types: double

Notes

NaNs in E0, U0, X0, XF, and Y0 indicate missing values and forecast removes them. The software merges the presample data sets (E0, U0, X0, and Y0), then uses list-wise deletion to remove any NaNs. forecast similarly removes NaNs from XF. Removing NaNs in the data reduces the sample size. Such removal can also create irregular time series.
forecast assumes that you synchronize presample data such that the latest observation of each presample series occurs simultaneously.
Set X0 to the same predictor matrix as X used in the estimation, simulation, or inference of Mdl. This assignment ensures correct inference of the unconditional disturbances, U0.
To include a regression component in the response forecast, you must specify the forecasted predictor data XF. That is, you can specify XF without also specifying X0, but forecast issues an error when you specify X0 without also specifying XF.

Output Arguments

`Y` — Minimum mean square error forecasts of response data
numeric matrix

Minimum mean square error (MMSE) forecasts of the response data, returned as a numeric matrix. Y has numperiods rows and numPaths columns.

If you do not specify Y0, E0, and U0, then Y is a numperiods-by-1 column vector.
If you specify Y0, E0, and U0, all having numPaths columns, then Y is a numperiods-by-numPaths matrix.
Row i of Y contains the forecasts for the ith period.

Data Types: double

`YMSE` — Mean square errors of forecasted responses
numeric matrix

Mean square errors (MSEs) of the forecasted responses, returned as a numeric matrix. YMSE has numperiods rows and numPaths columns.

If you do not specify Y0, E0, and U0, then YMSE is a numperiods-by-1 column vector.
If you specify Y0, E0, and U0, all having numPaths columns, then YMSE is a numperiods-by-numPaths matrix.
Row i of YMSE contains the forecast error variances for the ith period.
The predictor data does not contribute variability to YMSE because forecast treats XF as a nonstochastic matrix.
The square roots of YMSE are the standard errors of the forecasts of Y.

Data Types: double

`U` — Minimum mean square error forecasts of future ARIMA error model unconditional disturbances
numeric matrix

Minimum mean square error (MMSE) forecasts of future ARIMA error model unconditional disturbances, returned as a numeric matrix. U has numperiods rows and numPaths columns.

If you do not specify Y0, E0, and U0, then U is a numperiods-by-1 column vector.
If you specify Y0, E0, and U0, all having numPaths columns, then U is a numperiods-by-numPaths matrix.
Row i of U contains the forecasted unconditional disturbances for the ith period.

Data Types: double
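The following sketch requests the third output and checks how the forecasted disturbances relate to the forecasted responses. It assumes that the model equation (response = intercept + predictors times Beta + unconditional disturbance) carries over to the forecasts. The coefficients, the presample disturbances, and the predictor data are hypothetical.

```matlab
% Sketch only: a fully specified model with AR(1) errors.
Mdl = regARIMA('Intercept',2,'AR',0.5,'Beta',[0.1 -0.2],'Variance',0.1);

rng(1);                 % For reproducibility
XF = randn(10,2);       % Hypothetical future predictor data
U0 = [0.3; -0.1];       % Hypothetical presample disturbances (last row is latest)

[Y,YMSE,U] = forecast(Mdl,10,'U0',U0,'XF',XF);

% Rebuild the response forecasts from the disturbance forecasts.
Yrebuilt = Mdl.Intercept + XF*Mdl.Beta(:) + U;
maxGap = max(abs(Y - Yrebuilt));   % Expected to be near zero under the assumption
```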

Examples

Forecast Responses of a Regression Model with ARIMA Errors

Forecast responses from the following regression model with ARMA(2,1) errors over a 30-period horizon:

$\begin{array}{l} y_{t} = X_{t} \begin{bmatrix} 0.1 \\ -0.2 \end{bmatrix} + u_{t} \\ u_{t} = 0.5 u_{t-1} - 0.8 u_{t-2} + ε_{t} - 0.5 ε_{t-1}, \end{array}$

where $ε_{t}$ is Gaussian with variance 0.1.

Specify the model. Simulate responses from the model and two predictor series.

Mdl0 = regARIMA('Intercept',0,'AR',{0.5 -0.8},...
    'MA',-0.5,'Beta',[0.1 -0.2],'Variance',0.1);
rng(1); % For reproducibility
X =  randn(130,2);
y = simulate(Mdl0,130,'X',X);

Fit the model to the first 100 observations, and reserve the remaining 30 observations to evaluate forecast performance.

Mdl = regARIMA('ARLags',1:2);
EstMdl = estimate(Mdl,y(1:100),'X',X(1:100,:));

 
    Regression with ARMA(2,0) Error Model (Gaussian Distribution):
 
                  Value      StandardError    TStatistic      PValue  
                 ________    _____________    __________    __________

    Intercept    0.004358      0.021314        0.20446         0.83799
    AR{1}         0.36833      0.067103         5.4891      4.0408e-08
    AR{2}        -0.75063      0.090865        -8.2609      1.4453e-16
    Beta(1)      0.076398      0.023008         3.3205      0.00089863
    Beta(2)       -0.1396      0.023298        -5.9919      2.0741e-09
    Variance     0.079876       0.01342         5.9522      2.6453e-09

EstMdl is a new regARIMA model containing the estimates. The estimates are close to their true values.

Use EstMdl to forecast a 30-period horizon. Visually compare the forecasts to the holdout data using a plot.

[yF,yMSE] = forecast(EstMdl,30,'Y0',y(1:100),...
    'X0',X(1:100,:),'XF',X(101:end,:));

figure
plot(y,'Color',[.7,.7,.7]);
hold on
plot(101:130,yF,'b','LineWidth',2);
plot(101:130,yF+1.96*sqrt(yMSE),'r:',...
    'LineWidth',2);
plot(101:130,yF-1.96*sqrt(yMSE),'r:','LineWidth',2);
h = gca;
ph = patch([repmat(101,1,2) repmat(130,1,2)],...
        [h.YLim fliplr(h.YLim)],...
        [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend('Observed','Forecast',...
    '95% Forecast Interval','Location','Best');
title(['30-Period Forecasts and Approximate 95% '...
    'Forecast Intervals'])
axis tight
hold off

Many observations in the holdout sample fall beyond the 95% forecast intervals. Two reasons for this are:

The predictors are randomly generated in this example. estimate treats the predictors as fixed. The 95% forecast intervals based on the estimates from estimate do not account for the variability in the predictors.
By sheer chance, the estimation period seems less volatile than the forecast period. estimate uses the less volatile estimation period data to estimate the parameters. Therefore, forecast intervals based on those estimates cannot be expected to cover observations whose underlying innovations process has larger variability.
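To put a number on how many holdout observations fall outside the intervals, the empirical coverage can be computed directly. The following sketch repeats the simulation and fit from this example so that it runs on its own. The variable names lower, upper, inside, and coverage are introduced here for illustration.

```matlab
% Sketch only: repeat the simulation and fit from this example.
Mdl0 = regARIMA('Intercept',0,'AR',{0.5 -0.8},...
    'MA',-0.5,'Beta',[0.1 -0.2],'Variance',0.1);
rng(1); % For reproducibility
X = randn(130,2);
y = simulate(Mdl0,130,'X',X);
EstMdl = estimate(regARIMA('ARLags',1:2),y(1:100),...
    'X',X(1:100,:),'Display','off');
[yF,yMSE] = forecast(EstMdl,30,'Y0',y(1:100),...
    'X0',X(1:100,:),'XF',X(101:end,:));

% Approximate 95% forecast interval bounds.
lower = yF - 1.96*sqrt(yMSE);
upper = yF + 1.96*sqrt(yMSE);

% Fraction of holdout observations that fall inside the intervals.
inside = (y(101:end) >= lower) & (y(101:end) <= upper);
coverage = mean(inside)
```

A coverage value well below 0.95 is consistent with the two reasons listed above.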

Forecast the GDP Using Regression Model with ARMA Errors

Forecast stationary, log GDP using a regression model with ARMA(1,1) errors, including CPI as a predictor.

Load the U.S. macroeconomic data set and preprocess the data.

load Data_USEconModel;
logGDP = log(DataTable.GDP);
dlogGDP = diff(logGDP);          % For stationarity
dCPI = diff(DataTable.CPIAUCSL); % For stationarity
numObs = length(dlogGDP);
gdp = dlogGDP(1:end-15);   % Estimation sample
cpi = dCPI(1:end-15);
T = length(gdp);        % Effective sample size
frstHzn =  T+1:numObs;  % Forecast horizon
hoCPI = dCPI(frstHzn);  % Holdout sample
dts = dates(2:end);     % Date numbers

Fit a regression model with ARMA(1,1) errors.

Mdl = regARIMA('ARLags',1,'MALags',1);
EstMdl = estimate(Mdl,gdp,'X',cpi);

 
    Regression with ARMA(1,1) Error Model (Gaussian Distribution):
 
                   Value       StandardError    TStatistic      PValue  
                 __________    _____________    __________    __________

    Intercept      0.014793      0.0016289        9.0818      1.0684e-19
    AR{1}           0.57601        0.10009        5.7548      8.6754e-09
    MA{1}          -0.15258        0.11978       -1.2738         0.20272
    Beta(1)       0.0028972      0.0013989         2.071        0.038355
    Variance     9.5734e-05     6.5562e-06        14.602       2.723e-48

Forecast the GDP rate over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.

[gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',gdp,...
    'X0',cpi,'XF',hoCPI);

Plot the forecasts and 95% forecast intervals.

figure
h1 = plot(dts(end-65:end),dlogGDP(end-65:end),...
    'Color',[.7,.7,.7]);
datetick
hold on
h2 = plot(dts(frstHzn),gdpF,'b','LineWidth',2);
h3 = plot(dts(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
plot(dts(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:','LineWidth',2);
ha = gca;
title(['{\bf Forecasts and Approximate 95% }'...
    '{\bf Forecast Intervals for GDP rate}']);
ph = patch([repmat(dts(frstHzn(1)),1,2) repmat(dts(frstHzn(end)),1,2)],...
    [ha.YLim fliplr(ha.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2 h3],{'Observed GDP rate','Forecasted GDP rate ',...
    '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
axis tight
hold off

Forecast Using a Regression Model with ARIMA Errors and a Known Intercept

Forecast unit root nonstationary, log GDP using a regression model with ARIMA(1,1,1) errors, including CPI as a predictor and a known intercept.

Load the U.S. macroeconomic data set and preprocess the data.

load Data_USEconModel;
numObs = length(DataTable.GDP);
logGDP = log(DataTable.GDP(1:end-15));
cpi = DataTable.CPIAUCSL(1:end-15);
T = length(logGDP);                  % Effective sample size
frstHzn =  T+1:numObs;               % Forecast horizon
hoCPI = DataTable.CPIAUCSL(frstHzn); % Holdout sample

Specify the model for the estimation period.

Mdl = regARIMA('ARLags',1,'MALags',1,'D',1);

The intercept is not identifiable in a model with integrated errors, so fix its value before estimation. One way to do this is to estimate the intercept using simple linear regression.

Reg4Int = [ones(T,1), cpi]\logGDP;
intercept = Reg4Int(1);

Consider performing a sensitivity analysis by using a grid of intercepts.

Set the intercept and fit the regression model with ARIMA(1,1,1) errors.

Mdl.Intercept = intercept;
EstMdl = estimate(Mdl,logGDP,'X',cpi,...
    'Display','off')

EstMdl = 
  regARIMA with properties:

     Description: "ARIMA(1,1,1) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 5.80142
            Beta: [0.00396704]
               P: 2
               D: 1
               Q: 1
              AR: {0.922717} at lag [1]
             SAR: {}
              MA: {-0.387864} at lag [1]
             SMA: {}
        Variance: 0.000108944
 
   Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution)

Forecast GDP over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.

[gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',logGDP,...
    'X0',cpi,'XF',hoCPI);

Plot the forecasts and 95% forecast intervals.

figure
h1 = plot(dates(end-65:end),log(DataTable.GDP(end-65:end)),...
    'Color',[.7,.7,.7]);
datetick
hold on
h2 = plot(dates(frstHzn),gdpF,'b','LineWidth',2);
h3 = plot(dates(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
plot(dates(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
ha = gca;

title(['{\bf Forecasts and Approximate 95% }'...
    '{\bf Forecast Intervals for log GDP}']);
ph = patch([repmat(dates(frstHzn(1)),1,2) repmat(dates(frstHzn(end)),1,2)],...
        [ha.YLim fliplr(ha.YLim)],...
        [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2 h3],{'Observed GDP','Forecasted GDP',...
    '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
axis tight
hold off

The unconditional disturbances, $u_{t}$, are nonstationary. Therefore, the widths of the forecast intervals grow with time.
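The sensitivity analysis suggested earlier in this example might be sketched as follows: refit the model over a small grid of fixed intercepts and compare the final-period forecasts. The grid spacing and the names cGrid and lastF are assumptions made for illustration. Some grid values could produce estimation warnings.

```matlab
% Sketch only: sensitivity of the 15-quarter-ahead forecast to the
% fixed intercept.
load Data_USEconModel
logGDP = log(DataTable.GDP(1:end-15));
cpi = DataTable.CPIAUCSL(1:end-15);
hoCPI = DataTable.CPIAUCSL(end-14:end);   % Holdout predictor data
T = length(logGDP);

% Center the grid on the simple linear regression intercept.
Reg4Int = [ones(T,1), cpi]\logGDP;
cGrid = Reg4Int(1) + (-0.5:0.25:0.5);     % Hypothetical grid of intercepts

lastF = zeros(size(cGrid));
for k = 1:numel(cGrid)
    Mdl = regARIMA('ARLags',1,'MALags',1,'D',1);
    Mdl.Intercept = cGrid(k);             % Fix the intercept before estimation
    EstMdl = estimate(Mdl,logGDP,'X',cpi,'Display','off');
    gdpF = forecast(EstMdl,15,'Y0',logGDP,'X0',cpi,'XF',hoCPI);
    lastF(k) = gdpF(end);                 % Forecast for the final period
end

table(cGrid(:),lastF(:),'VariableNames',{'Intercept','Forecast15'})
```

If the final-period forecasts barely change across the grid, the choice of intercept has little practical effect on the forecasts.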

More About

Time Base Partitions for Forecasting

Time base partitions for forecasting are two disjoint, contiguous intervals of the time base; each interval contains time series data for forecasting a dynamic model. The forecast period (forecast horizon) is a numperiods length partition at the end of the time base during which forecast generates forecasts Y from the dynamic model Mdl. The presample period is the entire partition occurring before the forecast period. forecast can require observed responses Y0, regression data X0, unconditional disturbances U0, or innovations E0 in the presample period to initialize the dynamic model for forecasting. The model structure determines the types and amounts of required presample observations.

A common practice is to fit a dynamic model to a portion of the data set, then validate the predictability of the model by comparing its forecasts to observed responses. During forecasting, the presample period contains the data to which the model is fit, and the forecast period contains the holdout sample for validation. Suppose that y_t is an observed response series; x_1,t, x_2,t, and x_3,t are observed exogenous series; and time t = 1,…,T. Consider forecasting responses from a dynamic model of y_t containing a regression component over a horizon of numperiods = K periods. Suppose that the dynamic model is fit to the data in the interval [1,T – K] (for more details, see estimate). This figure shows the time base partitions for forecasting.

For example, to generate forecasts Y from a regression model with AR(2) errors, forecast requires presample unconditional disturbances U0 and future predictor data XF.

forecast infers unconditional disturbances given enough readily available presample responses and predictor data. To initialize an AR(2) error model, Y0 = $\begin{bmatrix} y_{T-K-1} & y_{T-K} \end{bmatrix}'$ and X0 = $\begin{bmatrix} x_{1,T-K-1} & x_{2,T-K-1} & x_{3,T-K-1} \\ x_{1,T-K} & x_{2,T-K} & x_{3,T-K} \end{bmatrix}$.
To model the regression component, forecast requires future exogenous data XF = $\begin{bmatrix} x_{1,(T-K+1):T} & x_{2,(T-K+1):T} & x_{3,(T-K+1):T} \end{bmatrix}$.

This figure shows the arrays of required observations for the general case, with corresponding input and output arguments.
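The index arithmetic for the AR(2) case described above can be sketched with placeholder data. The values of T and K and the series contents are hypothetical, chosen so that each entry reveals its own time index.

```matlab
% Sketch only: partition a time base of length T into presample and
% forecast periods for a horizon of K periods.
T = 20;                                   % Hypothetical sample size
K = 5;                                    % Hypothetical forecast horizon
y = (1:T)';                               % Placeholder response series
X = [(1:T)', 100+(1:T)', 200+(1:T)'];     % Three placeholder exogenous series

fIdx = (T-K+1):T;                         % Forecast period

% An AR(2) error model needs the two latest presample observations.
Y0 = y(T-K-1:T-K);                        % Responses at times T-K-1 and T-K
X0 = X(T-K-1:T-K,:);                      % Matching rows of predictor data
XF = X(fIdx,:);                           % Future predictor data, K rows
```

With these values, Y0 holds the observations at times 14 and 15, and XF holds the predictor rows for times 16 through 20.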

Algorithms

forecast computes the forecasted response MSEs, YMSE, by treating the predictor data matrices (X0 and XF) as nonstochastic and statistically independent of the model innovations. Therefore, YMSE reflects the variance associated with the unconditional disturbances of the ARIMA error model alone.
forecast uses Y0 and X0 to infer U0. Therefore, if you specify U0, forecast ignores Y0 and X0.
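For a stationary ARMA error model, the textbook h-period-ahead MMSE forecast error variance is the innovation variance multiplied by the cumulative sum of the squared impulse response (psi) weights. The following sketch evaluates that formula for the ARMA(2,1) error model from the first example, using only the base MATLAB function filter. Whether forecast computes YMSE internally in exactly this way is an assumption, and the names psi and mse are introduced here for illustration.

```matlab
% Sketch only: textbook forecast error variances for the error model
% u(t) = 0.5u(t-1) - 0.8u(t-2) + e(t) - 0.5e(t-1), Var(e) = 0.1.
ar = [0.5 -0.8];        % AR coefficients
ma = -0.5;              % MA coefficient
sigma2 = 0.1;           % Innovation variance
h = 5;                  % Horizon

% Impulse response: filter a unit impulse through the ARMA recursion.
psi = filter([1 ma],[1 -ar],[1 zeros(1,h-1)]);

% MSE at horizon k is sigma2 times the sum of the first k squared weights.
mse = sigma2*cumsum(psi.^2)'
```

The first two weights are 1 and 0, so the one- and two-period-ahead variances both equal 0.1. The third weight is -0.8, which raises the three-period-ahead variance to 0.164.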
