Class: dssm
Forecast states and observations of diffuse state-space models
[Y,YMSE] = forecast(Mdl,numPeriods,Y0) returns forecasted observations (Y) and their corresponding variances (YMSE) from forecasting the diffuse state-space model Mdl using a numPeriods forecast horizon and in-sample observations Y0.
[Y,YMSE] = forecast(Mdl,numPeriods,Y0,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, for state-space models that include a linear regression component in the observation model, include in-sample predictor data, predictor data for the forecast horizon, and the regression coefficient.
Mdl — Diffuse state-space model
dssm model object
Diffuse state-space model, specified as a dssm model object returned by dssm or estimate.
If Mdl is not fully specified (that is, Mdl contains unknown parameters), then specify values for the unknown parameters using the 'Params' name-value pair argument. Otherwise, the software issues an error. estimate returns fully specified state-space models.
Mdl does not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input or name-value pair arguments.
numPeriods — Forecast horizon
positive integer
Forecast horizon, specified as a positive integer. That is, the software returns forecasts for periods 1,...,numPeriods of the forecast horizon.
Data Types: double
Y0 — In-sample, observed responses
cell vector of numeric vectors | numeric matrix
In-sample, observed responses, specified as a cell vector of numeric vectors or a matrix.
If Mdl is time invariant, then Y0 is a T-by-n numeric matrix, where each row corresponds to a period and each column corresponds to a particular observation in the model. Therefore, T is the sample size and n is the number of observations per period. The last row of Y0 contains the latest observations.
If Mdl is time varying with respect to the observation equation, then Y0 is a T-by-1 cell vector. Each element of the cell vector corresponds to a period and contains an $n_t$-dimensional vector of observations for that period. The corresponding dimensions of the coefficient matrices in Mdl.C{t} and Mdl.D{t} must be consistent with the vector in Y0{t} for all periods. The last cell of Y0 contains the latest observations.
If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Y0 to the same data set that you used to fit Mdl.
NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see Algorithms.
Data Types: double | cell
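The two layouts for the in-sample responses can be sketched as follows. This is a minimal illustration using only core MATLAB functions; the variable names (Y0mat, Y0cell) and all data values are made up for the example and do not come from any model on this page.

```matlab
% Time-invariant model: T-by-n matrix (here T = 4 periods, n = 2 series).
% All values are hypothetical.
Y0mat = [1.2 0.7;
         1.5 NaN;     % NaN marks a missing observation in series 2
         1.1 0.9;
         1.4 1.0];    % the last row holds the latest observations

% Time-varying observation equation: T-by-1 cell vector, where cell t
% holds the n_t-by-1 vector of observations for period t.
Y0cell = {[1.2; 0.7]; 1.5; [1.1; 0.9]; [1.4; 1.0]};

[T,n] = size(Y0mat);             % sample size and observations per period
nt = cellfun(@numel,Y0cell);     % number of observations in each period
```

In the cell layout, period 2 simply carries one observation instead of a NaN placeholder, which is why the dimensions of Mdl.C{t} and Mdl.D{t} must match the length of each cell.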
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Beta',beta,'Predictors0',Z0,'PredictorsF',ZF specifies to deflate the observations by the regression component composed of the in-sample predictor data Z0, the forecast-horizon predictor data ZF, and the coefficient matrix beta.
'A' — Forecast-horizon, state-transition, coefficient matrices
cell vector of numeric matrices
Forecast-horizon, state-transition, coefficient matrices, specified as the comma-separated pair consisting of 'A' and a cell vector of numeric matrices.
A must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states transition in the forecast horizon. If the length of A is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
If Mdl is time invariant with respect to the states, then each cell of A must contain an m-by-m matrix, where m is the number of the in-sample states per period. By default, the software uses Mdl.A throughout the forecast horizon.
If Mdl is time varying with respect to the states, then the dimensions of the matrices in the cells of A can vary, but the dimensions of each matrix must be consistent with the matrices in B and C in the corresponding periods. By default, the software uses Mdl.A{end} throughout the forecast horizon.
Note
The matrices in A cannot contain NaN values.
Data Types: cell
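As a minimal sketch of how a forecast-horizon coefficient argument such as 'A' might be assembled, the following builds a numPeriods-by-1 cell vector for a one-state model. The names AF and Ahat and the coefficient values are hypothetical, chosen only for illustration; the same pattern would apply to 'B', 'C', and 'D' with matrices of the appropriate sizes.

```matlab
numPeriods = 10;                     % forecast horizon (illustrative)
Ahat = 0.6;                          % hypothetical 1-by-1 state-transition matrix
AF = repmat({Ahat},numPeriods,1);    % one matrix per forecast period
AF{end} = 0.5;                       % last cell applies to the latest forecast period

% Check the documented restriction: no NaN values in any cell.
hasNaN = any(cellfun(@(a) any(isnan(a(:))),AF));

% The cell vector would then be passed as a name-value pair, for example:
% [Y,YMSE] = forecast(Mdl,numPeriods,Y0,'A',AF);
```

Supplying more than numPeriods cells is allowed; per the description above, only the first numPeriods cells are used.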
'B' — Forecast-horizon, state-disturbance-loading, coefficient matrices
cell vector of matrices
Forecast-horizon, state-disturbance-loading, coefficient matrices, specified as the comma-separated pair consisting of 'B' and a cell vector of matrices.
B must contain at least numPeriods cells. Each cell must contain a matrix specifying how the state disturbances load onto the states in the forecast horizon. If the length of B is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
If Mdl is time invariant with respect to the states and state disturbances, then each cell of B must contain an m-by-k matrix, where m is the number of the in-sample states per period, and k is the number of in-sample, state disturbances per period. By default, the software uses Mdl.B throughout the forecast horizon.
If Mdl is time varying, then the dimensions of the matrices in the cells of B can vary, but the dimensions of each matrix must be consistent with the matrices in A in the corresponding periods. By default, the software uses Mdl.B{end} throughout the forecast horizon.
Note
The matrices in B cannot contain NaN values.
Data Types: cell
'C' — Forecast-horizon, measurement-sensitivity, coefficient matrices
cell vector of matrices
Forecast-horizon, measurement-sensitivity, coefficient matrices, specified as the comma-separated pair consisting of 'C' and a cell vector of matrices.
C must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states map to the observations in the forecast horizon. If the length of C is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
If Mdl is time invariant with respect to the states and the observations, then each cell of C must contain an n-by-m matrix, where n is the number of the in-sample observations per period, and m is the number of in-sample states per period. By default, the software uses Mdl.C throughout the forecast horizon.
If Mdl is time varying with respect to the states or the observations, then the dimensions of the matrices in the cells of C can vary, but the dimensions of each matrix must be consistent with the matrices in A and D in the corresponding periods. By default, the software uses Mdl.C{end} throughout the forecast horizon.
Note
The matrices in C cannot contain NaN values.
Data Types: cell
'D' — Forecast-horizon, observation-innovation, coefficient matrices
cell vector of matrices
Forecast-horizon, observation-innovation, coefficient matrices, specified as the comma-separated pair consisting of 'D' and a cell vector of matrices.
D must contain at least numPeriods cells. Each cell must contain a matrix specifying how the observation innovations load onto the observations in the forecast horizon. If the length of D is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
If Mdl is time invariant with respect to the observations and the observation innovations, then each cell of D must contain an n-by-h matrix, where n is the number of the in-sample observations per period, and h is the number of in-sample, observation innovations per period. By default, the software uses Mdl.D throughout the forecast horizon.
If Mdl is time varying with respect to the observations or the observation innovations, then the dimensions of the matrices in the cells of D can vary, but the dimensions of each matrix must be consistent with the matrices in C in the corresponding periods. By default, the software uses Mdl.D{end} throughout the forecast horizon.
Note
The matrices in D cannot contain NaN values.
Data Types: cell
'Beta' — Regression coefficients
[] (default) | numeric matrix
Regression coefficients corresponding to predictor variables, specified as the comma-separated pair consisting of 'Beta' and a d-by-n numeric matrix. d is the number of predictor variables (see Predictors0 and PredictorsF) and n is the number of observed response series (see Y0).
If you specify Beta, then you must also specify Predictors0 and PredictorsF.
If Mdl is an estimated state-space model, then specify the estimated regression coefficients stored in estParams, the second output of estimate.
By default, the software excludes a regression component from the state-space model.
'Predictors0' — In-sample, predictor variables in state-space model observation equation
[] (default) | matrix
In-sample, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'Predictors0' and a matrix. The columns of Predictors0 correspond to individual predictor variables. Predictors0 must have T rows, where row t corresponds to the observed predictors at period t ($Z_t$).
The expanded observation equation is
$$y_t - Z_t\beta = Cx_t + D\varepsilon_t.$$
In other words, the software deflates the observations using the regression component. $\beta$ is the time-invariant vector of regression coefficients (see Beta).
If there are n observations per period, then the software regresses all predictor series onto each observation.
If you specify Predictors0, then Mdl must be time invariant. Otherwise, the software returns an error.
If you specify Predictors0, then you must also specify Beta and PredictorsF.
If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Predictors0 to the same predictor data set that you used to fit Mdl.
By default, the software excludes a regression component from the state-space model.
Data Types: double
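The deflation step can be illustrated with a small numeric sketch, reading the expanded observation equation as "responses minus the regression component". The sizes, the predictor matrix Z0, the coefficients beta, and the responses Y0 below are all hypothetical values made up for this example; the code only shows the arithmetic and the required dimensions, not how forecast applies it internally.

```matlab
T = 5; d = 2; n = 1;                 % illustrative sizes
Z0 = [ones(T,1) (1:T)'];             % hypothetical T-by-d in-sample predictors
beta = [0.5; -0.1];                  % hypothetical d-by-n coefficient matrix
Y0 = [0.9; 0.2; 0.4; 0.3; 0.1];      % hypothetical T-by-n responses

% Deflated responses: the left side of the expanded observation equation,
% y_t - Z_t*beta, stacked over the T in-sample periods.
deflatedY0 = Y0 - Z0*beta;
```

The product Z0*beta is T-by-n, so it conforms with Y0 only when Predictors0 has T rows and Beta has d rows and n columns.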
'PredictorsF' — Forecast-horizon, predictor variables in state-space model observation equation
[] (default) | numeric matrix
Forecast-horizon, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'PredictorsF' and a numPeriods-by-d numeric matrix. numPeriods is the forecast horizon and d is the number of predictor variables. Row t corresponds to the observed predictors at period t of the forecast horizon ($Z_t$).
The expanded observation equation is
$$y_t - Z_t\beta = Cx_t + D\varepsilon_t.$$
In other words, the software deflates the observations using the regression component. $\beta$ is the time-invariant vector of regression coefficients (see Beta).
If there are n observations per period, then the software regresses all predictor series onto each observation.
If you specify PredictorsF, then Mdl must be time invariant. Otherwise, the software returns an error.
If you specify PredictorsF, then you must also specify Beta and Predictors0.
By default, the software excludes a regression component from the state-space model.
Data Types: double
Y — Forecasted observations
matrix | cell vector of numeric vectors
Forecasted observations, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the observations, then Y is a numPeriods-by-n matrix.
If Mdl is a time-varying, state-space model with respect to the observations, then Y is a numPeriods-by-1 cell vector of numeric vectors. Cell t of Y contains an $n_t$-by-1 numeric vector of forecasted observations for period t.
YMSE — Error variances of forecasted observations
matrix | cell vector of numeric vectors
Error variances of forecasted observations, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the observations, then YMSE is a numPeriods-by-n matrix.
If Mdl is a time-varying, state-space model with respect to the observations, then YMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of YMSE contains an $n_t$-by-1 numeric vector of error variances for the corresponding forecasted observations for period t.
X — State forecasts
matrix | cell vector of numeric vectors
State forecasts, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the states, then X is a numPeriods-by-m matrix.
If Mdl is a time-varying, state-space model with respect to the states, then X is a numPeriods-by-1 cell vector of numeric vectors. Cell t of X contains an $m_t$-by-1 numeric vector of forecasted states for period t.
XMSE — Error variances of state forecasts
matrix | cell vector of numeric vectors
Error variances of state forecasts, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the states, then XMSE is a numPeriods-by-m matrix.
If Mdl is a time-varying, state-space model with respect to the states, then XMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of XMSE contains an $m_t$-by-1 numeric vector of error variances for the corresponding forecasted states for period t.
Suppose that a latent process is a random walk. The state equation is
$$x_t = x_{t-1} + u_t,$$
where $u_t$ is Gaussian with mean 0 and standard deviation 1.
Generate a random series of 100 observations from $x_t$, assuming that the series starts at 1.5.
T = 100;
x0 = 1.5;
rng(1); % For reproducibility
u = randn(T,1);
x = cumsum([x0;u]);
x = x(2:end);
Suppose further that the latent process is subject to additive measurement error. The observation equation is
$$y_t = x_t + \varepsilon_t,$$
where $\varepsilon_t$ is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model.
Use the random latent state process (x) and the observation equation to generate observations.
y = x + 0.75*randn(T,1);
Specify the four coefficient matrices.
A = 1; B = 1; C = 1; D = 0.75;
Create the diffuse state-space model using the coefficient matrices. Specify that the initial state distribution is diffuse.
Mdl = dssm(A,B,C,D,'StateType',2)
Mdl = 
State-space model type: dssm

State vector length: 1
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equation:
x1(t) = x1(t-1) + u1(t)

Observation equation:
y1(t) = x1(t) + (0.75)e1(t)

Initial state distribution:

Initial state means
 x1 
  0 

Initial state covariance matrix
     x1  
 x1  Inf 

State types
    x1   
 Diffuse 

Mdl is a dssm model. Verify that the model is correctly specified using the display in the Command Window.
Forecast observations 10 periods into the future, and estimate the mean squared errors of the forecasts.
numPeriods = 10;
[ForecastedY,YMSE] = forecast(Mdl,numPeriods,y);
Plot the forecasts with the in-sample responses, and 95% Wald-type forecast intervals.
ForecastIntervals(:,1) = ForecastedY - 1.96*sqrt(YMSE);
ForecastIntervals(:,2) = ForecastedY + 1.96*sqrt(YMSE);

figure
plot(T-20:T,y(T-20:T),'-k',T+1:T+numPeriods,ForecastedY,'-.r',...
    T+1:T+numPeriods,ForecastIntervals,'-.b',...
    T:T+1,[y(end)*ones(3,1),[ForecastedY(1);ForecastIntervals(1,:)']],':k',...
    'LineWidth',2)
hold on
title({'Observed Responses and Their Forecasts'})
xlabel('Period')
ylabel('Responses')
legend({'Observations','Forecasted observations','95% forecast intervals'},...
    'Location','Best')
hold off
The forecast intervals flare out because the process is nonstationary.
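One way to see the widening, sketched for this specific random-walk model (A = B = C = 1, D = 0.75): write $P_{T|T}$ for the filtered state variance at the end of the sample, a symbol introduced here for illustration. Each additional forecast period adds one unit of state-disturbance variance, so under these assumptions the h-step-ahead forecast error variances grow linearly in h:

```latex
\begin{aligned}
\operatorname{Var}(x_{T+h} \mid y_1,\dots,y_T) &= P_{T|T} + h,\\
\operatorname{Var}(y_{T+h} \mid y_1,\dots,y_T) &= P_{T|T} + h + 0.75^2,
\qquad h = 1,\dots,\text{numPeriods}.
\end{aligned}
```

The interval half-width, 1.96 times the square root of the second expression, therefore increases without bound as the horizon grows, whereas for a stationary state process it would level off.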
Suppose that the linear relationship between unemployment rate and the nominal gross national product (nGNP) is of interest. Suppose further that unemployment rate is an AR(1) series. Symbolically, and in state-space form, the model is
$$x_t = \phi x_{t-1} + \sigma u_t$$
$$y_t - \beta Z_t = x_t,$$
where:
$x_t$ is the unemployment rate at time t.
$y_t$ is the observed change in the unemployment rate being deflated by the return of nGNP ($Z_t$).
$u_t$ is the Gaussian series of state disturbances having mean 0, scaled by the unknown standard deviation $\sigma$.
Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things.
load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series and removing the starting NaN values from each series.
isNaN = any(ismissing(DataTable),2);       % Flag periods containing NaNs
gnpn = DataTable.GNPN(~isNaN);
y = diff(DataTable.UR(~isNaN));
T = size(gnpn,1);                          % The sample size
Z = price2ret(gnpn);
This example continues using the series without NaN values. However, using the Kalman filter framework, the software can accommodate series containing missing values.
Determine how well the model forecasts observations by removing the last 10 observations for comparison.
numPeriods = 10;                           % Forecast horizon
isY = y(1:end-numPeriods);                 % In-sample observations
oosY = y(end-numPeriods+1:end);            % Out-of-sample observations
ISZ = Z(1:end-numPeriods);                 % In-sample predictors
OOSZ = Z(end-numPeriods+1:end);            % Out-of-sample predictors
Specify the coefficient matrices.
A = NaN; B = NaN; C = 1;
Create the state-space model using dssm by supplying the coefficient matrices and specifying that the state values come from a diffuse distribution. The diffuse specification indicates complete ignorance about the moments of the initial distribution.
StateType = 2;
Mdl = dssm(A,B,C,'StateType',StateType);
Estimate the parameters. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value pair arguments, respectively. Display the estimates and all optimization diagnostic information. Restrict the estimate of $\sigma$ to all positive, real numbers.
params0 = [0.3 0.2]; % Initial values chosen arbitrarily
Beta0 = 0.1;
[EstMdl,estParams] = estimate(Mdl,y,params0,'Predictors',Z,'Beta0',Beta0,...
    'lb',[-Inf 0 -Inf]);
Method: Maximum likelihood (fmincon)
Effective Sample size:             60
Logarithmic  likelihood:     -110.477
Akaike   info criterion:      226.954
Bayesian info criterion:      233.287
           |     Coeff       Std Err    t Stat     Prob
--------------------------------------------------------
 c(1)      |   0.59436       0.09408    6.31738     0
 c(2)      |   1.52554       0.10758   14.17991     0
 y <- z(1) | -24.26161       1.55730  -15.57930     0
           |
           |  Final State    Std Dev    t Stat     Prob
 x(1)      |   2.54764        0          Inf        0
EstMdl is a dssm model, and you can access its properties using dot notation.
Forecast observations over the forecast horizon. EstMdl does not store the data set, so you must pass it in appropriate name-value pair arguments.
[fY,yMSE] = forecast(EstMdl,numPeriods,isY,'Predictors0',ISZ,...
    'PredictorsF',OOSZ,'Beta',estParams(end));
fY is a 10-by-1 vector containing the forecasted observations, and yMSE is a 10-by-1 vector containing the variances of the forecasted observations.
Obtain 95% Wald-type forecast intervals. Plot the forecasted observations with their true values and the forecast intervals.
ForecastIntervals(:,1) = fY - 1.96*sqrt(yMSE);
ForecastIntervals(:,2) = fY + 1.96*sqrt(yMSE);

figure
h = plot(dates(end-numPeriods-9:end-numPeriods),isY(end-9:end),'-k',...
    dates(end-numPeriods+1:end),oosY,'-k',...
    dates(end-numPeriods+1:end),fY,'--r',...
    dates(end-numPeriods+1:end),ForecastIntervals,':b',...
    dates(end-numPeriods:end-numPeriods+1),...
    [isY(end)*ones(4,1),[oosY(1);ForecastIntervals(1,:)';fY(1)]],':k',...
    'LineWidth',2);
xlabel('Period')
ylabel('Change in unemployment rate')
legend(h([1,3,4]),{'Observations','Forecasted responses',...
    '95% forecast intervals'})
title('Observed and Forecasted Changes in the Unemployment Rate')
Suppose that a latent process is a random walk. The state equation is
$$x_t = x_{t-1} + u_t,$$
where $u_t$ is Gaussian with mean 0 and standard deviation 1.
Generate a random series of 100 observations from $x_t$, assuming that the series starts at 1.5.
T = 100;
x0 = 1.5;
rng(1); % For reproducibility
u = randn(T,1);
x = cumsum([x0;u]);
x = x(2:end);
Suppose further that the latent process is subject to additive measurement error. The observation equation is
$$y_t = x_t + \varepsilon_t,$$
where $\varepsilon_t$ is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model.
Use the random latent state process (x) and the observation equation to generate observations.
y = x + 0.75*randn(T,1);
Specify the four coefficient matrices.
A = 1; B = 1; C = 1; D = 0.75;
Create the diffuse state-space model using the coefficient matrices. Specify that the initial state distribution is diffuse.
Mdl = dssm(A,B,C,D,'StateType',2)
Mdl = 
State-space model type: dssm

State vector length: 1
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equation:
x1(t) = x1(t-1) + u1(t)

Observation equation:
y1(t) = x1(t) + (0.75)e1(t)

Initial state distribution:

Initial state means
 x1 
  0 

Initial state covariance matrix
     x1  
 x1  Inf 

State types
    x1   
 Diffuse 

Mdl is a dssm model. Verify that the model is correctly specified using the display in the Command Window.
Forecast states 10 periods into the future, and estimate the mean squared errors of the forecasts.
numPeriods = 10;
[~,~,ForecastedX,XMSE] = forecast(Mdl,numPeriods,y);
Plot the forecasts with the in-sample states, and 95% Wald-type forecast intervals.
ForecastIntervals(:,1) = ForecastedX - 1.96*sqrt(XMSE);
ForecastIntervals(:,2) = ForecastedX + 1.96*sqrt(XMSE);

figure
plot(T-20:T,x(T-20:T),'-k',T+1:T+numPeriods,ForecastedX,'-.r',...
    T+1:T+numPeriods,ForecastIntervals,'-.b',...
    T:T+1,[x(end)*ones(3,1),[ForecastedX(1);ForecastIntervals(1,:)']],':k',...
    'LineWidth',2)
hold on
title({'State Values and Their Forecasts'})
xlabel('Period')
ylabel('State value')
legend({'State Values','Forecasted states','95% forecast intervals'},...
    'Location','Best')
hold off
The forecast intervals flare out because the process is nonstationary.
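The Wald-type interval used in these examples is the forecast plus or minus 1.96 forecast standard errors, which presumes approximately Gaussian forecast errors. The following sketch isolates that arithmetic with hypothetical forecasts and error variances (fc and mse are made-up values, not output of forecast), so that the relationship between growing variances and growing interval widths is easy to inspect.

```matlab
fc  = [1.0; 1.0; 1.0];           % hypothetical forecasts (numPeriods-by-1)
mse = [1.5; 2.5; 3.5];           % hypothetical forecast error variances
z = 1.96;                        % approximate 97.5% standard normal quantile

% Column 1 holds the lower bounds and column 2 the upper bounds.
ci = [fc - z*sqrt(mse), fc + z*sqrt(mse)];

% Interval width per forecast period; it grows with the error variance.
width = ci(:,2) - ci(:,1);
```

With real output, fc and mse would be replaced by the first and second (or third and fourth) outputs of forecast.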
Mdl does not store the response data, predictor data, or the regression coefficients. Supply them whenever necessary using the appropriate input or name-value pair arguments.
The Kalman filter accommodates missing data by not updating filtered state estimates corresponding to missing observations. In other words, suppose there is a missing observation at period t. Then, the state forecast for period t based on the previous t – 1 observations and filtered state for period t are equivalent.
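The skipped update can be sketched with a scalar filter for the random-walk-plus-noise model used in the examples. This is an illustrative simplification, not the algorithm that dssm implements: in particular, the diffuse initial state is approximated here by a large finite variance (an assumption of this sketch), and the data values are made up.

```matlab
% Model: x(t) = x(t-1) + u(t),  y(t) = x(t) + 0.75*e(t).
y = [1.0; 1.4; NaN; 1.1];        % hypothetical data; period 3 is missing
xf = 0;  Pf = 1e7;               % ASSUMPTION: large variance stands in for a diffuse prior
Q = 1;   R = 0.75^2;             % state-disturbance and innovation variances
T = numel(y);
xPred = zeros(T,1);  xFilt = zeros(T,1);

for t = 1:T
    xp = xf;  Pp = Pf + Q;               % state forecast from periods 1..t-1
    if isnan(y(t))
        xf = xp;  Pf = Pp;               % missing observation: skip the update
    else
        K = Pp/(Pp + R);                 % Kalman gain
        xf = xp + K*(y(t) - xp);         % filtered state
        Pf = (1 - K)*Pp;                 % filtered state variance
    end
    xPred(t) = xp;  xFilt(t) = xf;
end
```

After the loop, xPred(3) and xFilt(3) coincide because period 3 has no observation, while they differ in the periods where an observation is available.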