For example, let $t_0 = 10$; then the forecast for $X_{12}$ is $\hat{X}_{10} (2)$.
Once the model is fit, the forecast is calculated iteratively, one lead time at a time, with each step reusing the forecasts from the earlier steps.
Recall that $\bar{x}$ is the estimate for $\mu$ of an ARMA process.
Then the forecast for an ARMA(p,q) is described by
$$ \hat{X}_{t_0} \left( {l} \right) = \sum_{i=1}^p \phi_i \hat{X}_{t_0} \left( l - i \right) - \sum_{j=1}^q \theta_j \hat{a}_{t_0 + l - j} + \bar{x} \left[ 1 - \sum_{i = 1}^p \phi_i \right] + \hat{a}_t(l) $$
Given $\hat{X}_{t_0} (l)$, an estimate of $X_{t_0 + l}$ based on data up to $t_0$, limits on the forecast are constructed such that the probability that $X_{t_0 + l}$ falls within the limits is $1 - \alpha$ (95% for $\alpha = 0.05$).
The forecast probability limits for an ARMA process are given by
$$ \text{Forecast interval:} \quad \hat{X}_{t_0} \left( {l} \right) \pm z_{1-\alpha / 2} \hat{\sigma}_a \left[ \sum_{i = 0}^{l-1} \psi_i^2 \right]^\frac{1}{2} $$
where $\psi_i$ are the weights of the ARMA model expressed as a general linear process.
Recall that since an ARMA process is stationary, its expected value is the mean. Then, intuitively, as the lead time of the forecast grows large, the forecast regresses to the mean, i.e. as $l \rightarrow \infty$, $\hat{X}_{t_0} \left( {l} \right) \rightarrow \mu$ (estimated by $\bar{x}$).
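
The limit formula above needs the $\psi$-weights, which can be generated recursively from the model coefficients. The following is a minimal sketch, assuming the sign convention of the forecast equation above (MA terms enter with a minus sign); the function names `psi_weights` and `forecast_half_widths` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def psi_weights(phi, theta, n):
    """Return psi_0 .. psi_{n-1} for an ARMA(p,q) written as
    X_t - sum(phi_i X_{t-i}) = a_t - sum(theta_j a_{t-j})."""
    psi = [1.0]  # psi_0 is always 1
    for k in range(1, n):
        # AR contribution: sum over i = 1 .. min(k, p) of phi_i * psi_{k-i}
        value = sum(phi[i - 1] * psi[k - i] for i in range(1, min(k, len(phi)) + 1))
        # MA contribution: subtract theta_k while k <= q
        if k <= len(theta):
            value -= theta[k - 1]
        psi.append(value)
    return psi


def forecast_half_widths(phi, theta, sigma_a, max_lead, alpha=0.05):
    """Half-width of the (1 - alpha) forecast limits for l = 1 .. max_lead."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    psi = psi_weights(phi, theta, max_lead)
    widths, running = [], 0.0
    for k in range(max_lead):
        running += psi[k] ** 2          # sum of psi_i^2 for i = 0 .. l-1
        widths.append(z * sigma_a * sqrt(running))
    return widths
```

For a stationary model the half-widths increase with $l$ but level off, which mirrors the forecast itself settling at the mean.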
From the general equation above, the forecast for an ARMA(1,0) is described by
$$ \hat{x}_{t_0} (l) = \phi_1 \hat{x}_{t_0} (l-1) + \bar{x} (1 - \phi_1) + \hat{a}_t(l) $$
Let $\hat{a}_t(l) = 0$, since the future noise terms are random with zero mean; then the forecast at lead $l$ can be calculated as
$$ \hat{x}_{t_0} (l) = \phi_1 \hat{x}_{t_0} (l-1) + \bar{x} (1 - \phi_1) $$
Let $t_0 = 10$ and assume we want to forecast $X_{12}$. The forecast steps are
$$ \hat{x}_{10} (1) = \phi_1 \hat{x}_{10} (0) + \bar{x} (1 - \phi_1) $$
$$ \hat{x}_{10} (2) = \phi_1 \hat{x}_{10} (1) + \bar{x} (1 - \phi_1) $$
where $\hat{x}_{10} (0) = X_{10}$, the last observed value.
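
The AR(1) recursion above is short enough to sketch directly. The values used below ($X_{10} = 12$, $\phi_1 = 0.5$, $\bar{x} = 10$) are made up for illustration and do not come from any data set in these notes; `ar1_forecast` is a hypothetical helper name.

```python
def ar1_forecast(last_value, phi1, mean, steps):
    """Iterate x_hat(l) = phi1 * x_hat(l-1) + mean * (1 - phi1),
    starting from x_hat(0) = last_value (the last observation)."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        previous = phi1 * previous + mean * (1 - phi1)
        forecasts.append(previous)
    return forecasts


# Assumed illustrative inputs: X_10 = 12.0, phi_1 = 0.5, x_bar = 10.0
two_step = ar1_forecast(12.0, 0.5, 10.0, 2)    # [11.0, 10.5]
long_run = ar1_forecast(12.0, 0.5, 10.0, 50)   # last entries are very close to 10.0
```

Each step halves the distance to the mean in this example, which is the regression to $\bar{x}$ described earlier for stationary models.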
From the general equation above, the forecast for an ARMA(2,0) is described by
$$ \hat{x}_{t_0} (l) = \phi_1 \hat{x}_{t_0} (l-1) + \phi_2 \hat{x}_{t_0} (l-2) + \bar{x} (1 - \phi_1 - \phi_2) + \hat{a}_t(l) $$
Using the following model and realization data, provide forecasts for $\hat{x}_{75} (1)$, $\hat{x}_{75} (2)$, and $\hat{x}_{75} (3)$.
Model
$$ X_t = 1.6 X_{t-1} - 0.8 X_{t-2} + 30 (1 - 1.6 + 0.8) + a_t $$
Realization Data
Calculation for $\hat{x}_{75} (1)$
$$ \hat{x}_{75} (1) = 1.6 \hat{x}_{75} (0) - 0.8 \hat{x}_{75} (-1) + \bar{x} (1 - 1.6 + 0.8) $$
$$ \hat{x}_{75} (1) = 1.6 (23.4) - 0.8 (27.7) + 29.5 (1 - 1.6 + 0.8) = 21.2 $$
Calculation for $\hat{x}_{75} (2)$
$$ \hat{x}_{75} (2) = 1.6 \hat{x}_{75} (1) - 0.8 \hat{x}_{75} (0) + \bar{x} (1 - 1.6 + 0.8) $$
$$ \hat{x}_{75} (2) = 1.6 (21.2) - 0.8 (23.4) + 29.5 (1 - 1.6 + 0.8) = 21.1 $$
Calculation for $\hat{x}_{75} (3)$
$$ \hat{x}_{75} (3) = 1.6 \hat{x}_{75} (2) - 0.8 \hat{x}_{75} (1) + \bar{x} (1 - 1.6 + 0.8) $$
$$ \hat{x}_{75} (3) = 1.6 (21.1) - 0.8 (21.2) + 29.5 (1 - 1.6 + 0.8) = 22.7 $$
Model forecasts are often validated by only using the first section of the data for model training and using a later section to calculate residuals. Mean square error of the residuals can be used to compare models.
The forecast origin is set back $k$ steps: the model is fit to the first $n-k$ observations and then used to forecast the $k$ withheld time steps, whose true values are known and can be compared with the forecasts.
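
The holdout procedure can be sketched as follows. This is a simplified illustration, not a full workflow: it assumes a pure AR(p) model whose coefficients `phi` have already been estimated from the training portion, and the helper names `ar_forecast` and `holdout_mse` are hypothetical.

```python
def ar_forecast(history, phi, mean, steps):
    """Recursive AR(p) forecast with future noise set to zero.
    `history` holds the observations up to t0, oldest first."""
    p = len(phi)
    values = list(history[-p:])           # x_hat(0), x_hat(-1), ... are observed data
    constant = mean * (1 - sum(phi))      # x_bar * (1 - phi_1 - ... - phi_p)
    out = []
    for _ in range(steps):
        nxt = sum(phi[i] * values[-1 - i] for i in range(p)) + constant
        values.append(nxt)
        out.append(nxt)
    return out


def holdout_mse(series, phi, k):
    """Withhold the last k points, forecast them from the first n - k,
    and return the mean square error of the forecast residuals."""
    train, test = series[:-k], series[-k:]
    mean = sum(train) / len(train)
    forecasts = ar_forecast(train, phi, mean, k)
    return sum((x - f) ** 2 for x, f in zip(test, forecasts)) / k


# Check against the AR(2) hand calculation: X_74 = 27.7, X_75 = 23.4, x_bar = 29.5
worked = [round(v, 1) for v in ar_forecast([27.7, 23.4], [1.6, -0.8], 29.5, 3)]
# worked == [21.2, 21.1, 22.7]
```

Computing `holdout_mse` for each candidate model on the same withheld section gives a like-for-like basis for comparison.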
Consider the following ARIMA model
$$ (1-B) \left( X_t - \mu \right) = a_t $$
Then
$$ \hat{X}_{t_0} (l) = \hat{X}_{t_0} (l - 1) + \bar{X} ( 1 - 1 ) $$
Notice that the mean in the equation above is multiplied by 0 due to the root on the unit circle.
The forecast depends on the order of the ARIMA.
Since the model is not stationary, the forecast limits are unbounded as $l$ increases.
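
As a worked illustration for the model $(1-B)(X_t - \mu) = a_t$ above, and assuming the limit formula given for ARMA models is applied with $\psi$-weights obtained by formally inverting $(1-B)$, every weight equals 1, so the sum of squares grows linearly in $l$:
$$ \hat{X}_{t_0} (l) = X_{t_0}, \qquad \psi_k = 1 \text{ for all } k, \qquad \sum_{k=0}^{l-1} \psi_k^2 = l $$
$$ \text{Forecast interval:} \quad X_{t_0} \pm z_{1-\alpha/2} \hat{\sigma}_a \sqrt{l} $$
The forecast stays at the last observed value while the limits widen in proportion to $\sqrt{l}$, without bound.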
For a purely seasonal model $\left( 1 - B^s \right) X_t = a_t$, the forecast at lead $l$ equals the value at $l-s$, i.e. $\hat{X}_{t_0} (l) = \hat{X}_{t_0} (l - s)$, where $s$ is the seasonal period.
Seasonal models of the following form are typically known as "Airline Models."
$$ (1-B) \left( 1- B^s \right) X_t = a_t $$
The addition of the $(1-B)$ with the seasonality term preserves trends in the data.
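
Expanding the model above gives $X_t = X_{t-1} + X_{t-s} - X_{t-s-1} + a_t$, so the forecast can be computed recursively once future noise terms are set to zero. The sketch below assumes exactly this noise-free recursion; `airline_forecast` and the toy series are hypothetical and only illustrate how both the trend and the seasonal pattern carry forward.

```python
def airline_forecast(history, s, steps):
    """Forecast (1 - B)(1 - B^s) X_t = a_t with future a_t set to zero,
    using X_t = X_{t-1} + X_{t-s} - X_{t-s-1}."""
    if len(history) < s + 1:
        raise ValueError("need at least s + 1 observations")
    values = list(history)
    for _ in range(steps):
        values.append(values[-1] + values[-s] - values[-s - 1])
    return values[len(history):]


# Toy series (made up): linear trend 2*t plus a repeating pattern with period 4
pattern = [0, 3, 1, -2]
series = [2 * t + pattern[t % 4] for t in range(12)]
next_cycle = airline_forecast(series, s=4, steps=4)
# next_cycle continues both the trend and the seasonal pattern exactly
```

On this noise-free toy series the forecast reproduces the next cycle exactly, seasonal shape and upward slope included.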
A signal plus noise model is expressed in the following form:
$$ X_t = \lambda (t) + Z_t + a_t $$
where $\lambda$ is a deterministic function and $Z_t$ represents some noise model (generally modeled by an ARMA model).
Typically, the following strategy is used to forecast with a deterministic model:
Then, the forecast is given by
$$ \hat{X}_{t_0} (l) = \lambda \left( t_0 + l \right) + \hat{Z}_{t_0} (l) $$
where $\lambda$ represents the deterministic function.
For a linear model, the deterministic function is $\lambda (t) = \beta_0 + \beta_1 t$.
Then, the forecasts for a linear trend are
$$ \hat{X}_{t_0} (l) = \hat{\beta}_0 + \hat{\beta}_1 \left( t_0 + l \right) + \hat{Z}_{t_0} (l) $$
And the forecast limits are given by
$$ \text{Forecast interval:} \quad \hat{\beta}_0 + \hat{\beta}_1 \left( t_0 + l \right) + \hat{Z}_{t_0} (l) \pm z_{1-\alpha/2} \hat{\sigma}_a \left[ \sum_{k = 0}^{l-1} \psi_k^2 \right]^\frac{1}{2} $$
where $\hat{\sigma}_a$ and $\psi_k$ are based on the ARMA fit to $\hat{Z}_t$.
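
A linear-trend forecast of this kind can be sketched in a few lines. The approach below is one plausible reading, not necessarily the strategy intended in these notes: it assumes the line is fit by ordinary least squares on $t = 1, \dots, n$, and that the residuals $\hat{Z}_t$ are modeled as a zero-mean AR(1) whose coefficient is estimated by the lag-1 sample autocorrelation. The name `linear_signal_forecast` is hypothetical.

```python
def linear_signal_forecast(series, steps):
    """Forecast X_t = b0 + b1 * t + Z_t, with Z_t treated as a zero-mean AR(1).
    Assumptions: OLS line fit, phi_1 from the lag-1 autocorrelation of residuals."""
    n = len(series)
    times = list(range(1, n + 1))
    t_bar = sum(times) / n
    x_bar = sum(series) / n
    # Ordinary least squares estimates of slope and intercept
    b1 = (sum((t - t_bar) * (x - x_bar) for t, x in zip(times, series))
          / sum((t - t_bar) ** 2 for t in times))
    b0 = x_bar - b1 * t_bar
    # Residuals Z_hat_t and the AR(1) coefficient estimate
    z = [x - (b0 + b1 * t) for t, x in zip(times, series)]
    denom = sum(v * v for v in z)
    phi1 = sum(z[i] * z[i - 1] for i in range(1, n)) / denom if denom > 0 else 0.0
    forecasts = []
    z_hat = z[-1]                       # Z_hat_{t0}(0) is the last residual
    for lead in range(1, steps + 1):
        z_hat = phi1 * z_hat            # Z_hat_{t0}(l) for a zero-mean AR(1)
        forecasts.append(b0 + b1 * (n + lead) + z_hat)
    return forecasts
```

Because $\hat{Z}_{t_0} (l)$ decays toward zero for a stationary residual model, the forecasts approach the fitted line $\hat{\beta}_0 + \hat{\beta}_1 (t_0 + l)$ as $l$ grows.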