LinkedIn Q&A: ARIMA Simplified

Again, I find myself drawn in to a LinkedIn question that forces me to articulate my understanding clearly:

Question: Could some one explain the ARIMA forecasting model in a simplified manner?

My Answer:

Autoregressive integrated moving average (ARIMA) models are best understood in the context of the Box–Jenkins methodology, whose steps are as follows:

  1. Model identification and model selection.
  2. Parameter estimation.
  3. Model checking.

The Box–Jenkins methodology fits autoregressive moving average (ARMA) or ARIMA models to the past values of a time series, in order to make forecasts and understand the underlying process or processes. Briefly put, the model selected is the simplest one that accounts for the properties of the time series; the parameters of that model are then chosen for the best fit to the data; and finally the fitted model is checked for the appropriate mathematical properties (namely, that the estimated process is stationary, meaning properties such as the mean and variance do not change over time, and that the residuals show no remaining structure). Ideally, the process leaves us with a model that captures the structure in the time series, with a residual that is simply noise.
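To make the three steps concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not a full Box–Jenkins implementation: the data are simulated from a zero-mean AR(1) process with a coefficient of 0.7 that I chose for illustration, and the helper names (autocorr, fit_ar1) are my own rather than any library's API.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of the list x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

def fit_ar1(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + e[t].

    Assumes the series has mean zero, so no intercept is fitted.
    """
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

# Simulated data (an assumption for illustration): AR(1) with phi = 0.7.
rng = random.Random(0)
true_phi = 0.7
series = [0.0]
for _ in range(1999):
    series.append(true_phi * series[-1] + rng.gauss(0.0, 1.0))

# Step 1, identification: strong autocorrelation at lag 1 suggests an AR term.
acf1 = autocorr(series, 1)

# Step 2, estimation: choose the parameter that best fits the data.
phi_hat = fit_ar1(series)

# Step 3, checking: the residuals should look like noise (autocorrelation near 0).
residuals = [series[t] - phi_hat * series[t - 1] for t in range(1, len(series))]
resid_acf1 = autocorr(residuals, 1)

print("lag-1 autocorrelation of the series:", round(acf1, 3))
print("estimated phi:", round(phi_hat, 3))
print("lag-1 autocorrelation of the residuals:", round(resid_acf1, 3))
```

In practice a statistical package would handle identification, estimation, and diagnostics over many lags and candidate models; the point here is only that each step has a simple, checkable meaning.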

So, why use ARIMA models instead of ARMA? ARIMA models are used when the time series exhibits non-stationarity (i.e. properties such as the mean and variance change over time). They are thus more general than ARMA: an ARMA(p,q) model is simply an ARIMA model with no differencing. The ARIMA family encompasses random-walk and random-trend models, autoregressive models, and certain exponential smoothing models, among others.
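The "integrated" part comes down to differencing, which a few lines of Python can illustrate. This is a sketch with made-up series; the difference helper is my own name, not a library function. A linear trend becomes constant after one difference, a quadratic trend after two, and a random walk differences back to the shocks that built it.

```python
import random

def difference(x, d=1):
    """Apply first differencing d times: y[t] = x[t] - x[t-1]."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

linear = [3 + 2 * t for t in range(8)]   # mean drifts upward over time
quadratic = [t * t for t in range(8)]    # trend whose slope also changes

print(difference(linear, d=1))      # [2, 2, 2, 2, 2, 2, 2]
print(difference(quadratic, d=1))   # [1, 3, 5, 7, 9, 11, 13]
print(difference(quadratic, d=2))   # [2, 2, 2, 2, 2, 2]

# A random walk is the running sum of random shocks.
rng = random.Random(1)
shocks = [rng.choice([-1, 1]) for _ in range(10)]
walk = []
level = 0
for e in shocks:
    level += e
    walk.append(level)

# One difference recovers the (stationary) shocks, apart from the first one.
print(difference(walk) == shocks[1:])   # True
```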

How does it do this? Let’s start with a semi-formal definition of an ARIMA(p,d,q) model, where:

  • p is the number of autoregressive terms,
  • d is the number of nonseasonal differences, and
  • q is the number of lagged forecast errors in the prediction equation.

such that p, d, and q are integers greater than or equal to zero and refer to the order of the autoregressive, integrated, and moving average parts of the model, respectively. So p captures the order of the autoregressive model (a linear regression of the current value of the series against one or more prior values of the series); d is the order of the differencing used to make the time series stationary; and q is the order of the moving average model (a linear regression of the current value of the series against the white-noise random shocks at one or more prior time points). Different combinations of integer values for these three components define a large family of time series models to fit to data. It is with this family, plus the fairly simple addition of seasonal components, that many statistical packages allow us to fit an ARIMA model to a time series.
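For readers who want the definition above in symbols, one common way to write an ARIMA(p,d,q) model is sketched below. The notation is my assumption rather than anything fixed by the question: y_t is the observed series, w_t the series after differencing d times, B the backshift operator (B y_t = y_{t-1}), c a constant, the phi_i and theta_j the autoregressive and moving average coefficients, and epsilon_t white noise. Some texts, including Box and Jenkins, write the moving average terms with minus signs instead.

```latex
\begin{align*}
  w_t &= (1 - B)^d \, y_t
      && \text{(difference the series $d$ times)} \\
  w_t &= c + \sum_{i=1}^{p} \phi_i \, w_{t-i}
           + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
      && \text{(ARMA($p$,$q$) on the differenced series)}
\end{align*}
```

With d = 0 this reduces to an ARMA(p,q) model, and with p = 0, d = 1, q = 0 and c = 0 it is the random walk y_t = y_{t-1} + epsilon_t.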

NOTE: Really understanding and correctly applying ARIMA requires a mathematical understanding of everything stated above. See the links at bottom for a start on the mathematical definitions that substantiate the conceptual overview provided.

I hope that this is of some help.

Links:
