Can anyone give me guidance on performing a two-stage least-squares analysis using Excel, Minitab 15, or Statistica 9?
I’m taking an economics course and am required to perform this analysis, but only have experience in Excel. I’m not certain that Excel can handle this, so I downloaded trial versions of Minitab 15 and Statistica 9 but don’t have much experience with either program. Any guidance or instructions would certainly help. Thanks in advance…
My answer:
First, I will try to summarize what two-stage least squares regression (2SLS) is, for common understanding. 2SLS is a method for regression on models that violate one of the assumptions of standard, ordinary least squares (OLS) regression: recursivity. A path diagram (or equivalent structural equation/simultaneous equation/path equation) is said to be recursive when the causal arrows and error terms are non-looping. When there are loops in the model, an explanatory variable ends up correlated with the error term, and an instrumental variable must be interposed to allow for the application of OLS. The steps are as follows: 1) each problematic (endogenous) explanatory variable is regressed on the instrumental variable(s) using OLS, and the fitted values are saved as a new variable that substitutes for the original one, and 2) the regression of interest is computed using OLS, but using the newly created variables, thus circumventing the recursivity constraint. For details on a concrete example, see the first link below.
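To make the two stages concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption for illustration: the simulated data, the variable names z (instrument), x (endogenous regressor), u (unobserved shock), and the helper functions simple_ols and two_stage_least_squares. It handles only the simplest case of one endogenous regressor and one instrument, and it is not how Statistica, Minitab, or EasyReg implement the method.

```python
import random

def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    # Stage 1: regress the endogenous regressor on the instrument
    # and keep the fitted values as the substitute variable.
    a1, b1 = simple_ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the stage-1 fitted values.
    return simple_ols(x_hat, y)

# Simulated data (hypothetical): the true slope of y on x is 2.0.
# The shock u enters both x and y, so x is correlated with the error in y.
random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

ols_intercept, ols_slope = simple_ols(x, y)
tsls_intercept, tsls_slope = two_stage_least_squares(y, x, z)
print(f"OLS slope:  {ols_slope:.3f}")   # biased away from 2.0 in this setup
print(f"2SLS slope: {tsls_slope:.3f}")  # should land close to 2.0
```

One caveat if you do this by hand in Excel or any other tool: the coefficient from the second regression is the 2SLS estimate, but the standard errors that the second-stage OLS output reports are not the correct 2SLS standard errors, because they are computed from residuals based on the fitted values rather than the original regressor. A dedicated 2SLS routine corrects for this.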
Once the above is understood in detail, 2SLS can be implemented in any modern statistical software (R, Statistica, SPSS), and even in Excel with enough programming. So, what to do? Minitab does not appear to have a native 2SLS module. You should be able to piece together a solution from its regression capabilities, but that is far from efficient. Apparently, Statistica 9 has an add-on module for handling 2SLS, but I have never used it. I suggest trying to get the 2SLS add-on for Statistica, if possible, or simply downloading EasyReg (second link below). EasyReg is free for non-commercial use, and has online instructions for ordinary least squares regression (last link below).
Again, I find myself drawn into a LinkedIn question that forces me to clearly articulate my understanding:
Question: Could someone explain the ARIMA forecasting model in a simplified manner?
My Answer:
Autoregressive integrated moving average (ARIMA) models are best understood in the context of the Box–Jenkins methodology, whose steps are as follows:
Model identification and model selection.
Parameter estimation.
Model checking.
The Box–Jenkins methodology applies autoregressive moving average (ARMA) or ARIMA models to find the best fit of a time series to its own past values, in order to make forecasts and understand the underlying process(es). Briefly put, the model selected is the simplest one that accounts for the properties of the time series; the parameters for that specific model are then chosen for best fit to the data; and finally the model and parameters are checked for the appropriate mathematical properties (namely, being stationary, meaning parameters such as the mean and variance do not change over time). Ideally, the process leaves us with a model that captures the structure in the time series, with a residual that is simply noise.
So, why use ARIMA models instead of ARMA? ARIMA models are used instead of ARMA models when the time series exhibits non-stationarity (i.e., parameters such as the mean and variance change over time), and are thus more general than ARMA. ARIMA encompasses random-walk and random-trend models, autoregressive models, and exponential smoothing models, among others.
How does it do this? Let’s start with a semi-formal definition of an ARIMA(p,d,q) model, where:
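As a rough illustration of what non-stationarity looks like and how differencing addresses it, here is a small sketch in plain Python. The series is simulated (a random walk with an assumed drift of 0.5 per step), and the names y, dy, level_gap, and diff_gap are made up for this example. It compares the mean of the first and second halves of the series before and after taking first differences.

```python
import random

def mean(values):
    return sum(values) / len(values)

# Simulated random walk with drift (hypothetical data):
# y[t] = y[t-1] + drift + shock[t]
random.seed(42)
n = 1000
drift = 0.5
y = [0.0]
for _ in range(n):
    y.append(y[-1] + drift + random.gauss(0, 1))

# First difference: dy[t] = y[t] - y[t-1], which here equals drift + shock[t].
dy = [cur - prev for prev, cur in zip(y, y[1:])]

half = n // 2
level_gap = abs(mean(y[half:]) - mean(y[:half]))    # mean of the levels shifts over time
diff_gap = abs(mean(dy[half:]) - mean(dy[:half]))   # mean of the differences stays put

print(f"gap between half-sample means, levels:      {level_gap:.2f}")
print(f"gap between half-sample means, differences: {diff_gap:.2f}")
```

In this setup the levels have a mean that keeps moving, so an ARMA model fitted directly to y would be misspecified, while the differenced series has a roughly constant mean and can be handed to ARMA. That extra differencing step is what the "I" in ARIMA adds.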
p is the number of autoregressive terms,
d is the number of nonseasonal differences, and
q is the number of lagged forecast errors in the prediction equation.
Here p, d, and q are integers greater than or equal to zero and refer to the order of the autoregressive, integrated, and moving average parts of the model, respectively. So p captures the order of the autoregressive model (a linear regression of the current value of the series against one or more prior values of the series); d is the order of the differencing used to make the time series stationary; and q is the order of the moving average model (a linear regression of the current value of the series against the random shocks, i.e. white-noise forecast errors, at one or more prior time points). Combinations of integer values for these three components define a large family of time series models to fit to data. It is with this family, and the fairly simple addition of seasonal components, that many statistical packages allow us to fit time series with ARIMA.
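To show how p and d play out in practice, here is a hand-rolled sketch of the simplest non-trivial case, ARIMA(1,1,0): difference the series once (d = 1), fit a single autoregressive coefficient to the differences (p = 1), and use no moving-average terms (q = 0). The data are simulated, and the helper names difference and fit_ar1 are hypothetical. The AR coefficient is estimated by plain least squares without an intercept, which is a simplification of what statistical packages typically do (they usually use maximum likelihood).

```python
import random

def difference(series, d=1):
    """Apply first differencing d times (the 'I' part of ARIMA)."""
    for _ in range(d):
        series = [cur - prev for prev, cur in zip(series, series[1:])]
    return series

def fit_ar1(w):
    """Least-squares estimate of phi in w[t] = phi * w[t-1] + shock[t]."""
    num = sum(prev * cur for prev, cur in zip(w, w[1:]))
    den = sum(prev * prev for prev in w[:-1])
    return num / den

# Simulate an ARIMA(1,1,0) series (hypothetical data): the first
# differences follow an AR(1) process with coefficient phi_true.
random.seed(1)
phi_true = 0.6
w = [random.gauss(0, 1)]
for _ in range(1999):
    w.append(phi_true * w[-1] + random.gauss(0, 1))
y = [0.0]
for step in w:
    y.append(y[-1] + step)  # cumulative sum "integrates" the differences

# d = 1: difference once; p = 1: fit one autoregressive term; q = 0.
dy = difference(y, d=1)
phi_hat = fit_ar1(dy)

# One-step-ahead forecast: predict the next difference, then undo the differencing.
forecast = y[-1] + phi_hat * dy[-1]
print(f"estimated phi: {phi_hat:.3f} (simulated with {phi_true})")
print(f"last value: {y[-1]:.3f}, one-step forecast: {forecast:.3f}")
```

Adding moving-average terms (q > 0) requires estimating the unobserved shocks as well, which is no longer a one-line regression, so that part is best left to a statistical package.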
NOTE: To really understand and apply ARIMA correctly requires a mathematical understanding of everything stated above. See the links at bottom for a start on the mathematical definitions that substantiate the conceptual overview provided.