Instrumental variables approaches
Instrumental variables (IV) methods are frequently used for inferring causal associations from observational
data, particularly when the explanatory variable is known to be correlated with the error
term (i.e. ‘endogenous’). Standard regression techniques (ordinary least squares, logistic regression,
and so on) require that the error term be uncorrelated with (or, more technically, ‘orthogonal’ to) all
the independent variables in order to yield unbiased and consistent parameter estimates. If this
condition is not met, then the estimated coefficients and their standard errors from these techniques
will be biased (the direction of the bias depends on the nature of the correlation with the error term).
Endogeneity results from unaccounted-for third factors that influence both participation
in the programme/policy (the key independent variable) and the outcome, and which therefore
end up in the error term. The advantage of IV
methods is that they generate statistically consistent estimates of beta coefficients, enabling the researcher
to answer the same question as regression methods (i.e. ‘what is the average effect of the
programme?’). IV techniques depend critically on the ability to identify an instrument that (1) is
highly correlated with the endogenous explanatory variable, conditional on the other variables in
the model, and (2) is not correlated with the error term and does not directly influence the outcome
(dependent) variable being modelled (except through the explanatory variable being instrumented).
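
The two requirements can be written compactly for a linear model with a single instrument. The notation is introduced here for illustration only and is not taken from the source: y is the outcome, D is programme participation, X the other covariates, Z the instrument, and ε and v the error terms.

```latex
\begin{align*}
  y_i &= \beta_0 + \beta_1 D_i + X_i'\gamma + \varepsilon_i,
      & \operatorname{Cov}(D_i, \varepsilon_i) &\neq 0
      && \text{(endogeneity)} \\
  D_i &= \pi_0 + \pi_1 Z_i + X_i'\delta + v_i,
      & \pi_1 &\neq 0
      && \text{(condition 1: relevance)} \\
  &   & \operatorname{Cov}(Z_i, \varepsilon_i) &= 0
      && \text{(condition 2: exogeneity and exclusion)}
\end{align*}
```
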
IV estimates are usually obtained through two-stage modelling. In the first stage, the endogenous
variable (e.g. programme participation) is estimated as a function of all the other variables and
the instrument(s) used for identification. In the second stage, the original regression of interest is
estimated, but instead of putting actual programme participation in the regression, the analyst replaces
that variable with the predicted value obtained from the first-stage regression and adjusts
the variance-covariance matrix appropriately. Occasionally, reduced-form methods are used, in which
the instrument replaces the policy or programme variable in the outcome regression. This allows the
researcher to determine whether a causal association exists, although the coefficient on the
instrument does not by itself give the true magnitude of the programme's effect.
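
The two-stage procedure and the reduced-form alternative described above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not a definitive implementation: the data-generating process, the variable names (`u`, `x`, `z`, `d`, `y`), the helper functions, and the true effect of 2.0 are all hypothetical, the treatment is continuous for simplicity, and the variance adjustment assumes homoskedastic errors.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, d, z, x):
    """2SLS with one endogenous regressor d, instrument z, exogenous covariate x.

    Returns coefficients and standard errors ordered as [const, d, x].
    """
    n = len(y)
    const = np.ones((n, 1))
    Z = np.column_stack([const, z, x])       # first stage: instrument + other variables
    X = np.column_stack([const, d, x])       # regression of interest, actual d

    d_hat = Z @ ols(Z, d)                    # first stage: predicted participation
    X_hat = np.column_stack([const, d_hat, x])
    beta = ols(X_hat, y)                     # second stage: d replaced by d_hat

    # Variance adjustment: residuals must use the actual d, not d_hat.
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta, np.sqrt(np.diag(cov))

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)                       # unobserved third factor
x = rng.normal(size=n)                       # observed covariate
z = rng.normal(size=n)                       # instrument: affects y only through d
d = 0.8 * z + 0.5 * x + u + rng.normal(size=n)
y = 2.0 * d + 1.0 * x + 2.0 * u + rng.normal(size=n)   # true effect of d is 2.0

const = np.ones((n, 1))
beta_ols = ols(np.column_stack([const, d, x]), y)       # ignores endogeneity
beta_iv, se_iv = two_stage_least_squares(y, d, z, x)
reduced_form = ols(np.column_stack([const, z, x]), y)   # instrument replaces d

print("OLS estimate of effect:   ", beta_ols[1])
print("2SLS estimate of effect:  ", beta_iv[1], "SE:", se_iv[1])
print("Reduced-form coef. on z:  ", reduced_form[1])
```

In this set-up the reduced-form coefficient on `z` reflects the product of the first-stage coefficient (0.8) and the true effect (2.0), which is why a non-zero reduced-form coefficient signals that an effect exists without revealing its size.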