Instrumental Variables
When an unmeasured confounder distorts the relationship between an exposure and an outcome, ordinary regression estimates the wrong thing. Instrumental variables (IV) exploit a special variable to recover the causal effect even when confounding cannot be measured or adjusted for.
The problem: confounding bias
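Suppose we want the causal effect $\beta$ of an exposure $X$ on an outcome $Y$, but an unmeasured confounder $U$ influences both:
$$Y = \beta X + \gamma U + \varepsilon, \qquad \operatorname{Cov}(X, U) \neq 0$$
Because $X$ and the error term $\gamma U + \varepsilon$ share $U$, the regressor is correlated with the disturbance ($\operatorname{Cov}(X, \gamma U + \varepsilon) \neq 0$). Ordinary least squares (OLS) is then biased and inconsistent: it estimates a mixture of the causal effect and the confounding association, not $\beta$.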
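The size of this bias can be made concrete. The following is a minimal sketch, assuming a linear outcome model $Y = \beta X + \gamma U + \varepsilon$; the coefficient values, the seed, and the variable names are illustrative choices, not taken from the text. In large samples the OLS slope approaches $\beta + \gamma \operatorname{Cov}(X, U) / \operatorname{Var}(X)$ rather than $\beta$.

```python
import numpy as np

# Illustrative values (assumptions for this sketch)
beta, gamma = 2.0, 2.0
rng = np.random.default_rng(0)
n = 200_000

U = rng.normal(size=n)                           # unmeasured confounder
X = 1.0 * U + rng.normal(size=n)                 # exposure driven partly by U
Y = beta * X + gamma * U + rng.normal(size=n)    # outcome

# OLS slope of Y on X (with intercept) equals Cov(X, Y) / Var(X)
ols_slope = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# Value predicted by the omitted-variable bias formula
predicted = beta + gamma * np.cov(X, U)[0, 1] / np.var(X, ddof=1)

print(f"true beta   : {beta:.3f}")
print(f"OLS slope   : {ols_slope:.3f}")
print(f"beta + bias : {predicted:.3f}")
```

With these settings the OLS slope and the bias formula should agree closely with each other while both sit well above the true effect, which is the sense in which more data does not cure confounding.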
The instrument and its three assumptions
An instrument $Z$ is a variable that lets us isolate the part of $X$ that is “as good as randomly assigned.” It must satisfy three assumptions:
- Relevance. $Z$ is associated with the exposure: $\operatorname{Cov}(Z, X) \neq 0$. This is testable.
- Independence (exogeneity). $Z$ is independent of the confounders: $Z \perp U$. Not directly testable.
- Exclusion restriction. $Z$ affects $Y$ only through $X$; there is no direct path $Z \to Y$. Not directly testable.
Intuitively, $Z$ nudges $X$ without touching $U$ or $Y$ by any other route, so the induced change in $Y$ can be attributed to $X$ alone.
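To see why only relevance is testable in practice, it can help to check all three assumptions in a simulation where the confounder is visible. This is a sketch under assumed coefficients (0.8, 1.0, 2.0) and an arbitrary seed; the names `relevance`, `independence`, and `direct_effect` are ours. The last two checks use `U`, which real data would not contain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

U = rng.normal(size=n)                       # confounder (hidden in real data)
Z = rng.normal(size=n)                       # candidate instrument
X = 0.8 * Z + 1.0 * U + rng.normal(size=n)   # exposure
Y = 2.0 * X + 2.0 * U + rng.normal(size=n)   # outcome; Z does not enter directly

# Relevance: Z and X are correlated. Checkable from observed data.
relevance = np.corrcoef(Z, X)[0, 1]

# Independence: Z and U are uncorrelated. Checkable here only because
# the simulation lets us see U.
independence = np.corrcoef(Z, U)[0, 1]

# Exclusion: holding X and U fixed, Z has no remaining effect on Y.
# This regression also needs U, so it is unavailable in practice.
design = np.column_stack([np.ones(n), X, U, Z])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
direct_effect = coefs[3]

print(f"corr(Z, X)           : {relevance:.3f}")
print(f"corr(Z, U)           : {independence:.3f}")
print(f"direct Z -> Y effect : {direct_effect:.3f}")
```

In an applied analysis, independence and exclusion have to be argued from subject-matter knowledge, since the two checks that rely on `U` cannot be run.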
Estimators
The Wald ratio
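With a single instrument, take covariances of the outcome equation with $Z$. Under independence and exclusion the confounder and direct terms drop out:
$$\operatorname{Cov}(Z, Y) = \beta \, \operatorname{Cov}(Z, X)$$
Solving for $\beta$ gives the Wald (ratio) estimator:
$$\hat{\beta}_{\text{IV}} = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}$$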
For a binary instrument this equals the difference in mean outcome divided by the difference in mean exposure across the two groups.
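The equivalence for a binary instrument can be checked numerically. A minimal sketch, assuming a coin-flip instrument and the same illustrative coefficients as the rest of this article's simulations (the seed and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

U = rng.normal(size=n)                        # unmeasured confounder
Z = rng.integers(0, 2, size=n)                # binary instrument (0 or 1)
X = 0.8 * Z + 1.0 * U + rng.normal(size=n)    # exposure
Y = 2.0 * X + 2.0 * U + rng.normal(size=n)    # outcome, true effect 2

# Covariance form of the Wald estimator
wald_cov = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]

# Difference-in-means form for a binary instrument
treated, control = Z == 1, Z == 0
wald_means = (Y[treated].mean() - Y[control].mean()) / (
    X[treated].mean() - X[control].mean()
)

print(f"covariance ratio          : {wald_cov:.4f}")
print(f"ratio of mean differences : {wald_means:.4f}")
```

The two numbers agree up to floating-point error because, for a 0/1 variable, the sample covariance with any other variable is the difference in group means times a factor that cancels in the ratio.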
Two-stage least squares (2SLS)
With one or more instruments, the standard estimator is 2SLS:
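- Stage 1. Regress $X$ on $Z$ and keep the fitted values $\hat{X}$, the projection of $X$ onto the instrument, free of the confounded variation.
- Stage 2. Regress $Y$ on $\hat{X}$. The coefficient on $\hat{X}$ is $\hat{\beta}_{\text{2SLS}}$.
With a single instrument, 2SLS is algebraically identical to the Wald ratio. In matrix form, with instrument matrix $Z$,
$$\hat{\beta}_{\text{2SLS}} = (X^\top P_Z X)^{-1} X^\top P_Z Y, \qquad P_Z = Z (Z^\top Z)^{-1} Z^\top$$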
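With more than one instrument the Wald ratio no longer applies directly, but the two-stage recipe and the matrix form still do. The sketch below assumes two hypothetical instruments, `Z1` and `Z2`, with illustrative coefficients, and computes the estimate both ways using NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

U = rng.normal(size=n)                                   # unmeasured confounder
Z1 = rng.normal(size=n)                                  # first instrument
Z2 = rng.normal(size=n)                                  # second instrument
X = 0.8 * Z1 + 0.5 * Z2 + 1.0 * U + rng.normal(size=n)   # exposure
Y = 2.0 * X + 2.0 * U + rng.normal(size=n)               # outcome, true effect 2

ones = np.ones(n)
Zmat = np.column_stack([ones, Z1, Z2])   # instruments, including the constant
Xmat = np.column_stack([ones, X])        # regressors: constant and exposure

# Stage 1: project the regressors onto the instruments (this is P_Z X).
# lstsq avoids forming the n-by-n projection matrix explicitly.
stage1_coefs, *_ = np.linalg.lstsq(Zmat, Xmat, rcond=None)
Xhat = Zmat @ stage1_coefs

# Stage 2: regress Y on the fitted regressors.
two_step, *_ = np.linalg.lstsq(Xhat, Y, rcond=None)

# Matrix form: (X' P_Z X)^{-1} X' P_Z Y, using X' P_Z = Xhat'.
matrix_form = np.linalg.solve(Xhat.T @ Xmat, Xhat.T @ Y)

print(f"two-step slope    : {two_step[1]:.4f}")
print(f"matrix-form slope : {matrix_form[1]:.4f}")
```

The point estimates coincide, but standard errors read off a hand-rolled second-stage regression are not valid, because they treat the fitted exposure as if it were observed data. That is one reason to prefer a dedicated 2SLS routine for inference.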
Weak-instrument bias
If relevance is only barely satisfied ($\operatorname{Cov}(Z, X)$ near zero), the denominator of the Wald ratio is small and estimates become unstable, biased toward the OLS estimate, with poor confidence-interval coverage. A common rule of thumb is a first-stage $F$-statistic above 10; weaker instruments demand caution.
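The rule of thumb can be checked by computing the first-stage statistic directly. This is a minimal sketch for a single instrument; `first_stage_f` is a helper written for this example, not a library function, and the two first-stage coefficients (0.8 and 0.02) are illustrative assumptions.

```python
import numpy as np

def first_stage_f(z, x):
    """F-statistic for the slope in a regression of x on one instrument z."""
    n = len(x)
    design = np.column_stack([np.ones(n), z])
    coefs, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ coefs
    rss = resid @ resid                   # residual sum of squares
    tss = np.sum((x - x.mean()) ** 2)     # total sum of squares
    # One restriction (slope = 0), n - 2 residual degrees of freedom
    return (tss - rss) / (rss / (n - 2))

rng = np.random.default_rng(0)
n = 500
U = rng.normal(size=n)
Z = rng.normal(size=n)

X_strong = 0.8 * Z + 1.0 * U + rng.normal(size=n)   # strong first stage
X_weak = 0.02 * Z + 1.0 * U + rng.normal(size=n)    # nearly irrelevant instrument

f_strong = first_stage_f(Z, X_strong)
f_weak = first_stage_f(Z, X_weak)
print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument  : {f_weak:.1f}")
```

The strong instrument should clear the threshold of 10 by a wide margin, while the weak one will typically fall far below it. With several instruments the same idea applies, using the joint F-test of all instrument coefficients in the first stage.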
Worked simulation
We generate data where OLS is badly confounded, then show that 2SLS recovers the true effect $\beta = 2$.
R
set.seed(1)
n <- 5000
U <- rnorm(n) # unmeasured confounder
Z <- rnorm(n) # instrument
X <- 0.8 * Z + 1.0 * U + rnorm(n) # exposure depends on Z and U
beta <- 2
Y <- beta * X + 2.0 * U + rnorm(n) # U confounds X and Y
# Naive OLS: biased upward (U inflates the X-Y association)
coef(lm(Y ~ X))["X"] # ~ 2.76 (2 + 2/2.64 in expectation)
# Wald ratio / manual 2SLS
cov(Z, Y) / cov(Z, X) # ~ 2.00
# 2SLS via AER
# install.packages("AER")
library(AER)
coef(ivreg(Y ~ X | Z))["X"] # ~ 2.00
Python
import numpy as np
from linearmodels.iv import IV2SLS
import statsmodels.api as sm
rng = np.random.default_rng(1)
n = 5000
U = rng.normal(size=n)
Z = rng.normal(size=n)
X = 0.8 * Z + 1.0 * U + rng.normal(size=n)
beta = 2
Y = beta * X + 2.0 * U + rng.normal(size=n)
# Naive OLS: biased
sm.OLS(Y, sm.add_constant(X)).fit().params[1] # ~ 2.76 (2 + 2/2.64 in expectation)
# Wald ratio / manual 2SLS
np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1] # ~ 2.00
# 2SLS via linearmodels: IV2SLS(dependent, exog, endog, instruments)
res = IV2SLS(Y, np.ones(n), X, Z).fit()
res.params["endog"] # ~ 2.00
Julia
using Random, Statistics
Random.seed!(1)
n = 5000
U = randn(n)
Z = randn(n)
X = 0.8 .* Z .+ 1.0 .* U .+ randn(n)
beta = 2
Y = beta .* X .+ 2.0 .* U .+ randn(n)
# Naive OLS slope: biased (~2.76, i.e. 2 + 2/2.64 in expectation)
Xo = hcat(ones(n), X)
(Xo \ Y)[2]
# Wald ratio (~2.00)
cov(Z, Y) / cov(Z, X)
# Manual 2SLS via least squares (\)
Zm = hcat(ones(n), Z)
Xhat = Zm * (Zm \ X) # stage 1 fitted exposure
Xh = hcat(ones(n), Xhat)
(Xh \ Y)[2] # stage 2 slope ~ 2.00
Why it matters for statistics
Instrumental variables extend causal estimation beyond the reach of adjustment: they identify effects when the confounders are unknown or unmeasured, which is the usual predicament in observational epidemiology and economics. Understanding the relevance, independence, and exclusion assumptions, and the fact that the last two cannot be tested directly, is central to judging when an IV analysis is credible, and it underpins related designs such as Mendelian randomization and natural experiments.