Omitted Variable Bias
One important concept to discuss is that of omitted variable bias. This occurs when a variable that affects both a predictor and the outcome is left out of your analysis, which makes that predictor endogenous and biases its estimated coefficient.
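The size of the bias can be written down for the simplest case. The block below is the standard textbook result for a linear model with one included predictor and one omitted variable; the symbols (the betas, the alphas, and the error terms) are notation introduced here for illustration, and the result assumes the error term is uncorrelated with both X and Z.

```latex
\begin{align}
  \text{True model:}\quad   & Y = \beta_0 + \beta_X X + \beta_Z Z + \varepsilon \\
  \text{Fitted model:}\quad & Y = \alpha_0 + \alpha_X X + \eta \\
  \text{Result:}\quad       & \operatorname{plim}\hat{\alpha}_X
      = \beta_X + \beta_Z \,\frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)}
\end{align}
```

The bias term vanishes only if the omitted variable has no effect on Y (the coefficient on Z is zero) or is uncorrelated with X.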
Fake Data Simulation
As with all analyses, it is best to begin with a fake data simulation in order to build intuition about the problem. In this example, suppose that we have some relationship that we would like to test where X predicts Y. Additionally, let’s suppose that there is some variable, called Z, that affects both X and Y.
[Diagram: a directed graph with nodes U, X, Y, and Z and edges U → X, X → Y, Z → Y, and Z → X.]
Create the Fake Data
n <- 1000
U <-
Z <-
X <-
Y <-
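The right-hand sides of the assignments above are not shown. As a minimal sketch, here is one data-generating process that matches the diagram (U and Z drive X; X and Z drive Y). The seed, the standard normal distributions, and the unit coefficients are all assumptions made for illustration, so this will not reproduce the exact numbers reported in this post.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n <- 1000

# Assumed distributions and coefficients (illustrative, not the original values)
U <- rnorm(n)              # exogenous driver of X only
Z <- rnorm(n)              # confounder: affects both X and Y
X <- U + Z + rnorm(n)      # U -> X and Z -> X
Y <- X + Z + rnorm(n)      # X -> Y and Z -> Y

sim <- data.frame(U, Z, X, Y)

# Pairwise correlations give a first look at the relationships
round(cor(sim), 2)
```

Under these assumptions Z is correlated with both X and Y, which is exactly the condition under which leaving it out distorts the estimated effect of X.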
Let’s inspect our data and see what kind of relationship we would expect:
So if we were to do a naive linear regression of Y on X we would get the following results:
fit1 <-
arm::
lm(formula = Y ~ X)
coef.est coef.se
(Intercept) -1.17 0.14
X 1.36 0.02
---
n = 1000, k = 2
residual sd = 1.28, R-Squared = 0.78
It’s a pretty good fit, but let’s look at what happens when we include our omitted variable.
fit2 <-
arm::
lm(formula = Y ~ X + Z)
coef.est coef.se
(Intercept) -0.05 0.12
X 1.00 0.02
Z 1.04 0.04
---
n = 1000, k = 3
residual sd = 0.99, R-Squared = 0.87
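So here we see that when we include our omitted variable our R² increases from 0.78 to 0.87 and the residual standard deviation falls from 1.28 to 0.99. The biggest change is in the coefficient on X, which drops from 1.36 to 1.00 while its standard error stays at 0.02. The naive estimate was therefore biased, not just imprecise: it attributed part of the effect of Z to X.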
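The gap between the two X coefficients can be checked against the usual omitted variable bias identity: the naive slope equals the adjusted slope plus the coefficient on Z times the slope from regressing Z on X. The sketch below uses simulated data with assumed unit coefficients and standard normal noise (not the values behind the output above), and the names `fit_naive`, `fit_full`, and `fit_aux` are hypothetical.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Assumed data-generating process (illustrative values only)
n <- 1000
U <- rnorm(n)
Z <- rnorm(n)
X <- U + Z + rnorm(n)
Y <- X + Z + rnorm(n)

fit_naive <- lm(Y ~ X)       # omits Z
fit_full  <- lm(Y ~ X + Z)   # controls for Z
fit_aux   <- lm(Z ~ X)       # auxiliary regression of the omitted variable on X

b_naive <- unname(coef(fit_naive)["X"])
b_x     <- unname(coef(fit_full)["X"])
b_z     <- unname(coef(fit_full)["Z"])
delta   <- unname(coef(fit_aux)["X"])

# Compare the naive slope with the adjusted slope and the implied bias
c(naive = b_naive, adjusted = b_x, implied_bias = b_z * delta)

# The identity holds exactly in-sample for ordinary least squares
all.equal(b_naive, b_x + b_z * delta)
```

With these assumed values, Cov(X, Z) = 1 and Var(X) = 3, so the naive slope should land near 1 + 1/3 while the adjusted slope should land near the true value of 1.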
All this to say that it is good to check for omitted variables and, more importantly, to do fake data simulations to see how sensitive your model is to them.
Citation
BibTeX citation:
@online{dewitt2019,
author = {Michael E. DeWitt},
title = {Omitted Variable Bias},
date = {2019-04-07},
url = {https://michaeldewittjr.com/articles/2019-04-07-omitted-variable-bias},
langid = {en}
}
For attribution, please cite this work as:
Michael E. DeWitt. 2019. "Omitted Variable Bias." April 7, 2019. https://michaeldewittjr.com/articles/2019-04-07-omitted-variable-bias