Omitted Variable Bias

Omitted variable bias occurs when we fail to include a relevant variable that is correlated with one or more of the independent variables in our model. Suppose we specify the regression model as `Y=X\beta+e` --- (i), whereas the true model is `Y=X\beta+Z\Gamma+e` --- (iv). Nevertheless, we estimate based on `equation (i)`.

We know,
`\hat\beta_\text{OLS}=(X^TX)^{-1}X^TY`
The expectation of the OLS estimator (treating `X` and `Z` as fixed) is:
`E[\hat\beta_\text{OLS}]=E[(X^TX)^{-1}X^TY]`
`E[\hat\beta_\text{OLS}]=E[(X^TX)^{-1}X^T(X\beta+Z\Gamma+e)]`
`E[\hat\beta_\text{OLS}]=E[(X^TX)^{-1}(X^TX)\beta+(X^TX)^{-1}X^T(Z\Gamma+e)]`
`E[\hat\beta_\text{OLS}]=E[\beta+(X^TX)^{-1}X^T(Z\Gamma+e)]`
`E[\hat\beta_\text{OLS}]=E[\beta+(X^TX)^{-1}(X^TZ\Gamma+X^Te)]`
`E[\hat\beta_\text{OLS}]=E[\beta+(X^TX)^{-1}X^TZ\Gamma+(X^TX)^{-1}X^Te]`
`E[\hat\beta_\text{OLS}]=\beta+(X^TX)^{-1}X^TZ\Gamma+(X^TX)^{-1}E(X^Te)`
`\text{Since } E(X^Te)=0,`
`E[\hat\beta_\text{OLS}]=\beta+(X^TX)^{-1}X^TZ\Gamma` --- (v)
`Equation (v)` provides an interesting conclusion. Even though the error term `e` is exogenous to `X`, `\hat\beta_\text{OLS}` is biased whenever `E[X^TZ]\ne0`. If `Z` is uncorrelated with `X` (for example, if `Z` is randomly assigned), then `E[X^TZ]=0` and the bias term vanishes. Thus, only omitted variables that are correlated with `X` are a problem; we need not worry about omitted variables that are uncorrelated with `X`. If `X` and `Z` are correlated, the expectation of the OLS estimator is the sum of two terms: (1) the true coefficient on `X`, and (2) a bias term, `(X^TX)^{-1}X^TZ\Gamma`, which equals the coefficients obtained from regressing the omitted component `Z\Gamma` on `X`.
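
To make `equation (v)` concrete, here is a minimal numerical check (my own sketch, not part of the original post; the names `beta_true` and `gamma_true` and the chosen sample sizes are assumptions). It fixes `X` and a correlated `Z`, averages the OLS estimate of the short regression over many error draws, and compares that average to `\beta+(X^TX)^{-1}X^TZ\Gamma`:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500
beta_true = np.array([2.0, -1.0])   # coefficients on the included regressors X
gamma_true = np.array([1.5])        # coefficient on the omitted regressor Z

# Fixed design: Z is built from the second column of X, so X and Z are correlated.
X = rng.normal(size=(n, 2))
Z = 0.8 * X[:, [1]] + 0.6 * rng.normal(size=(n, 1))

# Prediction from equation (v): beta + (X'X)^{-1} X'Z Gamma
bias_term = np.linalg.solve(X.T @ X, X.T @ Z) @ gamma_true
expected_beta = beta_true + bias_term

# Monte Carlo: average the short-regression OLS estimate over many error draws.
reps = 2000
estimates = np.zeros((reps, 2))
for r in range(reps):
    e = rng.normal(size=n)
    y = X @ beta_true + Z @ gamma_true + e
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print("True beta:              ", beta_true)
print("Equation (v) prediction:", expected_beta)
print("Average OLS estimate:   ", estimates.mean(axis=0))
```

The average estimate should land very close to the prediction of `equation (v)`, not to the true `\beta`.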

Omitted Variable Bias Practice in Stata (Download).

Practical exercise in Python
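
The original exercise file is not reproduced here, so the following is only a possible sketch of such an exercise, assuming simulated data and the `statsmodels` package (the data-generating values are illustrative assumptions). It fits the "short" regression that omits `z` and the "long" regression that includes it, so the omitted variable bias is visible directly in the estimated coefficient on `x`:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000

# z is correlated with x; both affect y.
x = rng.normal(size=n)
z = 0.7 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)

# Short regression: z is omitted, so the coefficient on x absorbs part of z's effect.
short = sm.OLS(y, sm.add_constant(x)).fit()

# Long regression: z is included, so the coefficient on x is approximately unbiased.
long_ = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

print("Short regression (z omitted):", short.params)
print("Long regression (z included):", long_.params)
```

With `z` omitted, the coefficient on `x` picks up part of `z`'s effect (roughly `2 + 1.5 \times 0.7 = 3.05` here); with `z` included, it returns to approximately its true value of 2.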
