Let's recall the causes of endogeneity in our model: first, simultaneous causality bias; second, omitted variable bias; and third, the existence of covariates.
Let's consider a simple regression model.
`Y=X\beta+e---i`
Existence of simultaneity
If our model suffers from simultaneity, then a variable on the right-hand side (RHS) of one equation must also appear on the left-hand side (LHS) of another equation, that is,
`Y=X\beta+e---i`
`X=Y\alpha+Z\Gamma---ii`
Substituting `equation (i)` into `equation (ii)` yields
`X=(X\beta+e)\alpha+Z\Gamma`
`X=X\beta\alpha+Z\Gamma+e\alpha`
`X-X\beta\alpha = Z\Gamma+e\alpha`
`X[I-\beta\alpha] = Z\Gamma+e\alpha`
`X=(Z\Gamma+e\alpha)[I-\beta\alpha]^(-1)---iii`
In `equation (iii)`, `X` depends directly on `e`, so `X` is correlated with `e` whenever `\alpha\ne0`. This means that `E[X^Te]\ne0`. Hence, the exogeneity assumption of OLS, `E[X^Te]=0`, is violated.
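To see the correlation in `equation (iii)` numerically, here is a minimal Python sketch under assumed conditions: a single regressor, scalar coefficients, no intercept, and standard normal `Z` and `e`. The parameter values (`beta=0.5`, `alpha=0.4`, `gamma=1.0`) and the helper names `ols_slope` and `simulate_simultaneity` are illustrative choices, not part of the derivation above.

```python
import random


def ols_slope(x, y):
    """No-intercept OLS slope for one regressor: (x'x)^(-1) x'y."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_simultaneity(n=100_000, beta=0.5, alpha=0.4, gamma=1.0, seed=0):
    """Draw data from the two-equation system and fit equation (i) by OLS.

    All parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    # Reduced form (iii) in the scalar case: x = (gamma*z + alpha*e) / (1 - beta*alpha)
    x = [(gamma * zi + alpha * ei) / (1 - beta * alpha) for zi, ei in zip(z, e)]
    # Structural equation (i): y = beta*x + e
    y = [beta * xi + ei for xi, ei in zip(x, e)]
    # Sample analogue of E[x*e]; it would be near zero if x were exogenous
    mean_xe = sum(xi * ei for xi, ei in zip(x, e)) / n
    return ols_slope(x, y), mean_xe


b_ols, mean_xe = simulate_simultaneity()
print(f"true beta = 0.5, OLS estimate = {b_ols:.3f}")
print(f"sample mean of x*e = {mean_xe:.3f}")
```

With these assumed values, the sample mean of `x*e` should settle near `alpha/(1-beta*alpha)=0.5` rather than zero, and the OLS slope should land noticeably above the true `beta`.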
Omitted variable bias
Omitted variable bias occurs when we fail to include a relevant variable that is correlated with one or more of the independent variables in our model. We considered the regression model `Y=X\beta+e---i`. However, suppose the true model is `Y=X\beta+Z\Gamma+e---iv`, and we nevertheless estimate `equation (i)`.
We know,
`\beta_\text{OLS}=(X^TX)^(-1)(X^TY)`
The expectation of the OLS estimator is:
`E[\beta_\text{OLS}]=E[(X^TX)^(-1)(X^TY)]`
`E[\beta_\text{OLS}]=E[(X^TX)^(-1)[X^T(X\beta+Z\Gamma+e)]]`
`E[\beta_\text{OLS}]=E[[(X^TX)^(-1)(X^TX)\beta]+[(X^TX)^(-1)(X^T(Z\Gamma+e))]]`
`E[\beta_\text{OLS}]=E[\beta+[(X^TX)^(-1)(X^T(Z\Gamma+e))]]`
`E[\beta_\text{OLS}]=E[\beta+[(X^TX)^(-1)((X^TZ\Gamma)+(X^Te))]]`
`E[\beta_\text{OLS}]=E[\beta+[(X^TX)^(-1)(X^TZ\Gamma)]+[(X^TX)^(-1)(X^Te)]]`
`E[\beta_\text{OLS}]=\beta+[(X^TX)^(-1)(X^TZ\Gamma)]+[(X^TX)^(-1)E(X^Te)]`
Since `E(X^Te)=0`,
`E[\beta_\text{OLS}]=\beta+[(X^TX)^(-1)(X^TZ\Gamma)]---v`
`Equation (v)` provides an interesting conclusion. Because the error term `e` is exogenous to `X`, the only source of bias is the omitted variable: `\hat\beta_\text{OLS}` is biased if `X^TZ\ne0` and `\Gamma\ne0`. If `Z` is uncorrelated with `X`, then `E[X^TZ]=0` and the bias term vanishes. Thus, only omitted variables that are correlated with `X` are a problem; we need not worry about omitted variables that are uncorrelated with `X`. If `X` and `Z` are correlated, then the expectation of the OLS estimator is the sum of two terms: (1) the true coefficient on `X` and (2) the marginal effect of `X` on `Z\Gamma`, that is, the coefficient from regressing `Z\Gamma` on `X`.
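The decomposition behind `equation (v)` can be checked with a small simulation. The sketch below is only an illustration under assumed conditions: one regressor, one omitted variable, no intercept, and standard normal draws. The function name `simulate_ovb`, the parameter `rho` (how strongly `X` depends on `Z`), and the values `beta=1.0` and `gamma=2.0` are hypothetical choices.

```python
import random


def simulate_ovb(rho, n=100_000, beta=1.0, gamma=2.0, seed=1):
    """Regress y on x alone when the true model also contains z.

    rho sets the dependence of x on z through x = rho*z + u (an assumption).
    Returns the short-regression slope plus the two terms that are added
    to beta: the omitted-variable term and the sampling-noise term.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    x = [rho * zi + ui for zi, ui in zip(z, u)]
    # True model (iv): y = beta*x + gamma*z + e
    y = [beta * xi + gamma * zi + ei for xi, zi, ei in zip(x, z, e)]

    sxx = sum(xi * xi for xi in x)
    # OLS on the misspecified model (i): (x'x)^(-1) x'y
    b_ols = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Scalar version of (X'X)^(-1) X'Z Gamma
    bias_term = gamma * sum(xi * zi for xi, zi in zip(x, z)) / sxx
    # Scalar version of (X'X)^(-1) X'e, close to zero when e is exogenous
    noise_term = sum(xi * ei for xi, ei in zip(x, e)) / sxx
    return b_ols, bias_term, noise_term


b_corr, bias_corr, noise_corr = simulate_ovb(rho=0.8)
b_unc, bias_unc, noise_unc = simulate_ovb(rho=0.0)
print(f"correlated Z:   b_ols = {b_corr:.3f}, bias term = {bias_corr:.3f}")
print(f"uncorrelated Z: b_ols = {b_unc:.3f}, bias term = {bias_unc:.3f}")
```

In the correlated case the estimate should sit well away from the true `beta`, by roughly the bias term; in the uncorrelated case the bias term should be close to zero and the estimate close to `beta`.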
To be continued...