Let's recall the causes of endogeneity in our model: first, simultaneous causality bias; second, omitted variable bias; and third, measurement error in the covariates.

Let's consider a simple regression model.

`Y=X\beta+e \quad \text{(i)}`

**Existence of simultaneity**

If our model suffers from simultaneity, then a variable on the right-hand side (RHS) of one equation also appears on the left-hand side (LHS) of another equation in the system, that is,

`Y=X\beta+e \quad \text{(i)}`

`X=Y\alpha+Z\Gamma \quad \text{(ii)}`

Substituting equation (i) into equation (ii) yields

`X=(X\beta+e)\alpha+Z\Gamma`

`X=X\beta\alpha+Z\Gamma+e\alpha`

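`X-X\beta\alpha = Z\Gamma+e\alpha`

`X(I-\beta\alpha) = Z\Gamma+e\alpha`

`X=(Z\Gamma+e\alpha)(I-\beta\alpha)^(-1) \quad \text{(iii)}`

provided that `I-\beta\alpha` is invertible, which holds whenever `\alpha\beta\ne1`.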
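As a numerical sanity check of the reduced form, the sketch below solves the two-equation system with NumPy and confirms that the solved `X` satisfies equation (ii). The sizes, seed, and parameter values are arbitrary assumptions chosen for illustration. Because `X` is `n \times k`, the inverse of `I-\beta\alpha` has to multiply from the right.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 2, 3  # observations, columns of X, columns of Z (illustrative sizes)

beta = np.array([[0.5], [-0.3]])  # k x 1: Y = X beta + e
alpha = np.array([[0.4, 0.2]])    # 1 x k: X = Y alpha + Z Gamma
Gamma = rng.normal(size=(m, k))   # m x k
Z = rng.normal(size=(n, m))       # exogenous variables
e = rng.normal(size=(n, 1))       # structural error

I = np.eye(k)
# X (I - beta alpha) = Z Gamma + e alpha, so the inverse multiplies from the right
X = (Z @ Gamma + e @ alpha) @ np.linalg.inv(I - beta @ alpha)
Y = X @ beta + e

# The solved X satisfies equation (ii); note it contains e through the e @ alpha term
print(np.allclose(X, Y @ alpha + Z @ Gamma))  # True
# Matrix determinant lemma: det(I - beta alpha) = 1 - alpha beta, so invertible iff alpha beta != 1
print(np.isclose(np.linalg.det(I - beta @ alpha), 1 - (alpha @ beta).item()))  # True
```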

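In equation (iii), `X` depends on `e` through the term `e\alpha`, so `X` is correlated with the error term: `E[X^Te]\ne0` (whenever `\alpha\ne0`). This violates the fundamental OLS assumption of exogeneity, `E[X^Te]=0`, so the OLS estimator of `\beta` is biased and inconsistent.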
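The simultaneity bias can be seen in a small simulation. The sketch below uses a scalar version of equations (i) and (ii) with illustrative values `\beta=1`, `\alpha=0.5` and `\gamma=1` (assumptions, not from the text), generates `X` from its reduced form, and then runs OLS of `Y` on `X`. All variables have mean zero, so no intercept is included.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
beta, alpha, gamma = 1.0, 0.5, 1.0  # illustrative values; alpha * beta must not equal 1

Z = rng.normal(size=n)  # exogenous shifter of X
e = rng.normal(size=n)  # structural error in the Y equation

# Scalar reduced form: X = (gamma*Z + alpha*e) / (1 - alpha*beta), which here is X = 2Z + e
X = (gamma * Z + alpha * e) / (1 - alpha * beta)
Y = beta * X + e

sample_cov_Xe = X @ e / n       # sample analogue of Cov(X, e); clearly nonzero
beta_ols = (X @ Y) / (X @ X)    # OLS slope without intercept

print(f"Cov(X, e) approx {sample_cov_Xe:.3f}")
print(f"OLS estimate approx {beta_ols:.3f} (true beta = {beta})")
```

With these values `Cov(X,e)=1` and `Var(X)=5`, so the OLS slope converges to `\beta+Cov(X,e)//Var(X)=1.2` rather than to the true `\beta=1`.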

**Omitted variable bias**

Omitted variable bias occurs when we leave out a relevant variable that is correlated with one or more of the regressors in our model. We considered the regression model `Y=X\beta+e \quad \text{(i)}`. However, the true model is `Y=X\beta+Z\Gamma+e \quad \text{(iv)}`. Nevertheless, we estimate `\beta` from equation (i), regressing `Y` on `X` alone.

The OLS estimator is

`\hat\beta_\text{OLS}=(X^TX)^(-1)X^TY`
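The estimator above can be computed directly in NumPy. This minimal sketch assumes simulated data with an intercept, one regressor, and an error drawn independently of `X` (so the exogeneity assumption holds). It solves the normal equations `X^TX\hat\beta=X^TY` rather than forming the inverse explicitly, and cross-checks the result against `numpy.linalg.lstsq`. The helper name `ols` and all numbers are illustrative.

```python
import numpy as np

def ols(X, Y):
    """OLS coefficients (X'X)^{-1} X'Y, computed by solving the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(1)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta = np.array([2.0, 0.7])                            # illustrative true coefficients
e = rng.normal(size=n)                                 # drawn independently of X, so E[X'e] = 0
Y = X @ beta + e

beta_hat = ols(X, Y)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)                           # close to [2.0, 0.7]
print(np.allclose(beta_hat, beta_lstsq))  # same answer as NumPy's least-squares solver
```

Solving the linear system is numerically more stable than computing `(X^TX)^(-1)` and then multiplying, although both express the same formula.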

Taking the expectation of the OLS estimator and substituting the true model (iv) for `Y`:

`E[\hat\beta_\text{OLS}]=E[(X^TX)^(-1)X^TY]`

`E[\hat\beta_\text{OLS}]=E[(X^TX)^(-1)X^T(X\beta+Z\Gamma+e)]`

`E[\hat\beta_\text{OLS}]=E[(X^TX)^(-1)X^TX\beta+(X^TX)^(-1)X^T(Z\Gamma+e)]`

`E[\hat\beta_\text{OLS}]=E[\beta+(X^TX)^(-1)X^T(Z\Gamma+e)]`

`E[\hat\beta_\text{OLS}]=E[\beta+(X^TX)^(-1)(X^TZ\Gamma+X^Te)]`

`E[\hat\beta_\text{OLS}]=E[\beta+(X^TX)^(-1)X^TZ\Gamma+(X^TX)^(-1)X^Te]`

Treating `X` and `Z` as fixed (that is, conditioning on them), only `e` is random, so

`E[\hat\beta_\text{OLS}]=\beta+(X^TX)^(-1)X^TZ\Gamma+(X^TX)^(-1)E[X^Te]`

Since `E[X^Te]=0`,

`E[\hat\beta_\text{OLS}]=\beta+(X^TX)^(-1)X^TZ\Gamma \quad \text{(v)}`
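Equation (v) is an expectation conditional on `X` and `Z`, so it can be checked by holding `X` and `Z` fixed and averaging the short-regression estimate over many fresh draws of `e`. The sketch below does this with NumPy; the coefficients (`\beta=1`, `\Gamma=0.8`), the 0.6 loading of `Z` on `X`, and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 500, 2_000
beta = np.array([1.0])   # illustrative true coefficient on X
Gamma = np.array([0.8])  # illustrative coefficient on the omitted Z

# X and Z are drawn once and then held fixed: equation (v) is conditional on them
X = rng.normal(size=(n, 1))
Z = 0.6 * X + rng.normal(size=(n, 1))  # Z is correlated with X

P = np.linalg.solve(X.T @ X, X.T)      # (X'X)^{-1} X', shape (1, n)
predicted = beta + P @ Z @ Gamma       # right-hand side of equation (v)

# Each column of E is a fresh error draw, independent of X
E = rng.normal(size=(n, reps))
Ys = (X @ beta + Z @ Gamma)[:, None] + E  # true model (iv), one column per replication
estimates = P @ Ys                        # short regression of Y on X only, shape (1, reps)

print(estimates.mean(axis=1), predicted)  # Monte Carlo mean vs. formula (v)
```

The Monte Carlo average should agree with the right-hand side of equation (v) up to simulation noise, and both sit well above the true `\beta=1` because `Z` is correlated with `X`.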

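Equation (v) leads to an interesting conclusion. Even though the error term `e` is exogenous with respect to `X`, `\hat\beta_\text{OLS}` is biased whenever `(X^TX)^(-1)X^TZ\Gamma\ne0`, that is, when the omitted variable is relevant (`\Gamma\ne0`) and correlated with `X` (`E[X^TZ]\ne0`). If `Z` is uncorrelated with `X` (and has mean zero), then `E[X^TZ]=0` and the bias term disappears. Thus, only omitted variables that are correlated with `X` are a problem; we need not worry about omitted variables that are uncorrelated with `X`. If `X` and `Z` are correlated, the expected OLS estimator is the sum of two terms: (1) the true coefficient `\beta` on `X`, and (2) `(X^TX)^(-1)X^TZ\Gamma`, the coefficients from regressing `Z\Gamma` on `X`, i.e., the part of the omitted variable's effect that `X` picks up.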
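The contrast between correlated and uncorrelated omitted variables can be seen by varying how strongly `Z` depends on `X`. In this sketch, the hypothetical helper `short_regression_slope` generates `Z=\rho X+u` with an independent `u` and the true model `Y=\beta X+\gamma Z+e`, then regresses `Y` on `X` alone. The default parameter values are assumptions for illustration. For scalar regressors the large-sample slope is `\beta+\gamma\rho`, so the bias disappears when `\rho=0`.

```python
import numpy as np

def short_regression_slope(rho, n=100_000, beta=1.0, gamma=0.8, seed=0):
    """Slope from regressing Y on X alone when the true model also contains gamma*Z.

    rho controls how strongly the omitted Z depends on X (rho = 0 means uncorrelated).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)
    Z = rho * X + rng.normal(size=n)  # omitted variable
    e = rng.normal(size=n)            # exogenous error
    Y = beta * X + gamma * Z + e      # true model
    return (X @ Y) / (X @ X)          # OLS slope of Y on X only (no intercept; all means are zero)

no_bias = short_regression_slope(rho=0.0)    # near beta = 1.0
with_bias = short_regression_slope(rho=0.5)  # near beta + gamma * rho = 1.4
print(no_bias, with_bias)
```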

*To be continued...*
