Endogeneity and Use of Instrumental Variable

What is an endogeneity issue?

Endogeneity arises when one of the explanatory variables is correlated with the error term. A common cause is an unobserved effect, often difficult to measure, that affects both the explanatory variable and the outcome.

For example:

Marks = a + b*Class_attendance + error

Here, the OLS estimate of the coefficient b is biased. Class attendance is influenced by observed factors (such as distance to school) and unobserved factors (such as motivation to read). Because these omitted variables are not in the model, they are captured in the error term, which makes Class_attendance correlated with the error term. This is endogeneity caused by omitted variable bias, so OLS gives biased estimates in this context.
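The bias described above can be seen in a small simulation. This is only a sketch under made-up assumptions: the true coefficient (2.0), the effect of motivation on marks (3.0), and the normal distributions are all hypothetical choices, not values from any real data.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    return cov_xy / var_x

random.seed(0)
n = 20000
true_b = 2.0  # assumed true effect of attendance on marks (hypothetical)

# Motivation is unobserved: it drives attendance AND marks.
motivation = [random.gauss(0, 1) for _ in range(n)]
attendance = [m + random.gauss(0, 1) for m in motivation]

# Marks depend on attendance and motivation; since motivation is left out
# of the regression below, it ends up in the error term.
marks = [10 + true_b * a + 3.0 * m + random.gauss(0, 1)
         for a, m in zip(attendance, motivation)]

b_ols = ols_slope(attendance, marks)
print(f"true b = {true_b}, OLS estimate = {b_ols:.2f}")
```

Under these assumptions the OLS slope lands well above the true b, because attendance picks up part of the effect of motivation.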

Hence, we use other techniques such as Two-Stage Least Squares (2SLS), the Generalized Method of Moments (GMM), and so on.

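Endogeneity is more serious than issues such as multicollinearity, heteroskedasticity, and autocorrelation. Those mainly affect the precision of the estimates and their standard errors, whereas endogeneity biases the coefficient itself, and the bias does not go away as the sample grows.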

How does the use of an instrumental variable (IV) remove endogeneity?

Let's start with an example: the regression of marks on school attendance from above.

Here, school attendance is endogenous because it is affected by factors such as motivation to read, which is abstract and hard to measure. Hence, the estimated b coefficient is biased. Our aim is to break the link between school attendance and motivation. To do so, we introduce an instrumental variable: one that affects school attendance but is unrelated to motivation and the other unobserved factors in the error term.

We then run two-stage least squares (2SLS).

First, we regress school attendance on our IV. Second, we regress marks on the estimated (fitted) value of school attendance from the first stage. Because the fitted value varies only with the IV, it is free of the impact of unobserved variables such as motivation.

Now, let's take rainfall as the IV. Is it a good IV? It plausibly is. First, students may not go to school when it rains, so rainfall influences school attendance (the relevance condition). Second, and most important, rainfall is as good as random with respect to motivation and should affect marks only through attendance (the exclusion restriction).
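The relevance condition can be checked in data, for example by looking at the first-stage correlation and F statistic. The sketch below uses simulated data with hypothetical numbers (rainfall lowers attendance by 0.8 units). The exclusion restriction usually cannot be tested directly because motivation is unobserved; here it can be inspected only because the simulation generates motivation itself.

```python
import math
import random

def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n = 5000

rainfall = [random.gauss(0, 1) for _ in range(n)]    # candidate instrument
motivation = [random.gauss(0, 1) for _ in range(n)]  # unobserved in practice
# Hypothetical first stage: more rain means lower attendance.
attendance = [-0.8 * r + m + random.gauss(0, 1)
              for r, m in zip(rainfall, motivation)]

# Relevance: rainfall should be clearly correlated with attendance.
r_first = corr(rainfall, attendance)
# With a single instrument, the first-stage F statistic equals t squared.
first_stage_f = r_first ** 2 * (n - 2) / (1 - r_first ** 2)

# Only possible in a simulation: rainfall should be unrelated to motivation.
r_excl = corr(rainfall, motivation)

print(f"corr(rainfall, attendance) = {r_first:.2f}, first-stage F = {first_stage_f:.0f}")
print(f"corr(rainfall, motivation) = {r_excl:.3f}")
```

A common rule of thumb treats a first-stage F statistic below about 10 as a sign of a weak instrument.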

That is,

Stage 1

School_attendance = c + d*rainfall + e

We get estimated_school_attendance from Stage 1.

Stage 2

Marks = a + b*estimated_school_attendance + v

Now, the estimated b coefficient is consistent: by using rainfall as the IV, we have broken the link between school attendance and motivation, so the estimate moves toward the true b as the sample grows.
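The two stages can be sketched by hand on simulated data. Everything here is an assumption made for illustration: the true b of 2.0, the effect of rainfall on attendance, and the effect of motivation on marks are hypothetical numbers.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

random.seed(42)
n = 20000
true_b = 2.0  # assumed true effect of attendance on marks (hypothetical)

rainfall = [random.gauss(0, 1) for _ in range(n)]    # instrument
motivation = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
attendance = [5 - 0.8 * r + m + random.gauss(0, 1)
              for r, m in zip(rainfall, motivation)]
marks = [10 + true_b * a + 3.0 * m + random.gauss(0, 1)
         for a, m in zip(attendance, motivation)]

# Stage 1: regress attendance on rainfall and keep the fitted values.
c, d = ols(rainfall, attendance)
attendance_hat = [c + d * r for r in rainfall]

# Stage 2: regress marks on the fitted attendance.
a_hat, b_2sls = ols(attendance_hat, marks)

# Plain OLS of marks on attendance, for comparison.
_, b_ols = ols(attendance, marks)

print(f"true b = {true_b}, OLS = {b_ols:.2f}, 2SLS = {b_2sls:.2f}")
```

In this setup the 2SLS estimate sits close to the true b, while plain OLS is pulled upward by motivation. Running the second stage by hand like this gives the right coefficient but not the right standard errors, so in practice a dedicated IV routine is the safer choice for inference.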
