Synthetic Panels: A Practical Solution for Analyzing Transition Dynamics Using Repeated Cross-Sections in Python

Introduction

In recent years, the synthetic panel method has gained increasing popularity for analyzing transition dynamics of key economic indicators such as poverty, financial inclusion, and labor market outcomes. When panel data are available, tracking such transitions is relatively straightforward. However, in many developing country contexts, the absence of panel data makes this task considerably more challenging.

To address this gap, Dang and Lanjouw (2023) proposed a methodology that constructs synthetic panels from repeated cross-sectional survey data. The core idea is that although the households sampled in different survey rounds are not the same, they share common time-invariant characteristics such as gender, ethnicity, religion, place of birth, and parental education. These stable traits, which are typically available in standard household surveys, can be used to statistically match households across time and approximate longitudinal data. By leveraging these characteristics, the synthetic panel method lets researchers estimate household-level transitions in welfare and other outcomes without relying on actual panel data. Dang, Lanjouw, Luoto and McKenzie (2014) apply this methodology to estimate transition dynamics across countries.

Methodology

Let `y_{ij}` denote the (log) consumption or income of household `i` in survey round `j = 1, 2`. Let `x_i` be a vector of time-invariant characteristics observed in both rounds. For each round, the following regression model is estimated:

$$ y_{ij} = x_i^\top \beta_j + \varepsilon_{ij}, \quad j = 1, 2 $$

Because round-1 consumption is not observed for the households sampled in round 2, it is predicted from the round-1 OLS estimates. That is, using `\hat{\beta}_1`, the fitted values `\hat{y}_{i1} = x_i^\top \hat{\beta}_1` serve as the predicted baseline welfare for each household observed in round 2. This works because `x_i` is time-invariant, so the values recorded in round 2 also describe the household in round 1.
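The projection step can be sketched as follows. This is a minimal illustration on simulated data, assuming NumPy is available; the function names `fit_ols` and `predict` and all numeric values are hypothetical and are not taken from the package described later.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients (intercept first) and the residual standard deviation."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]  # degrees of freedom: n minus number of coefficients
    sigma = np.sqrt(resid @ resid / dof)
    return beta, sigma

def predict(X, beta):
    """Fitted values x_i' beta for a matrix of time-invariant characteristics."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

# Simulated cross-sections (assumption): two binary time-invariant traits per household.
rng = np.random.default_rng(0)
n1, n2 = 500, 400
X_round1 = rng.integers(0, 2, size=(n1, 2)).astype(float)
X_round2 = rng.integers(0, 2, size=(n2, 2)).astype(float)
y_round1 = 1.0 + X_round1 @ np.array([0.5, -0.3]) + rng.normal(0.0, 0.4, n1)

# Estimate the round-1 model, then predict round-1 welfare for round-2 households.
beta1_hat, sigma1_hat = fit_ols(X_round1, y_round1)
y1_hat_for_round2 = predict(X_round2, beta1_hat)
```

The same `fit_ols` call on the round-2 sample would give `\hat{\beta}_2` and `\sigma_{\varepsilon_2}`.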

To estimate transition probabilities—such as the probability that a household moves out of poverty—we assume the errors `(\varepsilon_{i1}, \varepsilon_{i2})` follow a bivariate normal distribution with:

- means zero: `\mathbb{E}[\varepsilon_{i1}] = \mathbb{E}[\varepsilon_{i2}] = 0`
- standard deviations `\sigma_{\varepsilon_1}`, `\sigma_{\varepsilon_2}`
- partial correlation `\rho`
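If `x_i` is uncorrelated with the errors, the model implies `cov(y_{i1}, y_{i2}) = \beta_1^\top \mathrm{Var}(x_i) \beta_2 + \rho \sigma_{\varepsilon_1} \sigma_{\varepsilon_2}`, so `\rho` can be backed out from the correlation between `y_{i1}` and `y_{i2}`. The sketch below illustrates that identity only. It takes the correlation of `y` as a given input, which repeated cross-sections do not provide directly, and it is not necessarily how the package estimates `\rho`. The function name and all numbers are hypothetical, and the simulated true panel exists only to check the algebra.

```python
import numpy as np

def partial_correlation(rho_y, var_y1, var_y2, beta1, beta2, cov_x, sigma1, sigma2):
    """Back out corr(eps1, eps2) from corr(y1, y2) under the linear model.

    beta1 and beta2 are slope vectors (no intercept); cov_x is Var(x).
    """
    cov_y = rho_y * np.sqrt(var_y1 * var_y2)   # cov(y1, y2)
    explained = beta1 @ cov_x @ beta2          # part of the covariance due to x
    return (cov_y - explained) / (sigma1 * sigma2)

# Simulated true panel (assumption), used only to verify the identity.
rng = np.random.default_rng(1)
n, rho_true = 20_000, 0.6
sigma1, sigma2 = 0.4, 0.5
beta1, beta2 = np.array([0.5, -0.3]), np.array([0.4, -0.2])

X = rng.integers(0, 2, size=(n, 2)).astype(float)
err_cov = [[sigma1**2, rho_true * sigma1 * sigma2],
           [rho_true * sigma1 * sigma2, sigma2**2]]
eps = rng.multivariate_normal([0.0, 0.0], err_cov, size=n)
y1 = 1.0 + X @ beta1 + eps[:, 0]
y2 = 1.1 + X @ beta2 + eps[:, 1]

rho_hat = partial_correlation(
    rho_y=np.corrcoef(y1, y2)[0, 1],
    var_y1=y1.var(ddof=1),
    var_y2=y2.var(ddof=1),
    beta1=beta1,
    beta2=beta2,
    cov_x=np.cov(X, rowvar=False),
    sigma1=sigma1,
    sigma2=sigma2,
)
```

In this simulation `rho_hat` should land close to the true value of 0.6, up to sampling noise.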

Joint Transition Probabilities

Let `y_{i1}` and `y_{i2}` denote (log) consumption in round 1 and round 2, respectively. The poverty lines are `z_1` and `z_2`. The fitted values from the regressions are `\hat{y}_{i1} = x_i^\top \hat{\beta}_1` and `\hat{y}_{i2} = x_i^\top \hat{\beta}_2`. Define standardized thresholds:

$$ \tilde{z}_1 = \frac{z_1 - x_i^\top \hat{\beta}_1}{\sigma_{\varepsilon_1}}, \quad \tilde{z}_2 = \frac{z_2 - x_i^\top \hat{\beta}_2}{\sigma_{\varepsilon_2}} $$

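Let `\rho` be the partial correlation of the residuals, let `\Phi` denote the standard normal CDF, and let `\Phi_2(\cdot, \cdot;\ \rho)` denote the standard bivariate normal CDF with correlation `\rho`. Then the joint transition probabilities are: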

Stayed Poor (P11):
$$P_{11} = P(y_{i1} \leq z_1,\ y_{i2} \leq z_2) = \Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)$$
Escaped Poverty (P10):
$$P_{10} = P(y_{i1} \leq z_1,\ y_{i2} > z_2) = \Phi(\tilde{z}_1) - \Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)$$
Fell into Poverty (P01):
$$P_{01} = P(y_{i1} > z_1,\ y_{i2} \leq z_2) = \Phi(\tilde{z}_2) - \Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)$$
Stayed Non-poor (P00):
$$P_{00} = P(y_{i1} > z_1,\ y_{i2} > z_2) = 1 - \Phi(\tilde{z}_1) - \Phi(\tilde{z}_2) + \Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)$$
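The four joint probabilities for a single household can be computed as in the sketch below. To stay dependency-free it uses only the standard library and evaluates `\Phi_2` with a known single-integral identity integrated by Simpson's rule, valid for `|\rho| < 1`; `scipy.stats.multivariate_normal` could be used instead. All function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def phi(z):
    """Standard normal CDF."""
    return _STD_NORMAL.cdf(z)

def phi2(h, k, rho, steps=200):
    """Standard bivariate normal CDF P(Z1 <= h, Z2 <= k) for |rho| < 1.

    Uses Phi2 = Phi(h)Phi(k) + (1/2pi) * integral over theta in [0, asin(rho)] of
    exp(-(h^2 - 2hk sin(theta) + k^2) / (2 cos^2(theta))), via Simpson's rule.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    upper = math.asin(rho)

    def integrand(theta):
        c = math.cos(theta)
        return math.exp(-(h * h - 2 * h * k * math.sin(theta) + k * k) / (2 * c * c))

    width = upper / steps
    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(j * width)
    return phi(h) * phi(k) + (total * width / 3) / (2 * math.pi)

def joint_probabilities(y1_hat, y2_hat, z1, z2, sigma1, sigma2, rho):
    """Joint poverty transition probabilities for one household."""
    zt1 = (z1 - y1_hat) / sigma1   # standardized threshold, round 1
    zt2 = (z2 - y2_hat) / sigma2   # standardized threshold, round 2
    p11 = phi2(zt1, zt2, rho)
    return {
        "P11": p11,                             # stayed poor
        "P10": phi(zt1) - p11,                  # escaped poverty
        "P01": phi(zt2) - p11,                  # fell into poverty
        "P00": 1 - phi(zt1) - phi(zt2) + p11,   # stayed non-poor
    }

# Made-up inputs for one household (log consumption and log poverty lines).
probs = joint_probabilities(y1_hat=7.1, y2_hat=7.3, z1=7.0, z2=7.05,
                            sigma1=0.4, sigma2=0.45, rho=0.6)
```

By construction the four probabilities sum to one, which is a quick sanity check on any implementation.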

Conditional Transition Probabilities

From the joint probabilities, we define the conditional transitions:

Escape from Poverty (conditional on initial poverty):
$$P(y_{i2} > z_2 \mid y_{i1} \leq z_1) = \frac{P_{10}}{P_{10} + P_{11}} = \frac{\Phi(\tilde{z}_1) - \Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)}{\Phi(\tilde{z}_1)}$$
Chronic Poverty (conditional):
$$P(y_{i2} \leq z_2 \mid y_{i1} \leq z_1) = \frac{P_{11}}{P_{10} + P_{11}} = \frac{\Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)}{\Phi(\tilde{z}_1)}$$
Downward Mobility (conditional on initial non-poverty):
$$P(y_{i2} \leq z_2 \mid y_{i1} > z_1) = \frac{P_{01}}{P_{01} + P_{00}} = \frac{\Phi(\tilde{z}_2) - \Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)}{1 - \Phi(\tilde{z}_1)}$$
Remain Non-poor (conditional):
$$P(y_{i2} > z_2 \mid y_{i1} > z_1) = \frac{P_{00}}{P_{01} + P_{00}} = \frac{1 - \Phi(\tilde{z}_1) - \Phi(\tilde{z}_2) + \Phi_2(\tilde{z}_1, \tilde{z}_2;\ \rho)}{1 - \Phi(\tilde{z}_1)}$$
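The conditional transitions are simple ratios of the joint probabilities. The sketch below shows the arithmetic; the function name, dictionary keys, and input values are made up for illustration.

```python
def conditional_transitions(p11, p10, p01, p00):
    """Conditional transition probabilities from the four joint probabilities."""
    poor_initially = p11 + p10       # equals Phi(z~_1)
    nonpoor_initially = p01 + p00    # equals 1 - Phi(z~_1)
    if poor_initially <= 0 or nonpoor_initially <= 0:
        raise ValueError("both initial states need positive probability")
    return {
        "escape": p10 / poor_initially,            # P(non-poor in 2 | poor in 1)
        "chronic": p11 / poor_initially,           # P(poor in 2 | poor in 1)
        "downward": p01 / nonpoor_initially,       # P(poor in 2 | non-poor in 1)
        "remain_nonpoor": p00 / nonpoor_initially, # P(non-poor in 2 | non-poor in 1)
    }

# Made-up joint probabilities that sum to one.
cond = conditional_transitions(p11=0.15, p10=0.10, p01=0.05, p00=0.70)
```

Within each initial state the two conditional probabilities sum to one: here escape and chronic poverty are 0.4 and 0.6.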

The example above concerns poverty dynamics, but the same approach applies to the dynamics of a wide array of economic conditions, such as financial inclusion, health outcomes, and education outcomes.

Implementation in Python

Building on the methodology proposed by Dang and Lanjouw (2023), I have developed a dedicated Python package that streamlines the estimation of poverty transition probabilities at the household level using synthetic panel techniques. The package automates key steps in the workflow, including OLS regression on time-invariant characteristics, projection of baseline welfare, estimation of partial correlation, and computation of joint and conditional transition probabilities.

To ensure ease of use, the package includes comprehensive documentation and a fully worked-out example. The implementation is available on my GitHub repository, and an interactive Jupyter notebook is embedded for hands-on experimentation and reproducibility.