Causal Thinking in Observational Studies: Matching, Propensity Scores, and IPTW Explained
- Mayta
- Jun 27
When we want to know if a treatment truly causes a better outcome—especially in observational studies—we need more than just statistics. We need causal thinking, and we need the right methods. This guide walks you through model-based adjustment, standardisation, matching, balancing scores, propensity scores, and IPTW, all in one place, explained simply.
🔹 Model-Based Adjustment
What it is: You build a regression model to estimate the treatment effect while adjusting for covariates.
How it works:
“Let's model outcome = treatment + age + BP + diabetes...”
Unless you add interaction terms, it assumes the treatment effect is the same across all subgroups (no effect modification).
When to use: You trust the form of your model, have mostly continuous covariates, and expect little variation in the effect between subgroups.
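A minimal sketch of model-based adjustment, assuming simulated data. The variables (age, diabetes, LDL), the effect size, and the use of plain NumPy least squares instead of a statistics package are illustrative choices, not the post's own method. The model has no treatment-by-covariate interaction, which is exactly the "same effect in every subgroup" assumption.

```python
import numpy as np

# Simulated (hypothetical) statin data: age and diabetes affect both treatment and LDL.
rng = np.random.default_rng(42)
n = 5_000
age = rng.normal(65, 8, n)                  # years
diabetes = rng.binomial(1, 0.3, n)          # 0 = no, 1 = yes
logit = -0.5 + 0.05 * (age - 65) + 0.8 * diabetes
treatment = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # older/diabetic patients treated more often
TRUE_EFFECT = -10.0                         # assumed LDL change from statins, mg/dL
ldl = (130 + TRUE_EFFECT * treatment + 0.6 * (age - 65) + 8 * diabetes
       + rng.normal(0, 2, n))

def adjusted_effect(y, t, covariates):
    """OLS fit of y = b0 + b1*t + covariates; returns b1, the adjusted treatment effect.
    No t-by-covariate interaction terms, so b1 is assumed to be the same in every subgroup."""
    design = np.column_stack([np.ones_like(y), t, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

naive = ldl[treatment == 1].mean() - ldl[treatment == 0].mean()
adjusted = adjusted_effect(ldl, treatment, [age, diabetes])
print(f"naive difference: {naive:.2f}, adjusted effect: {adjusted:.2f}")
```

Because treated patients in this simulation are older and more often diabetic, the naive difference is pulled toward zero, while the adjusted coefficient should land close to the simulated -10.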
🔹 Standardisation
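What it is: You split people into groups (strata, e.g., age 60–70, 70–80), estimate the treatment effect within each stratum, then take a weighted average, weighting each stratum by its share of a chosen standard population (often the whole study sample).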
Key point: It allows for effect modification (e.g., statins might help older patients more).
Limitations:
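Only works well with categorical covariates (continuous ones must be cut into categories).
Too many strata → few patients per stratum → some strata contain no treated or no untreated patients (a positivity violation).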
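A minimal sketch of standardisation for a binary outcome, assuming hypothetical event counts in two age strata and using the whole study sample as the standard population (one common choice among several):

```python
# Hypothetical counts per age stratum:
# (treated_n, treated_events, untreated_n, untreated_events)
strata = {
    "60-70": (200, 20, 300, 45),
    "70-80": (300, 45, 100, 25),
}

def standardised_risk_difference(strata):
    """Weighted average of stratum-specific risk differences, weighting each
    stratum by its share of the whole study sample (the standard population)."""
    total = sum(nt + nu for nt, _, nu, _ in strata.values())
    result = 0.0
    for name, (nt, et, nu, eu) in strata.items():
        if nt == 0 or nu == 0:
            raise ValueError(f"stratum {name}: no treated or no untreated patients (positivity violation)")
        result += (et / nt - eu / nu) * (nt + nu) / total
    return result

def crude_risk_difference(strata):
    """Risk difference ignoring strata (confounded by age in this example)."""
    nt = sum(s[0] for s in strata.values())
    et = sum(s[1] for s in strata.values())
    nu = sum(s[2] for s in strata.values())
    eu = sum(s[3] for s in strata.values())
    return et / nt - eu / nu

for name, (nt, et, nu, eu) in strata.items():
    print(f"{name}: risk difference {et / nt - eu / nu:+.3f}")
print(f"crude: {crude_risk_difference(strata):+.3f}")
print(f"standardised: {standardised_risk_difference(strata):+.3f}")
```

The stratum-specific differences (-0.05 and -0.10) differ, which is effect modification, and the standardised estimate (about -0.072) differs from the crude one (-0.045) because treated patients are concentrated in the older stratum.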
🔹 Matching
What it is: For every treated person, you find one (or more) untreated person(s) who look very similar in covariates.
Benefit: Doesn’t require an outcome model. Instead, it mimics a randomized trial by design, though only for the covariates you match on.
Challenge: Matching on many variables is hard—especially when they’re a mix of continuous and categorical.
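A minimal sketch of covariate matching, assuming 1:1 nearest-neighbour matching with replacement on standardised Euclidean distance. This is only one of many variants (calipers, matching without replacement, and Mahalanobis distance are common alternatives), and the patients below are hypothetical:

```python
import numpy as np

def nearest_neighbour_match(X_treated, X_control):
    """For each treated patient, return the index of the most similar untreated
    patient: Euclidean distance on standardised covariates, matching with replacement."""
    X_all = np.vstack([X_treated, X_control])
    mu, sd = X_all.mean(axis=0), X_all.std(axis=0)
    sd[sd == 0] = 1.0                                   # avoid dividing by zero for constant columns
    zt, zc = (X_treated - mu) / sd, (X_control - mu) / sd
    dist2 = ((zt[:, None, :] - zc[None, :, :]) ** 2).sum(axis=2)  # shape (n_treated, n_control)
    return dist2.argmin(axis=1)

# Hypothetical patients: columns are age (years) and LDL (mg/dL).
X_treated = np.array([[72, 160], [65, 140]], dtype=float)
X_control = np.array([[50, 120], [71, 158], [66, 142]], dtype=float)
y_treated = np.array([110.0, 100.0])                   # follow-up LDL
y_control = np.array([90.0, 125.0, 118.0])

matches = nearest_neighbour_match(X_treated, X_control)
att = np.mean(y_treated - y_control[matches])          # average effect among the treated
print(matches, att)
```

Here each treated patient is paired with the untreated patient closest in both age and LDL, and the 50-year-old control is never used. Comparing matched pairs estimates the effect among the treated rather than in the whole population.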
🔹 Balancing Score
What it is: A function of a patient’s covariates such that, among people with the same score, the covariates have the same distribution in the treated and untreated groups. In that sense, people with the same score are “balanced.”
Example:
Patients A and B look very similar in age, LDL, and diabetes status → same balancing score.
This makes them comparable for treatment effect estimation.
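The defining property can be written as a conditional independence statement. The symbols T (treatment) and b(X) (balancing score) are notation introduced here for illustration:

```latex
% T: treatment (1 = statin, 0 = none); X: covariates; b(X): a balancing score
X \;\perp\!\!\!\perp\; T \;\mid\; b(X)
% Finest balancing score: b(X) = X itself.
% Coarsest balancing score: the propensity score
p(X) = \Pr(T = 1 \mid X)
```

Any function of X that still satisfies this property works; the propensity score is the one that compresses X into a single number.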
🔹 Propensity Score (p(X))
What it is: A special type of balancing score—it’s the probability of receiving treatment, given the person’s covariates.
Example:
A patient with p(X) = 0.85 → they had an 85% chance of getting statins based on their age, LDL, etc.
Use:
Match people with similar p(X)
Stratify by p(X)
Weight them using p(X) → leads to IPTW
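A minimal sketch of estimating p(X) with logistic regression, assuming two standardised hypothetical covariates and a linear logit. It is fitted with a hand-written Newton-Raphson loop in NumPy for self-containment; in practice a statistics library would do this:

```python
import numpy as np

def fit_propensity(X, t, max_iter=50, tol=1e-10):
    """Logistic regression of treatment t (0/1) on covariates X via Newton-Raphson.
    Returns (p_hat, beta): estimated p(X) = P(T = 1 | X) per row and the coefficients."""
    Z = np.column_stack([np.ones(len(t)), X])            # prepend intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-Z @ beta))
        gradient = Z.T @ (t - p)                          # score of the log-likelihood
        information = Z.T @ (Z * (p * (1 - p))[:, None])  # Fisher information (negative Hessian)
        step = np.linalg.solve(information, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1 / (1 + np.exp(-Z @ beta)), beta

# Simulated example: two standardised covariates (hypothetical "age" and "LDL").
rng = np.random.default_rng(7)
n = 5_000
X = rng.normal(size=(n, 2))
true_beta = np.array([-0.3, 0.8, 0.5])                   # intercept, age, LDL (assumed)
t = rng.binomial(1, 1 / (1 + np.exp(-(true_beta[0] + X @ true_beta[1:]))))
ps, beta_hat = fit_propensity(X, t)
print("estimated coefficients:", beta_hat.round(2))
print("first five p(X):", ps[:5].round(2))
```

Which covariates go into the model, and in what form (splines, interactions), is a modelling decision that this sketch does not address.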
🔹 Why We Use Bell Curve Plots for Propensity Scores
After calculating propensity scores, we plot the distribution of p(X) separately for the treated and untreated groups, usually as overlaid histograms or density curves. These often look like “bell curves,” although they need not be symmetric.
What we check:
Do the curves overlap a lot? Good! We can compare.
Do they barely touch? Bad. We can’t make fair comparisons.
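We only analyze people in the region of common support, where the treated and untreated p(X) distributions overlap. This avoids comparing people who have no counterparts in the other group, but it reduces the sample size and narrows the population the estimate applies to.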
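A minimal sketch of restricting to common support, assuming the simple min-max rule (keep scores between the larger of the two group minimums and the smaller of the two group maximums). Other definitions, such as fixed trimming thresholds, are also used; the scores below are hypothetical:

```python
import numpy as np

def common_support_mask(ps, t):
    """True for patients whose p(X) lies in the range covered by BOTH groups:
    [max of the two minimums, min of the two maximums] (the min-max rule)."""
    ps, t = np.asarray(ps, dtype=float), np.asarray(t)
    lower = max(ps[t == 1].min(), ps[t == 0].min())
    upper = min(ps[t == 1].max(), ps[t == 0].max())
    return (ps >= lower) & (ps <= upper)

# Hypothetical scores: untreated span 0.05-0.60, treated span 0.30-0.95.
ps = [0.05, 0.20, 0.40, 0.60, 0.30, 0.50, 0.70, 0.95]
t = [0, 0, 0, 0, 1, 1, 1, 1]
keep = common_support_mask(ps, t)
print(keep)   # only scores between 0.30 and 0.60 are kept
```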
🔹 Checking Balance After PS
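After matching, stratifying, or weighting by PS, we must check covariate balance using the standardized difference (stddiff) of each covariate: the difference in group means divided by the pooled standard deviation.
Rule of thumb:
|stddiff| < 0.1 → balanced
|stddiff| ≥ 0.1 → meaningful imbalance remains (risk of confounding bias)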
This is essential before estimating any treatment effect.
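A minimal sketch of the standardized difference, assuming the pooled standard deviation sqrt((var_treated + var_untreated) / 2) and optional weights (for example IPTW weights). Variances here use the weighted population form; some software uses sample (n - 1) variances, which gives slightly different values in small samples. The data are hypothetical:

```python
import numpy as np

def standardized_difference(x, t, w=None):
    """(weighted mean_treated - weighted mean_untreated) / sqrt((var_treated + var_untreated) / 2).
    Weights w are optional (e.g. IPTW weights); variances use the weighted population form,
    which for a 0/1 covariate equals p(1 - p)."""
    x, t = np.asarray(x, dtype=float), np.asarray(t)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)

    def mean_var(group):
        m = np.average(x[group], weights=w[group])
        v = np.average((x[group] - m) ** 2, weights=w[group])
        return m, v

    m1, v1 = mean_var(t == 1)
    m0, v0 = mean_var(t == 0)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

# Hypothetical data: diabetes (0/1) in 3 treated and 3 untreated patients.
t = np.array([1, 1, 1, 0, 0, 0])
diabetes = np.array([1, 1, 0, 1, 0, 0])
d = standardized_difference(diabetes, t)
print(f"stddiff = {d:.3f}, balanced: {abs(d) < 0.1}")   # stddiff = 0.707, balanced: False
```

In practice this is computed for every covariate, before and after matching or weighting, and the largest absolute value is compared with the 0.1 rule of thumb.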
🔹 IPTW (Inverse Probability of Treatment Weighting)
What it is: A technique that creates a “pseudo-population” in which treatment is independent of the measured covariates, as if it had been randomly assigned, by weighting each person based on their p(X).
Weights:
For treated → 1 / p(X)
For untreated → 1 / (1 - p(X))
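As a worked example, take the patient with p(X) = 0.85 from the propensity score section:

```latex
w_{\text{treated}} = \frac{1}{p(X)} = \frac{1}{0.85} \approx 1.18
\qquad
w_{\text{untreated}} = \frac{1}{1 - p(X)} = \frac{1}{0.15} \approx 6.67
```

If this patient was treated (the expected case), they count as about 1.18 people. If they were untreated (the unusual case), they count as about 6.67 people, standing in for the untreated outcome of all the similar patients who did receive statins.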
Why it's powerful:
Uses the whole dataset (unlike matching, which may discard unmatched people).
Balances the measured covariates so you can compare outcomes as if treatment had been randomized.
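A minimal sketch of IPTW on simulated data, assuming a single hypothetical confounder and using the true propensity score (known only because the data are simulated; with real data you would plug in estimated scores). It compares normalised weighted means, which is one common form of the IPTW estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                 # one standardised confounder (hypothetical)
ps = 1 / (1 + np.exp(-x))              # true p(X), known here only because we simulated it
t = rng.binomial(1, ps)
TRUE_EFFECT = 2.0
y = TRUE_EFFECT * t + 3.0 * x + rng.normal(size=n)

def iptw_effect(y, t, ps):
    """Average treatment effect by IPTW: weight 1/p(X) for treated and
    1/(1 - p(X)) for untreated, then compare the weighted outcome means."""
    w = np.where(t == 1, 1 / ps, 1 / (1 - ps))
    treated = t == 1
    mean1 = np.average(y[treated], weights=w[treated])    # estimates mean outcome if everyone were treated
    mean0 = np.average(y[~treated], weights=w[~treated])  # estimates mean outcome if no one were treated
    return mean1 - mean0, w

naive = y[t == 1].mean() - y[t == 0].mean()
effect, w = iptw_effect(y, t, ps)
print(f"naive: {naive:.2f}, IPTW: {effect:.2f}, largest weight: {w.max():.1f}")
```

The naive difference is inflated because high-x patients are both more likely to be treated and have higher outcomes; the weighted comparison should land near the simulated effect of 2. Printing the largest weight is a useful habit: people with p(X) near 0 or 1 receive very large weights, which makes IPTW estimates unstable.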
🔹 Final Workflow (Putting It All Together)
Estimate propensity scores using covariates that influence both treatment and outcome.
Check overlap using bell-curve plots (region of common support).
Choose a method:
Match on p(X)
Stratify on p(X)
Weight using IPTW
Check balance using standardized differences.
Estimate causal effect using outcome models (with or without weights).
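An end-to-end sketch of the workflow above on simulated data, assuming the "stratify on p(X)" option with five quantile-based strata (a common choice, not the only one). The confounders, coefficients, and the NumPy-only logistic regression are illustrative assumptions. Five strata usually remove most but not all confounding, so the stratified estimate should land near, but not exactly at, the simulated effect of 1.5:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (hypothetical): two standardised confounders X, treatment t, outcome y.
n = 10_000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1 / (1 + np.exp(-(-0.2 + 1.0 * X[:, 0] - 0.5 * X[:, 1]))))
TRUE_EFFECT = 1.5
y = TRUE_EFFECT * t + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
naive = y[t == 1].mean() - y[t == 0].mean()

def stddiff(x, t):
    """Standardized difference of covariate x between treated and untreated."""
    a, b = x[t == 1], x[t == 0]
    return (a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)

# 1. Estimate p(X) by logistic regression (Newton-Raphson, fixed 25 iterations).
Z = np.column_stack([np.ones(n), X])
beta = np.zeros(Z.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ beta))
    beta += np.linalg.solve(Z.T @ (Z * (p * (1 - p))[:, None]), Z.T @ (t - p))
ps = 1 / (1 + np.exp(-Z @ beta))

# 2. Keep the region of common support (min-max rule).
lower = max(ps[t == 1].min(), ps[t == 0].min())
upper = min(ps[t == 1].max(), ps[t == 0].max())
keep = (ps >= lower) & (ps <= upper)
ps, t, y, X = ps[keep], t[keep], y[keep], X[keep]

# 3. Method choice here: stratify on p(X) into five quantile-based strata (labels 0..4).
stratum = np.searchsorted(np.quantile(ps, [0.2, 0.4, 0.6, 0.8]), ps, side="right")

# 4. Check balance within each stratum; 5. average within-stratum effects by stratum size.
effect = 0.0
for s in range(5):
    in_s = stratum == s
    worst = max(abs(stddiff(X[in_s, j], t[in_s])) for j in range(X.shape[1]))
    diff = y[in_s & (t == 1)].mean() - y[in_s & (t == 0)].mean()
    effect += diff * in_s.mean()
    print(f"stratum {s}: n={in_s.sum()}, max |stddiff|={worst:.2f}, effect={diff:.2f}")
print(f"naive: {naive:.2f}, stratified: {effect:.2f}")
```

Strata whose maximum |stddiff| stays at or above 0.1 are a signal to use finer strata or switch to matching or IPTW.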
✅ Summary Table
| Method | Core Idea | Best For |
|---|---|---|
| Model-Based | Regression + adjustment | Simple structure, no effect modification |
| Standardisation | Stratify + weighted averaging | Few categorical covariates; allows effect modification |
| Matching | Pair similar individuals | Few covariates; sample may shrink as unmatched people are dropped |
| Propensity Score | Probability of treatment given covariates | Many covariates; enables match/stratify/weight |
| IPTW | Weighting to mimic randomization | Full-sample causal estimation |