Causal Inference in Observational Research: Strategies, Assumptions, and Stata Tools
- Mayta 
- Jun 19
1 Why Observational Data Are Harder Than RCTs
| RCT privilege | What you lose in an observational study |
| --- | --- |
| Investigator controls treatment → exchangeability holds by design | Treatment choice depends on prognosis, access, physician preference → systematic confounding | 
| Random allocation → everyone has a shot at either arm (positivity) | Some covariate patterns receive only one treatment (structural zeros) | 
| Protocol dictates a single treatment version (consistency) | Dose, timing, brand, adherence often vary across sites / time | 
| Individual assignment rarely affects others (no-interference/SUTVA) | Spill-over and social or herd effects can occur (e.g., vaccination) | 
Because none of these guarantees are automatic, every causal claim from observational data must argue for, or at least approximate, the four identifiability assumptions:
- Exchangeability (no unmeasured confounding) 
- Positivity (overlap) 
- Consistency (well-defined exposure) 
- No-interference (one person’s treatment doesn’t change another’s outcome) 
Reality check: Exchangeability can never be proven with data alone; it can only be argued through design (choosing confounders with a DAG) and diagnostics.
2 Core Strategy: “Make treated and untreated look interchangeable”
2·1 Model-Based Regression
| Feature | Pros | Cons / Assumptions |
| --- | --- | --- |
| Adds treatment and measured confounders in one model (linear, log-binomial, logistic, etc.) | Simple; familiar; can incorporate interactions | Requires correct link/linearity; gives conditional odds ratios (must marginalise for causal OR); sensitive to extrapolation outside covariate range | 
Marginalising a logistic model (e.g., Stata margins) converts conditional effects to the population-average scale.
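A minimal sketch of that marginalisation, assuming a binary outcome death, a treatment indicator treat, and illustrative confounders age and sex:

```stata
* Conditional model: treatment plus measured confounders
logit death i.treat c.age i.sex

* Average predicted risk under each treatment level,
* standardised over the observed covariate distribution
margins treat

* Population-average (marginal) risk difference
margins, dydx(treat)
```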
2·2 Standardisation (G-Computation)
- Stratify on categorical confounders 
- Estimate treatment effect within each cell 
- Average over the covariate distribution 
Benefits: permits different effects in different strata (relaxes “no effect modification”)
Limits: works only with categorical confounders; many strata → sparse data → positivity issues.
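A sketch of the same stratify-then-average logic in Stata, assuming a single categorical confounder stage (a hypothetical name); saturating the outcome model and letting margins average over the sample reproduces the three steps above:

```stata
* Saturated outcome model: treatment fully interacted with the
* categorical confounder (stage is illustrative)
glm death i.treat##i.stage, family(binomial) link(logit)

* Standardise: average the cell-specific predictions over the
* observed distribution of stage
margins treat

* Standardised (marginal) risk difference
margins, dydx(treat)
```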
2·3 Matching
| Idea | Typical tool | Diagnostic |
| --- | --- | --- |
| For every treated case, find one (or more) untreated with similar covariates → analyse matched pairs/sets | Mahalanobis distance, nearest-neighbour PS matching | Standardised Mean Difference (SMD) < 0.10 for every covariate | 
Strengths: intuitive “mini-trial” feel; automatically respects common support.
Limits: can discard many observations; quality depends on measured covariates only.
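A sketch with teffects nnmatch, assuming one continuous confounder age and exact matching on a hypothetical sex variable; tebalance reports the standardised differences explained in the table below:

```stata
* 1:1 nearest-neighbour Mahalanobis matching on age,
* exact matching on sex; ATT estimand
* generate() saves the match IDs, which tebalance needs
* after a matching estimator
teffects nnmatch (death age) (treat), atet ///
    ematch(sex) metric(mahalanobis) generate(nnid)

* Balance check in the matched sample
tebalance summarize
```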
| Item | Explanation |
| --- | --- |
| Why “standardised”? | Dividing by the pooled SD removes the units of measurement, so you can compare imbalance across variables with different scales (e.g., age in years vs BMI in kg/m²). | 
| Intuitive scale | • 0.00 = perfect balance (groups identical on that covariate) • 0.10 ≈ groups differ by one-tenth of a pooled SD (small) • 0.20 ≈ one-fifth of an SD (medium) • 0.50 ≈ half an SD (large imbalance) | 
| Why the 0.10 rule-of-thumb? | Simulation and empirical studies show that an SMD below 0.10 (10% of an SD) typically translates to negligible bias in most treatment-effect estimates. It is strict enough to ensure balance but flexible enough to be achievable in real data. | 
| Application | After matching, weighting, or any propensity-score method, compute the SMD for every confounder. → If all SMDs < 0.10, you can reasonably claim the groups are “balanced” on the observed covariates. → If some SMDs are ≥ 0.10, refine the PS model (add interactions, non-linear terms) or tighten the matching caliper, then re-check. | 
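For reference, the usual formula for a continuous covariate divides the difference in group means by the pooled standard deviation:

$$
\mathrm{SMD} = \frac{\bar{x}_{\text{treated}} - \bar{x}_{\text{untreated}}}{\sqrt{\left(s^2_{\text{treated}} + s^2_{\text{untreated}}\right)/2}}
$$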
2·4 Propensity-Score (PS) Methods
| Step | Goal / Comment |
| --- | --- |
| 1. Estimate the propensity score (PS) | Model Pr(Treatment = 1 \| X) with logistic regression or a machine-learning algorithm. |
| 2. Check the Region of Common Support (RCS) | Plot the PS distribution by treatment group; drop or trim observations where the two groups do not overlap. | 
| 3. Apply the PS • Adjustment: include PS as a covariate in the outcome model. • Stratification: divide data into PS quintiles/deciles and compare within strata. • Matching: pair treated and untreated units with similar PS (e.g., 1:1, caliper). • Inverse Probability Weighting (IPW): weight each subject by 1/PS (treated) or 1/(1 − PS) (untreated). | All four techniques aim to balance the observed covariates; IPW usually retains more data than strict matching. | 
| 4. Re-check balance | Use Standardised Mean Differences (SMDs) or Love plots before vs after applying the chosen PS method. If imbalance persists, refine the PS model (add interactions, non-linear terms, etc.). | 
PS methods shine when you have many confounders and a moderate-to-large sample.
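The four steps above, sketched end-to-end (death, treat, age, sex, and stage are illustrative names):

```stata
* Step 1 – estimate the PS
logit treat c.age i.sex i.stage
predict pscore, pr

* Step 2 – inspect the region of common support
twoway (kdensity pscore if treat == 1) ///
       (kdensity pscore if treat == 0), ///
    legend(order(1 "Treated" 2 "Untreated"))

* Step 3 – IPW estimate of the ATE; teffects refits the
* treatment model internally from the same covariates
teffects ipw (death) (treat c.age i.sex i.stage)
teffects overlap        // built-in overlap plot

* Step 4 – re-check balance in the weighted pseudo-population
tebalance summarize
```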
3 Method Comparison – Big Picture
| Method | Handles continuous X | Avoids positivity problems | Scales to many X | Outputs marginal effect directly* | Key modelling risk |
| --- | --- | --- | --- | --- | --- |
| Regression / GLM | ✅ | ❌ (depends on extrapolation) | ⚠️ (over-fitting) | ✅ (for RD/RR) / ⚠️ (OR needs marginalising) | link & linearity | 
| Standardisation | ❌ (categorical only) | ❌ (many cells) | ❌ | ✅ | cell sparsity | 
| Matching | ✅ | ✅ (works inside RCS) | ⚠️ (drops data) | ✅ | poor matches | 
| Propensity Score | ✅ | ✅ (trim/weight) | ✅ | ✅ | PS model mis-spec. | 
*Risk difference / risk ratio are collapsible; odds ratio is not.
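A toy illustration of that footnote: take two equally sized strata with no confounding. Risks in stratum 1 are 0.9 (treated) vs 0.5 (untreated); in stratum 2 they are 0.5 vs 0.1. The odds ratio is 9 in both strata, yet the marginal risks of 0.7 vs 0.3 give a marginal OR of only about 5.4; the marginal OR falls outside the range of the stratum-specific ORs even though nothing confounds the comparison. The risk difference, by contrast, is 0.4 in each stratum and 0.4 marginally.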
4 From Concept to Code – Stata Recipes
(Use only if you already grasp the logic above.)
| Goal | Core Stata commands |
| --- | --- |
| Outcome model (continuous) | regress y i.treat X …, then margins, dydx(treat) | 
| Outcome model (binary, marginal OR) | logit death i.treat X … → margins treat, predict(pr) → nlcom … | 
| Parametric G-computation (standardisation) | glm y i.treat##i.cat1##i.cat2, family(binomial) link(logit) → margins, dydx(treat) (averages over the observed covariate distribution; atmeans would instead give the effect at the covariate means) | 
| Nearest-neighbour matching | teffects nnmatch (y X …) (treat), metric(mahalanobis) generate(id) → tebalance summarize | 
| Propensity-score IPW | logit treat X … → predict pscore, pr → check overlap visually → teffects ipw (y) (treat X …) (teffects refits the PS internally from the covariates, not from the saved pscore) | 
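If you want the weights explicitly rather than inside teffects, a hand-rolled sketch (variable names are illustrative):

```stata
* Fit the PS and build inverse-probability weights
logit treat c.age i.sex
predict pscore, pr
generate ipw = cond(treat, 1 / pscore, 1 / (1 - pscore))

* Weighted outcome model; pweights give robust standard errors
* (these SEs ignore the uncertainty from estimating the PS;
*  teffects ipw or a bootstrap accounts for it)
regress y i.treat [pweight = ipw]
```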
5 Diagnostic Checklist — Don’t Skip
- Draw a DAG to pick your confounders. 
- Check overlap / positivity visually. 
- Verify balance (SMD < 0.10) after weighting / matching. 
- Run a sensitivity analysis (e.g., E-value) for unmeasured confounding (a quick sketch follows this list). 
- Report the causal scale (risk diff / ratio or marginal OR), not just model coefficients. 
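For a risk ratio above 1, the E-value of VanderWeele and Ding has the closed form RR + √(RR × (RR − 1)). A one-line check for a hypothetical risk ratio of 1.8:

```stata
* E-value for a hypothetical risk ratio of 1.8
display 1.8 + sqrt(1.8 * (1.8 - 1))    // ≈ 3.0
```

An E-value of 3.0 says an unmeasured confounder would need risk-ratio associations of at least 3 with both treatment and outcome to explain the estimate away.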
6 Key Takeaways
Observational causal inference is all about designing your analysis to recreate the comparability that randomisation would have given you.
- Choose the simplest method that addresses your data’s weaknesses. 
- Always back claims of exchangeability with diagnostics and domain logic. 
- Balance first, estimate second, then stress-test your assumptions. 





