Causal Inference in Observational Research: Strategies, Assumptions, and Stata Tools
- Mayta
- Jun 19
- 4 min read
1 Why Observational Data Are Harder Than RCTs
| RCT privilege | What you lose in an observational study |
| --- | --- |
| Investigator controls treatment → exchangeability holds by design | Treatment choice depends on prognosis, access, physician preference → systematic confounding |
| Random allocation → everyone has a shot at either arm (positivity) | Some covariate patterns receive only one treatment (structural zeros) |
| Protocol dictates a single treatment version (consistency) | Dose, timing, brand, adherence often vary across sites / time |
| Individual assignment rarely affects others (no-interference/SUTVA) | Spill-over and social or herd effects can occur (e.g., vaccination) |
Because none of these guarantees are automatic, every causal claim from observational data must argue for, and where possible check, the four identifiability assumptions:
Exchangeability (no unmeasured confounding)
Positivity (overlap)
Consistency (well-defined exposure)
No-interference (one person’s treatment doesn’t change another’s outcome)
Reality check: exchangeability can never be proven from data alone; it can only be argued through design (choosing confounders with a DAG) and supported with diagnostics.
2 Core Strategy: “Make treated and untreated look interchangeable”
2·1 Model-Based Regression
| Feature | Pros | Cons / Assumptions |
| --- | --- | --- |
| Adds treatment and measured confounders in one model (linear, log-binomial, logistic, etc.) | Simple; familiar; can incorporate interactions | Requires correct link/linearity; gives conditional odds ratios (must marginalise for causal OR); sensitive to extrapolation outside covariate range |
Marginalising a logistic model (e.g., Stata margins) converts conditional effects to the population-average scale.
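As a minimal sketch of that marginalising step (the variable names y, treat, age, and sex are hypothetical):

```stata
* Hypothetical variables: y (binary outcome), treat (0/1), age, sex
logit y i.treat c.age i.sex

* Average predicted risk under each treatment level,
* standardised over the observed covariate distribution:
margins treat

* Population-average (marginal) risk difference for treatment:
margins, dydx(treat)
```

The logit coefficient for treat is a conditional log-odds ratio; the margins output is on the population-average risk scale that the causal question usually asks about.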
2·2 Standardisation (G-Computation)
1. Stratify on categorical confounders
2. Estimate the treatment effect within each cell
3. Average over the covariate distribution
Benefits: permits different effects in different strata (relaxes “no effect modification”)
Limits: works only with categorical confounders; many strata → sparse data → positivity issues.
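A parametric sketch of the same logic in Stata (y, treat, and stage are hypothetical names; the interaction term is what relaxes “no effect modification”):

```stata
* Hypothetical variables: y (binary outcome), treat (0/1), stage (categorical confounder)
* The interaction lets the treatment effect differ across strata:
logit y i.treat##i.stage

* Standardised risks: predict under each treatment level for everyone,
* then average over the observed distribution of stage:
margins treat

* Standardised risk difference (the g-computation estimate):
margins, dydx(treat)
```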
2·3 Matching
| Idea | Typical tool | Diagnostic |
| --- | --- | --- |
| For every treated case, find one (or more) untreated with similar covariates → analyse matched pairs/sets | Mahalanobis distance, nearest-neighbour PS matching | Standardised Mean Difference (SMD) < 0.10 for every covariate |
Strengths: intuitive “mini-trial” feel; automatically respects common support. Limits: can discard many observations; quality depends on measured covariates only. (A Stata sketch follows the SMD explainer below.)
| Item | Explanation |
| --- | --- |
| Why “standardised”? | Dividing by the pooled SD removes the units of measurement, so you can compare imbalance across variables with different scales (e.g., age in years vs BMI in kg/m²). |
| Intuitive scale | • 0.00 = perfect balance (groups identical on that covariate) • 0.10 ≈ groups differ by one-tenth of a pooled SD (small) • 0.20 ≈ one-fifth of an SD (medium) • 0.50 ≈ half an SD (large imbalance) |
| Why the 0.10 rule-of-thumb? | Simulation and empirical studies show that an SMD below 0.10 (10% of an SD) typically translates to negligible bias in most treatment-effect estimates. It is strict enough to ensure balance but flexible enough to be achievable in real data. |
| Application | After matching, weighting, or any propensity-score method, compute the SMD for every confounder. → If all SMDs < 0.10, you can reasonably claim the groups are “balanced” on the observed covariates. → If some SMDs are ≥ 0.10, refine the PS model (add interactions, non-linear terms) or tighten the matching caliper, then re-check. |
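Putting matching and the SMD check together in Stata (a sketch; age and bmi are hypothetical covariates, and tebalance is assumed to be run immediately after teffects):

```stata
* Hypothetical variables: y (outcome), treat (0/1), age, bmi
* 1:1 nearest-neighbour matching on the Mahalanobis distance:
teffects nnmatch (y age bmi) (treat), metric(mahalanobis)

* Standardised differences and variance ratios, raw vs matched;
* aim for |SMD| < 0.10 on every covariate:
tebalance summarize
```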
2·4 Propensity-Score (PS) Methods
| Step | Goal / Comment |
| --- | --- |
| 1. Estimate the propensity score (PS) | Model Pr(Treatment = 1 \| X) with logistic regression or a machine-learning algorithm. |
| 2. Check the Region of Common Support (RCS) | Plot the PS distribution by treatment group; drop or trim observations where the two groups do not overlap. |
| 3. Apply the PS • Adjustment: include PS as a covariate in the outcome model. • Stratification: divide data into PS quintiles/deciles and compare within strata. • Matching: pair treated and untreated units with similar PS (e.g., 1:1, caliper). • Inverse Probability Weighting (IPW): weight each subject by 1/PS (treated) or 1/(1 − PS) (untreated). | All four techniques aim to balance the observed covariates; IPW usually retains more data than strict matching. |
| 4. Re-check balance | Use Standardised Mean Differences (SMDs) or Love plots before vs after applying the chosen PS method. If imbalance persists, refine the PS model (add interactions, non-linear terms, etc.). |
PS methods shine when you have many confounders and a moderate-to-large sample.
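A sketch of the full four-step workflow (treat, x1–x5, and y are hypothetical names; note that teffects ipw refits the treatment model internally, so the manually predicted score is only for the overlap check):

```stata
* Hypothetical variables: y (outcome), treat (0/1), confounders x1-x5
* Steps 1-2: estimate the PS and inspect the region of common support
logit treat x1 x2 x3 x4 x5
predict pscore, pr
twoway (kdensity pscore if treat == 1) (kdensity pscore if treat == 0), ///
    legend(order(1 "Treated" 2 "Untreated")) xtitle("Propensity score")

* Step 3: inverse-probability weighting
teffects ipw (y) (treat x1 x2 x3 x4 x5)

* Step 4: re-check balance after weighting
tebalance summarize
```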
3 Method Comparison – Big Picture
| Method | Handles continuous X | Avoids positivity problems | Scales to many X | Outputs marginal effect directly* | Key modelling risk |
| --- | --- | --- | --- | --- | --- |
| Regression / GLM | ✅ | ❌ (depends on extrapolation) | ⚠️ (over-fitting) | ✅ (for RD/RR) / ⚠️ (OR needs marginalising) | link & linearity |
| Standardisation | ❌ (categorical only) | ❌ (many cells) | ❌ | ✅ | cell sparsity |
| Matching | ✅ | ✅ (works inside RCS) | ⚠️ (drops data) | ✅ | poor matches |
| Propensity Score | ✅ | ✅ (trim/weight) | ✅ | ✅ | PS model mis-spec. |
*Risk difference and risk ratio are collapsible; the odds ratio is not (toy example below).
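A toy calculation (hypothetical numbers) makes the footnote concrete. Suppose a binary confounder splits the sample into two equal-sized strata:

| Stratum | Risk (treated) | Risk (untreated) | OR | Risk difference |
| --- | --- | --- | --- | --- |
| 1 | 0.8 | 0.6 | (0.8/0.2)/(0.6/0.4) ≈ 2.67 | 0.20 |
| 2 | 0.4 | 0.2 | (0.4/0.6)/(0.2/0.8) ≈ 2.67 | 0.20 |
| Pooled | 0.6 | 0.4 | (0.6/0.4)/(0.4/0.6) = 2.25 | 0.20 |

The risk difference collapses cleanly (0.20 everywhere), but the pooled OR (2.25) is smaller than the stratum-specific OR (2.67) even with equal-sized strata and no confounding. That is non-collapsibility, not bias, and it is why a conditional OR from a logistic model must be marginalised before being reported as a population-average effect.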
4 From Concept to Code – Stata Recipes
(Use only if you already grasp the logic above.)
| Goal | Core Stata commands |
| --- | --- |
| Outcome model (continuous) | regress y i.treat X …, then margins, dydx(treat) |
| Outcome model (binary, marginal OR) | logit death i.treat X … → margins treat, post → nlcom (worked sketch after this table) |
| Parametric G-computation (standardisation) | glm y i.treat##i.cat1##i.cat2, family(binomial) link(logit) → margins, dydx(treat) (avoid atmeans, which gives an effect at the covariate means rather than the standardised average) |
| Nearest-neighbour matching | teffects nnmatch (y X …) (treat), metric(mahalanobis) → tebalance summarize (pstest belongs to the user-written psmatch2 package and does not run after teffects) |
| Propensity-score IPW | logit treat X … → predict pscore, pr → visual overlap check → teffects ipw (y) (treat X …) (teffects re-estimates the PS internally; the manual score is only for diagnostics) |
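A worked version of the marginal-OR recipe (a sketch; death, treat, age, and sex are hypothetical names, and the coefficient names fed to nlcom should be confirmed against matrix list e(b)):

```stata
* Hypothetical variables: death (0/1), treat (0/1), age, sex
logit death i.treat c.age i.sex

* Marginal (standardised) risks under each arm; -post- stores them in e(b):
margins treat, post

* Confirm the stored coefficient names before the next step:
matrix list e(b)

* Marginal odds ratio built from the two marginal risks:
nlcom (_b[1.treat] / (1 - _b[1.treat])) / (_b[0bn.treat] / (1 - _b[0bn.treat]))
```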
5 Diagnostic Checklist — Don’t Skip
Draw a DAG to pick your confounders.
Check overlap / positivity visually.
Verify balance (SMD < 0.10) after weighting / matching.
Run a sensitivity analysis (e.g., E-value) for unmeasured confounding (a quick calculation follows this checklist).
Report the causal scale (risk diff / ratio or marginal OR), not just model coefficients.
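For the E-value item, the point-estimate formula is simple enough to compute by hand: for a risk ratio RR > 1, E-value = RR + √(RR × (RR − 1)). A sketch with a hypothetical RR of 1.8:

```stata
* Hypothetical estimate: RR = 1.8
scalar RR = 1.8
display "E-value = " RR + sqrt(RR * (RR - 1))   // 1.8 + sqrt(1.44) = 3.0
```

Read: an unmeasured confounder would need to be associated with both treatment and outcome by risk ratios of at least 3.0 to fully explain away an observed RR of 1.8. Community-contributed Stata implementations of the E-value (including confidence-interval versions) also exist.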
6 Key Takeaways
Observational causal inference is all about designing your analysis to recreate the comparability that randomisation would have given you.
Choose the simplest method that addresses your data’s weaknesses.
Always back claims of exchangeability with diagnostics and domain logic.
Balance first, estimate second, then stress-test your assumptions.