
Causal Thinking in Observational Studies: Matching, Propensity Scores, and IPTW Explained

When we want to know if a treatment truly causes a better outcome—especially in observational studies—we need more than just statistics. We need causal thinking, and we need the right methods. This guide walks you through model-based adjustment, standardisation, matching, balancing scores, propensity scores, and IPTW, all in one place, explained simply.

🔹 Model-Based Adjustment

What it is: You build a regression model to estimate the treatment effect while adjusting for covariates.

How it works:

  • “Let's model outcome = treatment + age + BP + diabetes...”

  • In its basic form (no interaction terms), assumes the treatment effect is the same across all subgroups (no effect modification).

When to use: You are confident the model is correctly specified, your covariates are mostly continuous, and you expect little variation in the treatment effect between subgroups.
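Here is a minimal sketch of this idea in plain Python with only the standard library. It fits outcome = treatment + age + diabetes by ordinary least squares on a handful of hypothetical, noise-free patients, so you can check the adjusted treatment coefficient by eye. The data, the choice of covariates and the helper names (`solve_linear_system`, `ols`) are illustrative assumptions. A real analysis would use a statistics package and report uncertainty.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical patients: (treated, age, diabetes)
patients = [(1, 72, 1), (1, 65, 0), (1, 80, 1), (0, 60, 0),
            (0, 75, 1), (0, 68, 0), (1, 70, 0), (0, 82, 1)]
# Noise-free outcomes generated with a true treatment effect of -3.0
outcomes = [10 - 3.0 * t + 0.2 * age + 2.0 * dm for t, age, dm in patients]

# Design matrix columns: intercept, treatment, age centred at 70, diabetes
design = [[1.0, t, age - 70, dm] for t, age, dm in patients]
intercept, b_treat, b_age, b_dm = ols(design, outcomes)
print(f"adjusted treatment effect: {b_treat:.3f}")  # adjusted treatment effect: -3.000
```

The model has no treatment × covariate interaction terms. It therefore returns a single effect for everyone, which is exactly the "no effect modification" assumption above.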

🔹 Standardisation

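What it is: You split people into groups (e.g., age 60–70, 70–80) and calculate the treatment effect within each group. You then average these effects, weighting each group by its share of a chosen standard population (often the whole study population).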

Key point: It allows for effect modification (e.g., statins might help older patients more).

Limitations:

  • Works best with categorical covariates (continuous ones must first be grouped into categories).

  • Too many strata → few people per stratum → some strata contain no treated or no untreated people (a positivity violation).
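The sketch below computes the treatment effect within each age group and then takes a weighted average. The standard is each group's share of the whole study population, which is an assumption; other standard populations are possible. The data and function names are hypothetical. The code also refuses to produce an answer when a stratum has no treated or no untreated patients, which is the positivity problem described above.

```python
from collections import defaultdict


def stratum_effects(records):
    """Return {stratum: (stratum size, treated mean - untreated mean)}.

    Raises ValueError if a stratum has no treated or no untreated people,
    because there is then nobody to compare with (a positivity violation)."""
    arms = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in records:
        arms[stratum][treated].append(outcome)
    bad = sorted(s for s, a in arms.items() if not a[0] or not a[1])
    if bad:
        raise ValueError(f"positivity violated in strata: {bad}")
    return {
        s: (len(a[0]) + len(a[1]), sum(a[1]) / len(a[1]) - sum(a[0]) / len(a[0]))
        for s, a in arms.items()
    }


def standardised_effect(records):
    """Average the stratum effects, weighting each by its share of the study population."""
    effects = stratum_effects(records)
    n_total = sum(n for n, _ in effects.values())
    return sum(n / n_total * diff for n, diff in effects.values())


# Hypothetical data: (age_group, treated, had_cardiovascular_event)
records = (
    [("60-70", 1, e) for e in (1, 0, 1, 1)] + [("60-70", 0, e) for e in (1, 1, 1, 1)]
    + [("70-80", 1, e) for e in (0, 1)] + [("70-80", 0, e) for e in (1, 1)]
)
print(stratum_effects(records))                # {'60-70': (8, -0.25), '70-80': (4, -0.5)}
print(round(standardised_effect(records), 4))  # -0.3333

try:
    standardised_effect(records + [("80+", 1, 0)])  # nobody aged 80+ is untreated
except ValueError as err:
    print(err)  # positivity violated in strata: ['80+']
```

The two strata give different effects (-0.25 versus -0.5). Standardisation keeps that difference instead of forcing a single common effect.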

🔹 Matching

What it is: For every treated person, you find one (or more) untreated person(s) who look very similar in covariates.

Benefit: Doesn’t require a model for the outcome. Instead, it mimics a randomized trial by design, at least for the covariates you match on.

Challenge: Matching on many variables is hard—especially when they’re a mix of continuous and categorical.
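The sketch below shows 1:1 greedy nearest-neighbour matching without replacement on two continuous covariates, using hypothetical patient data.

- Age and LDL are rescaled to standard-deviation units so neither one dominates the distance.
- A caliper (an assumed maximum distance of 0.5) leaves a treated patient unmatched when no control is close enough.
- The estimate is the mean outcome difference across matched pairs, i.e. an effect among the matched treated patients.

```python
import math
import statistics

# Hypothetical patients: id -> (treated, age, ldl, outcome)
patients = {
    0: (1, 70, 160, 120), 1: (1, 62, 140, 110), 2: (1, 78, 180, 130), 3: (1, 55, 200, 150),
    4: (0, 71, 158, 135), 5: (0, 61, 142, 128), 6: (0, 79, 178, 150),
    7: (0, 45, 120, 100), 8: (0, 50, 110, 95), 9: (0, 85, 130, 140),
}


def standardised_features(patients):
    """Rescale age and LDL to mean 0, SD 1 so both count equally in the distance."""
    columns = list(zip(*[(age, ldl) for _, age, ldl, _ in patients.values()]))
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return {
        pid: tuple((v - mu) / sd for v, (mu, sd) in zip((age, ldl), stats))
        for pid, (_, age, ldl, _) in patients.items()
    }


def greedy_match(treated_ids, control_ids, features, caliper):
    """1:1 greedy nearest-neighbour matching without replacement.

    A treated patient stays unmatched if no remaining control lies within `caliper`."""
    available = sorted(control_ids)
    pairs = []
    for t in treated_ids:
        if not available:
            break
        best = min(available, key=lambda c: math.dist(features[t], features[c]))
        if math.dist(features[t], features[best]) <= caliper:
            pairs.append((t, best))
            available.remove(best)
    return pairs


features = standardised_features(patients)
treated_ids = [pid for pid, p in patients.items() if p[0] == 1]
control_ids = [pid for pid, p in patients.items() if p[0] == 0]
pairs = greedy_match(treated_ids, control_ids, features, caliper=0.5)
effect = statistics.mean(patients[t][3] - patients[c][3] for t, c in pairs)
print(pairs)             # [(0, 4), (1, 5), (2, 6)]
print(round(effect, 2))  # -17.67
```

Patient 3 (young, very high LDL) is dropped because no control resembles them. This is how matching can shrink the sample.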

🔹 Balancing Score

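What it is: A function of a patient’s covariates with one key property. Among people with the same score, the covariates are distributed the same way in the treated and untreated groups. In that sense, people with the same score are “balanced.”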

Example:

  • Patients A and B look very similar in age, LDL, and diabetes status → same balancing score.

  • Makes them comparable for treatment effect estimation.
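Stated formally, let T be the treatment indicator and X the covariates. The balancing property is then a conditional independence (⊥⊥ reads "is independent of"):

```latex
b(X) \text{ is a balancing score} \iff X \perp\!\!\!\perp T \mid b(X)
\qquad \text{e.g. } b(X) = X \ \text{(finest)}, \quad b(X) = p(X) = \Pr(T = 1 \mid X) \ \text{(coarsest)}
```

Conditioning on X itself is the finest possible balancing score. The propensity score introduced next is the coarsest one, which is why it is so convenient: a single number instead of many covariates.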

🔹 Propensity Score (p(X))

What it is: A special type of balancing score—it’s the probability of receiving treatment, given the person’s covariates.

Example:

  • A patient with p(X) = 0.85 → they had an 85% chance of getting statins based on their age, LDL, etc.

Use:

  • Match people with similar p(X)

  • Stratify by p(X)

  • Weight them using p(X) → leads to IPTW
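The sketch below estimates p(X) with logistic regression, written in plain Python (Newton-Raphson, one covariate) so it runs without extra packages. The single covariate, the hypothetical ages and statin indicators, and the function name `fit_logistic` are assumptions for illustration. In practice you would fit the model with a statistics library and include every confounder (age, LDL, diabetes, and so on).

```python
import math


def fit_logistic(x, t, iterations=25):
    """Fit logit P(T = 1 | x) = b0 + b1 * x by Newton-Raphson (one covariate for brevity)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(ti - pi for ti, pi in zip(t, p))
        g1 = sum((ti - pi) * xi for ti, pi, xi in zip(t, p, x))
        # Information matrix built from p(1 - p), p(1 - p) x and p(1 - p) x^2
        w = [pi * (1 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: beta += I^{-1} * score, using the explicit 2x2 inverse
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1


# Hypothetical patients: age and whether they received statins (1) or not (0)
ages = [50, 55, 58, 60, 62, 65, 67, 70, 72, 75, 78, 80]
statin = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
x = [(a - 65) / 10 for a in ages]  # centre and scale age to keep the fit stable

b0, b1 = fit_logistic(x, statin)
ps = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
for age, t, p in zip(ages, statin, ps):
    print(f"age {age}: statin={t}, p(X)={p:.2f}")
```

A useful sanity check: when the model includes an intercept, the fitted probabilities sum to the number of treated patients.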

🔹 Why We Use Bell Curve Plots for Propensity Score

After calculating propensity scores, we plot the distribution of p(X) separately for the treated and untreated groups, as histograms or density curves. These are often loosely called “bell curves,” although they are frequently skewed.

What we check:

  • Do the curves overlap a lot? Good! We can compare.

  • Do they barely touch? Bad. We can’t make fair comparisons.

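We only analyze people in the region of common support: the range of p(X) where both treated and untreated people are present. This avoids comparing people who have no realistic counterpart in the other group. However, it may reduce the sample size and narrow the population the result applies to.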
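Below is a rough sketch of the overlap check. It assumes propensity scores have already been estimated; the values shown are hypothetical.

- It prints a text histogram for each group in place of a plotted density curve.
- It then keeps only people whose p(X) lies between the larger of the two group minimums and the smaller of the two group maximums, which is one common way to define the region of common support.

```python
# Hypothetical propensity scores and treatment indicators
ps = [0.05, 0.12, 0.20, 0.35, 0.40, 0.55, 0.30, 0.45, 0.60, 0.70, 0.85, 0.95]
treated = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]


def text_histogram(values, bins=5):
    """Print a crude histogram of p(X) on [0, 1], one row of '#' per bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    for i, c in enumerate(counts):
        print(f"{i / bins:.1f}-{(i + 1) / bins:.1f} | {'#' * c}")


def common_support(ps, treated):
    """Return (low, high): the p(X) range where both groups have observations."""
    ps_t = [p for p, t in zip(ps, treated) if t == 1]
    ps_c = [p for p, t in zip(ps, treated) if t == 0]
    return max(min(ps_t), min(ps_c)), min(max(ps_t), max(ps_c))


for label, group in (("treated", 1), ("untreated", 0)):
    print(label)
    text_histogram([p for p, t in zip(ps, treated) if t == group])

low, high = common_support(ps, treated)
kept = [i for i, p in enumerate(ps) if low <= p <= high]
# Prints: common support: 0.30 to 0.55; keeping 5 of 12 people
print(f"common support: {low:.2f} to {high:.2f}; keeping {len(kept)} of {len(ps)} people")
```

Only 5 of 12 people remain here, which shows how restricting to common support can cost sample size.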

🔹 Checking Balance After PS

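After matching, stratifying, or weighting by PS, we must check covariate balance using standardized differences (stddiff). A stddiff is the difference in a covariate’s average between the treated and untreated groups, divided by their pooled standard deviation.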

Rule of thumb:

  • |stddiff| < 0.1 → balanced

  • |stddiff| ≥ 0.1 → still imbalanced, so the effect estimate may remain confounded by that covariate

This is essential before estimating any treatment effect.
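Standardized differences are straightforward to compute. This sketch uses the commonly used formulas:

- for continuous covariates, the difference in means divided by the pooled standard deviation;
- for binary covariates, the analogous version based on prevalences.

The ages and diabetes prevalences are hypothetical.

```python
import math
import statistics


def std_diff(x_treated, x_control):
    """Continuous covariate: (mean_T - mean_C) / sqrt((var_T + var_C) / 2)."""
    diff = statistics.mean(x_treated) - statistics.mean(x_control)
    pooled_sd = math.sqrt((statistics.variance(x_treated) + statistics.variance(x_control)) / 2)
    return diff / pooled_sd


def std_diff_binary(p_treated, p_control):
    """Binary covariate with prevalences p_T and p_C in each group."""
    pooled_sd = math.sqrt((p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2)
    return (p_treated - p_control) / pooled_sd


def balanced(d, threshold=0.1):
    """Rule of thumb from the text: |stddiff| < 0.1 counts as balanced."""
    return abs(d) < threshold


# Hypothetical ages before and after matching, and diabetes prevalence after matching
before = std_diff([70, 72, 75, 78, 80], [55, 60, 62, 65, 68])
after = std_diff([70, 72, 74], [70, 72, 74.5])
diabetes = std_diff_binary(0.40, 0.38)
for name, d in (("age, before", before), ("age, after", after), ("diabetes, after", diabetes)):
    print(f"{name}: stddiff = {d:.2f}, balanced = {balanced(d)}")
```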

🔹 IPTW (Inverse Probability of Treatment Weighting)

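What it is: A technique that weights each person by the inverse of their probability of receiving the treatment they actually got, based on their p(X). This creates a “pseudo-population” in which treatment no longer depends on the measured covariates, as if it had been randomly assigned.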

Weights:

  • For treated → 1 / p(X)

  • For untreated → 1 / (1 - p(X))

Why it's powerful:

  • Uses the whole dataset (unlike matching, which may discard unmatched people).

  • Balances the measured covariates between groups, so you can compare outcomes as if treatment had been randomized.
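Here is a minimal IPTW sketch under stated assumptions:

- The propensity scores are treated as known: 0.25 for younger and 0.75 for older hypothetical patients.
- The effect is the difference in weighted mean outcomes, with weights normalised within each group.

The data are constructed so that the true effect is -2 in younger and -4 in older patients, an average of -3. The crude comparison, by contrast, points the wrong way.

```python
def iptw_weights(ps, treated):
    """ATE weights: 1 / p(X) if treated, 1 / (1 - p(X)) if untreated."""
    return [1 / p if t == 1 else 1 / (1 - p) for p, t in zip(ps, treated)]


def weighted_mean(values, weights, treated, group):
    """Weighted mean of values among people whose treatment indicator equals group."""
    num = sum(w * v for w, v, t in zip(weights, values, treated) if t == group)
    den = sum(w for w, t in zip(weights, treated) if t == group)
    return num / den


def iptw_effect(ps, treated, outcome):
    """Difference in weighted mean outcomes, treated minus untreated."""
    w = iptw_weights(ps, treated)
    return weighted_mean(outcome, w, treated, 1) - weighted_mean(outcome, w, treated, 0)


# The p(X) = 0.85 patient from the statin example, if treated vs. if untreated
print([round(w, 2) for w in iptw_weights([0.85, 0.85], [1, 0])])  # [1.18, 6.67]

# Hypothetical data: 4 younger patients (p = 0.25) and 4 older patients (p = 0.75)
ps = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
outcome = [8, 10, 10, 10, 16, 16, 16, 20]

ones = [1.0] * len(ps)  # unweighted comparison for reference
crude = weighted_mean(outcome, ones, treated, 1) - weighted_mean(outcome, ones, treated, 0)
print(f"crude difference: {crude:.2f}")                             # crude difference: 1.50
print(f"IPTW difference: {iptw_effect(ps, treated, outcome):.2f}")  # IPTW difference: -3.00
```

Older patients are both more likely to be treated and have worse outcomes, so the crude comparison makes treatment look harmful. Weighting removes that imbalance.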

🔹 Final Workflow (Putting It All Together)

  1. Estimate propensity scores using covariates that influence both treatment and outcome.

  2. Check overlap using bell-curve plots (region of common support).

  3. Choose a method:

    • Match on p(X)

    • Stratify on p(X)

    • Weight using IPTW

  4. Check balance using standardized differences.

  5. Estimate causal effect using outcome models (with or without weights).
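The five steps can be strung together in one small sketch. To stay self-contained, it makes several simplifying assumptions:

- a single categorical covariate (age group);
- a propensity score estimated as the share treated within each group, a saturated model standing in for logistic regression;
- overlap enforced by dropping groups with p(X) of 0 or 1;
- IPTW as the chosen method;
- a weighted difference in means as the outcome model.

All data are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical records: (age_group, treated, outcome)
data = ([("young", 1, 8)] + [("young", 0, 10)] * 3
        + [("old", 1, 16)] * 3 + [("old", 0, 20)]
        + [("80+", 1, 18)] * 2)  # everyone aged 80+ was treated

# Step 1: estimate p(X) as the share treated within each age group
# (a saturated model standing in for logistic regression).
counts = defaultdict(lambda: [0, 0])  # group -> [number treated, group size]
for group, t, _ in data:
    counts[group][0] += t
    counts[group][1] += 1
ps = {group: n_t / n for group, (n_t, n) in counts.items()}

# Step 2: overlap. Keep only groups containing both treated and untreated people.
kept = [(g, t, y) for g, t, y in data if 0 < ps[g] < 1]
treated = [t for _, t, _ in kept]

# Step 3: IPTW weights, 1/p(X) for treated and 1/(1 - p(X)) for untreated.
weights = [1 / ps[g] if t == 1 else 1 / (1 - ps[g]) for g, t, _ in kept]


def wmean(values, arm):
    """Weighted mean of values among people with treatment indicator == arm."""
    num = sum(w * v for w, v, t in zip(weights, values, treated) if t == arm)
    den = sum(w for w, t in zip(weights, treated) if t == arm)
    return num / den


# Step 4: balance check, weighted standardized difference for the indicator "old".
old = [1 if g == "old" else 0 for g, _, _ in kept]
p_t, p_c = wmean(old, 1), wmean(old, 0)
smd = (p_t - p_c) / math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)

# Step 5: effect estimate as a difference in weighted mean outcomes.
outcomes = [y for _, _, y in kept]
ate = wmean(outcomes, 1) - wmean(outcomes, 0)

print(f"propensity scores: {ps}")
print(f"weighted stddiff for 'old': {abs(smd):.3f}")
print(f"IPTW effect in the common-support sample: {ate:.2f}")
```

Everyone aged 80+ was treated, so that group falls outside common support and is dropped. After weighting, the stddiff for "old" is 0.000 and the printed effect is -3.00. That estimate describes only the younger and older groups that remain.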

✅ Summary Table

Method | Core Idea | Best For
Model-Based | Regression + adjustment | Simple structure, no effect modification
Standardisation | Grouping + averaging | Allows effect modification
Matching | Pair similar individuals | Precise but sample may shrink
Propensity Score | Chance of treatment | Enables match/stratify/weight
IPTW | Weighting to mimic randomization | Full-sample causal estimation

