Three Tools to Estimate Causal Effects Without a Trial: Model-Based, Standardisation, and Matching

Mayta
Jun 28
Updated: Jun 28

Imagine you're a clinician-researcher trying to answer a deceptively simple question:

“Do apples help lower blood pressure?”

You can’t run an experiment, but you do have data from people who did or didn’t eat apples.

Problem: People who eat apples are different! Maybe they’re younger, healthier, or exercise more. That messes up the comparison.

To fix this, you try three methods to make the groups more “fair”:

1️⃣ Model-Based = Smart Calculator

🧮 What You Do: You type everyone’s info into a calculator:

Did they eat apples? ✅/❌
How old are they?
What’s their weight?

The calculator tries to “adjust” for age and weight to see if apples still help.

📦 Like:

“If a 50-year-old eats apples vs. a 50-year-old who doesn’t—who has better blood pressure?”

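📉 Problem: Unless you tell it otherwise, the calculator assumes apples work the same for everyone.

If apples help young people more than older people and the model has no interaction term (apples × age), the single answer it reports can be misleading.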
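To make the "calculator" concrete, here is a minimal sketch of regression adjustment in plain Python. The eight people, their ages, and their blood pressures are invented for illustration, and the helper names `solve` and `ols` are hypothetical; in practice you would more likely reach for a statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data: (ate_apples, age, systolic BP). Apple-eaters are younger on purpose.
people = [
    (1, 30, 110.0), (1, 35, 112.5), (1, 40, 115.0), (1, 50, 120.0),
    (0, 40, 120.0), (0, 50, 125.0), (0, 60, 130.0), (0, 65, 132.5),
]

eaters = [bp for ate, _, bp in people if ate == 1]
others = [bp for ate, _, bp in people if ate == 0]
crude = sum(eaters) / len(eaters) - sum(others) / len(others)  # -12.5

# Model: BP = b0 + b1 * apples + b2 * age (no interaction term)
design = [[1.0, float(ate), float(age)] for ate, age, _ in people]
outcome = [bp for _, _, bp in people]
adjusted = ols(design, outcome)[1]  # apple coefficient, approximately -5

print(f"crude difference:    {crude:.1f}")
print(f"adjusted for age:    {adjusted:.1f}")
```

The crude gap looks large because the apple-eaters in this toy data are younger; once age is in the model, the apple coefficient shrinks. Adding a column for apples × age would let the estimated effect differ by age.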

2️⃣ Standardisation = Group-by-Group Averaging

🧑‍🏫 What You Do: You split people into age groups:

Young (under 30)
Middle-aged (30–59)
Older (60+)

Then check:

In each group, do apple-eaters have better BP?

You then average the answers based on how many people are in each group.

🎯 Good for: If you believe apples help some age groups more than others.

📉 Problem: Doesn’t work well if age is a number (like 22.4, 43.1, etc.)—you’d have too many groups! And if in one group no one ate apples, you can’t compare.
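The group-by-group recipe can be sketched in a few lines of Python. The ten people below are invented, the cut-offs in `age_group` are one possible choice, and `standardised_effect` is a hypothetical helper name. The weights here are each group's share of the sample, which is only one of several possible standard populations.

```python
from collections import defaultdict
from statistics import mean


def age_group(age):
    """Assign an age band (cut-offs are an illustrative choice)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "older"


def standardised_effect(people):
    """people: list of (ate_apples, age, bp). Returns (overall, per_group)."""
    strata = defaultdict(lambda: {True: [], False: []})
    for ate, age, bp in people:
        strata[age_group(age)][ate].append(bp)

    total = len(people)
    overall = 0.0
    per_group = {}
    for name, group in strata.items():
        if not group[True] or not group[False]:
            # Nobody to compare against in this group
            raise ValueError(f"no comparison possible in group {name!r}")
        diff = mean(group[True]) - mean(group[False])
        weight = (len(group[True]) + len(group[False])) / total
        per_group[name] = diff
        overall += weight * diff
    return overall, per_group


# Invented data: (ate_apples, age, systolic BP)
people = [
    (True, 22, 110), (True, 25, 112), (False, 28, 120),
    (True, 35, 125), (False, 45, 128), (False, 55, 130),
    (True, 62, 138), (False, 65, 140), (False, 70, 140), (False, 75, 140),
]

overall, per_group = standardised_effect(people)
print(per_group)   # effect inside each age group
print(overall)     # weighted average, about -4.7
```

The `ValueError` branch is the "no one ate apples in this group" problem from above: the function refuses to guess rather than silently dropping the group.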

🧮 Standardisation ≠ Subgroup Analysis

Standardisation weights each group's result and adds the pieces back together, because each group responds differently. Subgroup analysis simply analyses each group on its own.

🍎 Standardisation = "Group-by-Group Adjust and Combine"

You:

Split people into strata (like age groups).
Calculate the effect of apples inside each group:
- E.g., “In the 30–60 age group, how much lower was BP for apple-eaters?”
Then, you re-weight the group results based on how common each group is in your total population.

💡 Purpose: Get the overall average effect of apples—adjusted for the fact that different age groups respond differently and may be unequally represented.

🔁 Think of it as:

“Let me adjust the results so they reflect what would happen if every group was represented fairly.”

✅ This is for adjustment, not exploration.

🔍 Subgroup Analysis = "Compare One Group to Another"

You:

Still split into groups (like age).
But now you ask a different question:
“Do apples help the young more than the old?”

You’re testing for effect modification—whether the effect of apples changes between groups.

💡 Purpose: You want to compare effects between groups.

🔁 Think of it as:

“Let me see if the apple effect is different across groups.”

⚖️ Summary Table

Concept	What it Does	Purpose	Output
Standardisation	Adjust for differences in confounder distribution	Fair overall effect	One combined effect
Subgroup Analysis	Test if effect varies across groups	Explore effect modifiers	One effect per group

🍏 Easy Analogy

You have 100 people from 3 countries:

Thailand (70 people)
Japan (20 people)
Italy (10 people)

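Say apples help more in Italy, but your sample is mostly Thai people. A raw overall comparison is dominated by Thailand, and it also mixes up the apple effect with whatever else differs between countries.

🧮 Standardisation says:

“Let’s work out the apple effect inside each country, then combine the three results using weights from a standard population we choose: the study’s own mix (70/20/10), or equal weights if each country should count the same.”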

🔍 Subgroup analysis says:

“Let’s compare: Are apples more helpful in Italy than in Thailand?”
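The two questions give different kinds of answers, and the standardised answer depends on which standard population you pick. A small worked sketch, where the per-country apple effects are made-up numbers and `standardise` is a hypothetical helper:

```python
# Hypothetical apple effects on systolic BP (mmHg) inside each country
effects = {"Thailand": -2.0, "Japan": -4.0, "Italy": -10.0}
sample_sizes = {"Thailand": 70, "Japan": 20, "Italy": 10}


def standardise(effects, weights):
    """Weighted average of per-group effects; weights need not sum to 1."""
    total = sum(weights.values())
    return sum(effects[c] * weights[c] / total for c in effects)


# Standardisation: one combined number, for a chosen standard population
by_sample_mix = standardise(effects, sample_sizes)          # about -3.2
equal_weights = standardise(effects, {c: 1 for c in effects})  # about -5.33

# Subgroup analysis: a comparison between groups
italy_vs_thailand = effects["Italy"] - effects["Thailand"]  # -8.0

print(by_sample_mix, equal_weights, italy_vs_thailand)
```

Both standardised numbers are legitimate; they answer "what is the average effect?" for different target populations. The subgroup contrast answers a separate question about how much the effect differs.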

3️⃣ Matching = Apple Twins

👯 What You Do: You find each apple-eater a “twin” who didn’t eat apples but is similar in:

Age
Weight
Smoking status

Then you compare their blood pressure.

🎯 Best when: You want to mimic a fair test, like a mini-randomized trial.

📉 Problem: You might not find twins for everyone. Also tricky if one person is 22.4 years old, smokes a little, and has high BMI—hard to match!
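Here is a minimal sketch of twin-finding: greedy 1:1 nearest-neighbour matching without replacement, exact on smoking status and within a caliper on age and weight. The people, the distance formula (years plus kilograms), and the caliper of 5 are all illustrative assumptions; real analyses often rescale covariates or match on a propensity score instead.

```python
def distance(a, b):
    """Crude closeness score: age gap in years plus weight gap in kg (an assumption)."""
    return abs(a["age"] - b["age"]) + abs(a["weight"] - b["weight"])


def match_pairs(treated, controls, caliper=5.0):
    """Greedy 1:1 matching without replacement; smoking status must match exactly."""
    available = list(controls)
    pairs = []
    for t in treated:
        candidates = [c for c in available if c["smoker"] == t["smoker"]]
        if not candidates:
            continue  # no possible twin left for this person
        best = min(candidates, key=lambda c: distance(t, c))
        if distance(t, best) <= caliper:
            pairs.append((t, best))
            available.remove(best)  # each control is used at most once
    return pairs


# Invented data
apple_eaters = [
    {"age": 30, "weight": 70, "smoker": False, "bp": 118},
    {"age": 52, "weight": 80, "smoker": True, "bp": 130},
    {"age": 22, "weight": 95, "smoker": True, "bp": 125},  # hard to match
]
non_eaters = [
    {"age": 31, "weight": 72, "smoker": False, "bp": 124},
    {"age": 50, "weight": 79, "smoker": True, "bp": 137},
    {"age": 60, "weight": 85, "smoker": False, "bp": 140},
]

pairs = match_pairs(apple_eaters, non_eaters)
effect = sum(t["bp"] - c["bp"] for t, c in pairs) / len(pairs)

print(len(pairs))  # 2: the third apple-eater has no twin
print(effect)      # -6.5
```

The unmatched 22-year-old is dropped, so the estimate only describes apple-eaters who have a comparable non-eater. That is the price of the "twins" approach.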

🧠 Easy Analogy Recap

Method	Metaphor	What it’s like
Model-Based	🧠 Calculator	You adjust numbers to compare “apples to apples” using a formula
Standardisation	📊 Group Average	You work out the effect inside each group (like “young people”), then take a weighted average
Matching	👯 Twin Finder	You make matched pairs and compare each duo

🧠 Why Your Professor Uses All Three

Your professor isn't being redundant—they're being rigorous:

Model-Based is efficient—but fragile if assumptions fail.
Standardisation honors heterogeneity—but breaks with too many strata.
Matching is robust—but sensitive to covariate overlap.

Using all three methods creates a triangulation strategy:

If they agree → high confidence in the effect.
If they diverge → investigate why (model misfit? poor matching? positivity issues?).

This kind of triangulation is common practice in modern causal inference in epidemiology.

🍏 TL;DR – Cheat Sheet

Method	Handles Continuous X?	Allows Effect Modification?	Pitfalls
Model-Based	✅ Yes	⚠️ Only if you add interaction terms	Biased if the effect varies by group and the model ignores it
Standardisation	⚠️ Only after binning into groups	✅ Yes	Breaks with too many groups or empty cells
Matching	✅ Yes	✅ Yes	Hard to balance mixed covariates