From Chaos to Causal Clarity: Balancing Scores and Propensity Scores Explained Simply
- Mayta
Introduction: Why This Matters
Imagine you're trying to compare two treatments, but your patients weren't randomized. You're worried that the two groups are different in important ways—older vs. younger, sicker vs. healthier. You want to know: Is it fair to compare them?
This is where balancing scores, especially the propensity score, come in. These tools let you clean up observational data and mimic the logic of randomization, at least with respect to the covariates you have actually measured.
Let’s walk through it—step by step—with simple analogies and clinical examples.
🧠 Step 1: Balancing Score – “How Similar Are They?”
💡 Concept:
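A balancing score, b(X), is a summary of a patient's background traits (covariates, X), often a single number, with one key property: among patients who share the same score, the covariates are distributed the same way in the treated and untreated groups.
D⊥X∣b(X)   (read: once you know b(X), knowing whether a patient was treated, D, tells you nothing more about their covariates X)
It lets you say: "I don't need to compare patients on 10+ variables separately; I can compare patients with the same score."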
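For readers who like the formal version, the standard definition (due to Rosenbaum and Rubin) and its key consequence can be written as below. Here f is just a generic function introduced for the statement, and p(X) is the propensity score defined in Step 2.

```latex
% Balancing score: given b(X), treatment D carries no further information about X
D \perp X \mid b(X)

% b(X) is a balancing score exactly when the propensity score can be recovered from it
b(X) \text{ is balancing} \iff p(X) = f\bigl(b(X)\bigr) \text{ for some function } f,
\qquad p(X) = \Pr(D = 1 \mid X)

% The two extremes: X itself is the finest balancing score, p(X) the coarsest
```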
🏥 Clinical Example:
Patient A: Age 68, LDL 180, diabetes → looks like a typical statin patient.
Patient B: Age 67, LDL 185, diabetes → also looks like a statin patient.
Even if only one of them got a statin, similar balancing scores → fair to compare.
🍪 Everyday Analogy:
You're comparing people who eat cookies vs. those who don’t.
Cookie eaters tend to be younger, less stressed, and better rested
Balancing score = “Cookie-likeness”
If two people have the same balancing score, they have similar background traits—whether or not they eat cookies.
🎯 Step 2: Propensity Score – “How Likely Is Treatment?”
💡 Concept:
The propensity score is the probability that a patient receives a treatment, given their covariates.
p(X)=Pr(D=1∣X)
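It is the coarsest balancing score: a single number between 0 and 1 that still balances all the covariates used to build it. If there are no unmeasured confounders (treatment depends only on the measured X), two patients with the same p(X) are exchangeable, even if only one got treated.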
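As a rough illustration, a propensity score is often estimated with logistic regression. The sketch below assumes a logistic model with hand-picked coefficients (COEF is hypothetical, not estimated from any data) and applies it to the two statin patients from Step 1, who end up with nearly identical scores.

```python
import math

# Hypothetical coefficients, chosen by hand for illustration only.
# In practice they come from fitting a logistic regression of the
# treatment indicator (statin: 1 = yes, 0 = no) on the covariates.
COEF = {"intercept": -8.0, "age": 0.08, "ldl": 0.02, "diabetes": 1.0}


def propensity(age: float, ldl: float, diabetes: int) -> float:
    """p(X) = Pr(D = 1 | X) under a logistic model: 1 / (1 + exp(-linear predictor))."""
    linear = (
        COEF["intercept"]
        + COEF["age"] * age
        + COEF["ldl"] * ldl
        + COEF["diabetes"] * diabetes
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Patients A and B from Step 1
p_a = propensity(age=68, ldl=180, diabetes=1)
p_b = propensity(age=67, ldl=185, diabetes=1)
print(f"Patient A: p(X) = {p_a:.2f}")
print(f"Patient B: p(X) = {p_b:.2f}")
```

In a real analysis you would fit these coefficients to your own data rather than choosing them by hand.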
🏥 Clinical Example:
Patient A: p(X) = 0.88 → likely to get a statin (and did)
Patient B: p(X) = 0.86 → also likely, but didn't get one
These two are good candidates for comparison: their chances of being treated were nearly the same.
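One common way to act on this is 1:1 nearest-neighbor matching on p(X). The sketch below is a greedy version without replacement; the caliper of 0.05 and the extra patients C, D and E are assumptions for illustration, not values from the text.

```python
from typing import Dict, List, Optional, Tuple


def match_nearest(
    treated: Dict[str, float],
    controls: Dict[str, float],
    caliper: float = 0.05,  # hypothetical maximum allowed gap in p(X)
) -> List[Tuple[str, Optional[str]]]:
    """Greedy 1:1 nearest-neighbor matching on p(X), without replacement.

    Each treated patient is paired with the still-unused control whose
    propensity score is closest, provided the gap is within the caliper;
    otherwise the treated patient stays unmatched (None).
    """
    available = dict(controls)
    pairs: List[Tuple[str, Optional[str]]] = []
    # Process treated patients in a fixed order: highest p(X) first.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: -kv[1]):
        best_id, best_gap = None, caliper
        for c_id, c_ps in available.items():
            gap = abs(t_ps - c_ps)
            if gap <= best_gap:
                best_id, best_gap = c_id, gap
        if best_id is not None:
            del available[best_id]  # each control is used at most once
        pairs.append((t_id, best_id))
    return pairs


treated = {"A": 0.88, "C": 0.40}
controls = {"B": 0.86, "D": 0.10, "E": 0.55}
print(match_nearest(treated, controls))
```

Here A is paired with B, while C has no control within the caliper and stays unmatched. Greedy matching is simple but order-dependent; optimal matching or matching with replacement are alternatives with different trade-offs.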
🍪 Everyday Analogy:
Propensity Score = “How likely are you to be a cookie eater?”
Based on traits (age, stress, sleep)
If p(X) = 0.9 → you probably eat cookies
Match you to someone with p(X) = 0.9 who doesn’t → fair test of cookie impact
🪟 Step 3: Region of Common Support – “Who Can We Fairly Compare?”
💡 Concept:
Not everyone has a match. Some patients are very likely to get treatment, others are very unlikely. These outliers can’t be fairly compared.
So, we trim the extremes and keep only the overlapping zone, called the Region of Common Support (RCS).
🏥 Clinical Example:
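Some patients: p(X) = 0.99 → almost always get a statin
Others: p(X) = 0.01 → almost never get one
No overlap = no comparable patients in the other group
Trim to the range where both groups are present (say, p(X) = 0.2–0.9) → comparisons are made only among patients who could plausibly have received either treatment.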
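The 0.2–0.9 range above is illustrative. One common way to derive the bounds from the data is the min-max rule: keep scores between the larger of the two groups' minimum scores and the smaller of their maximum scores. A minimal sketch with made-up scores:

```python
from typing import List, Tuple


def common_support(ps_treated: List[float], ps_control: List[float]) -> Tuple[float, float]:
    """Overlap interval under the min-max rule:
    [larger of the two minima, smaller of the two maxima]."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    if low > high:
        raise ValueError("No overlap: the groups share no range of propensity scores.")
    return low, high


def trim(scores: List[float], low: float, high: float) -> List[float]:
    """Keep only the scores that fall inside [low, high]."""
    return [s for s in scores if low <= s <= high]


# Hypothetical propensity scores for treated and untreated patients
ps_treated = [0.99, 0.90, 0.75, 0.60, 0.20]
ps_control = [0.01, 0.05, 0.30, 0.55, 0.85]
low, high = common_support(ps_treated, ps_control)
print(low, high)  # bounds of the region of common support
print(trim(ps_treated, low, high))
print(trim(ps_control, low, high))
```

The treated patients at 0.99 and 0.90 and the controls at 0.01 and 0.05 fall outside the overlap and are dropped.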
🍪 Everyday Analogy:
If all cookie eaters have p(X) = 0.9–1.0 and all non-cookie eaters have p(X) = 0–0.1 → no overlap.
You can’t compare someone who always eats cookies to someone who never would. Trim the sample to those in the middle.
📏 Step 4: Balance Diagnostics – “Did Matching Work?”
💡 Concept:
After matching or stratifying by propensity score, we need to check balance: are the groups truly similar?
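We use the standardized difference (stddiff): the difference in a covariate's mean between treated and untreated groups, divided by the pooled standard deviation, so that covariates on different scales (years, mg/dL, yes/no) can be compared.
Good balance: |stddiff| < 0.1 (a common rule of thumb)
Poor balance: |stddiff| ≥ 0.1 → revise the propensity model or refine the matching/stratification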
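Here is a minimal sketch of the standardized difference for a continuous and a binary covariate, using the pooled-SD convention (averaging the two groups' variances). The patient values are made up for illustration.

```python
import math
from statistics import mean, variance
from typing import List


def std_diff_continuous(x_treated: List[float], x_control: List[float]) -> float:
    """(mean_T - mean_C) / sqrt((var_T + var_C) / 2), using sample variances."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def std_diff_binary(p_treated: float, p_control: float) -> float:
    """Same idea for a 0/1 covariate, using p(1 - p) as each group's variance."""
    pooled_sd = math.sqrt((p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2)
    return (p_treated - p_control) / pooled_sd


def balanced(d: float, threshold: float = 0.1) -> bool:
    """Rule of thumb: |stddiff| below 0.1 counts as good balance."""
    return abs(d) < threshold


# Hypothetical matched sample
age_t = [66, 68, 70, 72]
age_c = [65, 68, 69, 72]
d_age = std_diff_continuous(age_t, age_c)
d_diab = std_diff_binary(0.40, 0.38)  # 40% vs. 38% diabetic
print(f"age: {d_age:.3f}, balanced={balanced(d_age)}")
print(f"diabetes: {d_diab:.3f}, balanced={balanced(d_diab)}")
```

In this toy sample age fails the 0.1 check while diabetes passes, which is the signal to revisit the propensity model.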
🏥 Clinical Example:
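You might check balance within propensity-score strata, restricted to the region of common support (statin is a placeholder name for the treatment indicator, which pbalchk expects before the covariate list):
xi: pbalchk statin age LDL diabetes ... if comsup==1, strata(PS_strata) graph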
The graph shows whether covariates are now balanced, stratum by stratum.
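To make the stratum-by-stratum idea concrete outside Stata, the sketch below computes the standardized difference for one covariate within each propensity-score stratum, keeping only rows with comsup == 1. It is a simplified illustration, not a reimplementation of pbalchk; the row layout, field names and values are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, variance
from typing import Dict, List


def std_diff(x_t: List[float], x_c: List[float]) -> float:
    """Standardized difference with a pooled SD (0.0 if neither group varies)."""
    pooled_sd = math.sqrt((variance(x_t) + variance(x_c)) / 2)
    return 0.0 if pooled_sd == 0 else (mean(x_t) - mean(x_c)) / pooled_sd


def balance_by_stratum(rows: List[dict], covariate: str) -> Dict[int, float]:
    """Per-stratum standardized difference, restricted to common support.

    Strata with fewer than two treated or two untreated rows are skipped,
    because a sample variance needs at least two values.
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for r in rows:
        if r["comsup"] == 1:  # same idea as Stata's `if comsup==1`
            groups[r["stratum"]][r["treated"]].append(r[covariate])
    return {
        s: std_diff(g[1], g[0])
        for s, g in sorted(groups.items())
        if len(g[0]) >= 2 and len(g[1]) >= 2
    }


def make_rows(stratum: int, treated: int, ages: List[float], comsup: int = 1) -> List[dict]:
    """Build hypothetical toy rows for one stratum and treatment arm."""
    return [{"stratum": stratum, "treated": treated, "age": a, "comsup": comsup} for a in ages]


rows = (
    make_rows(1, 1, [60, 62]) + make_rows(1, 0, [60, 62])
    + make_rows(2, 1, [70, 74]) + make_rows(2, 0, [66, 70])
    + make_rows(2, 1, [95], comsup=0)  # outside common support: ignored
)
result = balance_by_stratum(rows, "age")
flagged = [s for s, d in result.items() if abs(d) >= 0.1]
print(result)
print("strata needing attention:", flagged)
```

Stratum 1 is perfectly balanced on age, while stratum 2 is not, so that stratum (or the model behind it) would need another look.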
🍪 Everyday Analogy:
You’ve grouped cookie eaters and non-eaters by cookie-likeness (p(X)).
Now check: Do they still differ in sleep, age, stress?
If yes → something’s off. If no → ready to compare happiness fairly.
🔬 Recap with Realistic Numbers
| Step | Example | Insight |
|---|---|---|
| Balancing score | A & B look similar | Use one score instead of 10+ variables |
| Propensity score, p(X) | p = 0.88 vs. 0.86 | Close enough to match |
| Region of common support | p = 0.2–0.9 range kept | Outliers dropped, fair zone only |
| Balance check: stddiff < 0.1 | LDL: 0.07, Age: 0.03 | Good: matching worked |