Choosing the Right Generalized Linear Models (GLMs) in Stata: A DEPTh-Based Guide
- Mayta
- Jun 11
Updated: 6 days ago
✳️ Why This Matters
If you’ve typed or seen something like this in Stata:
glm y x, fam(bin) link(log)
...and felt unsure what it really means—you’re not alone.
This tiny line holds powerful logic for clinical research. It tells Stata:
“Model the chance of an outcome (y) depending on exposure (x), assuming the outcome behaves like a binomial (yes/no) event, and relate them through a logarithmic scale.”
🧠 The Big Idea
The command structure is:
glm <outcome> <explanatory variables>, fam(<distribution>) link(<scale>)
Each part has meaning:
Part | Stata Syntax Example | What It Says in Plain English |
glm | glm | Use a generalized linear model |
<outcome> | dead | The outcome variable (e.g., died or survived) |
<explanatory variables> | treatment | The predictor/exposure (e.g., Drug A vs B) |
fam(bin) | fam(bin) | Outcome is binary (0/1) |
link(log) | link(log) | Use a logarithmic scale for modeling the risk |
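To see the pieces working together, here is a minimal sketch on simulated data. The variable names (dead, treatment) and the risks of 20% and 30% are assumptions made up for illustration, not real results. It also shows that the abbreviated options are shorthand for family(binomial) and link(log), and that adding eform displays the exponentiated coefficient, which under a log link is the risk ratio.

```stata
* Simulated data for illustration only: names and risks are assumptions
clear
set seed 2024
set obs 1000
generate byte treatment = runiform() < 0.5
generate byte dead = runiform() < cond(treatment == 1, 0.20, 0.30)

* Abbreviated options, as written in this guide (coefficients on the log scale)
glm dead treatment, fam(bin) link(log)

* Same model with the options spelled out; eform displays exp(b)
glm dead treatment, family(binomial) link(log) eform
```

Without eform, the treatment coefficient is a log risk ratio, so it needs to be exponentiated before it is reported.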
🔍 The “Family”: What Is fam()?
The fam() option tells Stata which probability distribution to assume for your outcome variable:
Family (fam) | Use for… | Clinical Examples |
binomial | Yes/No outcomes | Survived/Died, Cured/Not, HIV+/– |
gaussian | Continuous outcomes | BP, Weight, Lab values |
poisson | Count outcomes | ER visits, Infections, Seizures |
gamma | Skewed positive continuous | Hospital cost, Length of stay |
📌 Think: "What does my outcome variable look like?"
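The gamma family is the one row in the table above that the later examples do not cover, so here is a hedged sketch for a skewed, positive outcome. The data are simulated, and the variable names (cost, treatment) and parameter values are assumptions. The log link is written out explicitly because Stata's default link for the gamma family is the reciprocal, not the log.

```stata
* Simulated skewed cost data for illustration only
clear
set seed 2024
set obs 500
generate byte treatment = runiform() < 0.5

* rgamma(shape, scale): mean cost is 1000 in controls, about 1350 in treated
generate cost = rgamma(2, 500 * exp(0.3 * treatment))

* Look at the outcome first: positive and right-skewed suggests gamma
summarize cost, detail

* With a log link, eform displays the ratio of mean costs
glm cost treatment, family(gamma) link(log) eform
```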
🔗 The “Link”: What Is link()?
The link() option tells Stata on which scale the expected outcome (y) is related to your predictors (x):
Link Function | What It Models | Use When You Want… |
logit | Log-odds | Odds Ratio (OR) |
log | Log-risk (or log-rate) | Risk Ratio (RR), Incidence Rate Ratio (IRR) |
identity | Outcome on its original scale (risk or mean) | Risk Difference (RD), mean difference |
📌 Think: "What do I want to report to clinicians or policymakers?"
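One way to see what the link changes is to fit the same binary outcome three times and vary only the link. This is a sketch on simulated data; the variable names and risks are assumptions. The family stays binomial throughout, and only the reported measure changes.

```stata
* Simulated data for illustration only
clear
set seed 2024
set obs 2000
generate byte treatment = runiform() < 0.5
generate byte dead = runiform() < cond(treatment == 1, 0.20, 0.30)

* logit link: eform displays the odds ratio
glm dead treatment, family(binomial) link(logit) eform

* log link: eform displays the risk ratio
glm dead treatment, family(binomial) link(log) eform

* identity link: the coefficient is already the risk difference
glm dead treatment, family(binomial) link(identity)
```

Note that eform is only useful for the logit and log links. With the identity link the coefficient is read directly as a difference in risk.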
🧪 Common Stata GLM Combos for Clinical Research
Research Goal | Use This GLM Syntax | Interpret Output As... |
Estimate Odds Ratio | glm y x, fam(bin) link(logit) | Odds ratio (good for case-control) |
Estimate Risk Ratio | glm y x, fam(bin) link(log) | Risk ratio (cohort/RCTs) |
Estimate Risk Difference | glm y x, fam(bin) link(identity) | Risk difference (absolute difference in risk) |
Compare Means | glm y x, fam(gaussian) link(identity) | Mean difference (like regression) |
Estimate IRR (rate ratio) | glm y x, fam(poisson) link(log) | Incidence rate ratio |
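Binomial models with a log or identity link sometimes fail to converge, especially with continuous covariates or risks close to 0 or 1. A commonly used fallback, sketched here on simulated data with assumed variable names, keeps the same link but swaps the family and asks for robust standard errors. The Poisson version is often called modified Poisson regression. Treat this as one option to consider, not as the guide's own recommendation.

```stata
* Simulated data for illustration only
clear
set seed 2024
set obs 2000
generate byte treatment = runiform() < 0.5
generate byte dead = runiform() < cond(treatment == 1, 0.20, 0.30)

* Risk ratio fallback: Poisson family, log link, robust standard errors
glm dead treatment, family(poisson) link(log) vce(robust) eform

* Risk difference fallback: Gaussian family, identity link, robust standard errors
glm dead treatment, family(gaussian) link(identity) vce(robust)
```

The robust variance matters here because a 0/1 outcome does not actually follow a Poisson or Gaussian distribution, so the model-based standard errors would be wrong.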
🧭 Mnemonic: "FAMILY is the nature of Y. LINK is how X affects Y."
Family = “What kind of variable is the outcome?” → Binary? Count? Continuous?
Link = “How do we relate exposure to the outcome?” → Ratio? Difference?
🔁 Combine them based on your study question, data structure, and clinical meaning.
🧠 Examples in Words
“I want to know if Drug A reduces mortality compared to Drug B in ICU patients.”
Outcome: Death (yes/no) → Binary → fam(bin)
Measure: Risk ratio preferred (not odds) → link(log)
Use: glm dead drug, fam(bin) link(log)
“How many ER visits do asthma patients have after new inhaler vs old one?”
Outcome: ER visit count → Count → fam(poisson)
Measure: Rate ratio → link(log)
Use: glm visits inhaler, fam(poisson) link(log)
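If patients are followed for different lengths of time, a rate ratio also needs the follow-up time. The sketch below uses glm's exposure() option for that. The data are simulated, and pyears (person-years of follow-up) is a hypothetical variable that does not appear in the example above.

```stata
* Simulated data for illustration only: pyears is a hypothetical follow-up variable
clear
set seed 2024
set obs 400
generate byte inhaler = runiform() < 0.5
generate pyears = 0.5 + 1.5 * runiform()

* True rates: 1.5 visits per year on the old inhaler, 1.0 on the new one
generate visits = rpoisson(pyears * cond(inhaler == 1, 1.0, 1.5))

* exposure() adds log(pyears) as an offset, so eform displays the incidence rate ratio
glm visits inhaler, family(poisson) link(log) exposure(pyears) eform
```

Without exposure(), the model compares mean counts per patient, which equals a rate ratio only when everyone has the same follow-up time.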
“Does the new diet change average HbA1c levels?”
Outcome: HbA1c (numeric) → Continuous → fam(gaussian)
Measure: Mean difference → link(identity)
Use: glm a1c diet, fam(gaussian) link(identity)
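The Gaussian family with an identity link is ordinary linear regression under another name. A quick way to convince yourself, sketched on simulated data with assumed values, is to run both commands and compare the coefficient on diet.

```stata
* Simulated data for illustration only
clear
set seed 2024
set obs 300
generate byte diet = runiform() < 0.5

* Mean HbA1c of 7.5 without the diet and 7.1 with it, SD of 1
generate a1c = rnormal(7.5 - 0.4 * diet, 1)

* GLM version: the coefficient on diet is the mean difference
glm a1c diet, family(gaussian) link(identity)

* Linear regression gives the same coefficient estimates
regress a1c diet
```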
✅ Key Takeaways for Clinicians
fam() = What kind of outcome? (binary, continuous, count, skewed)
link() = What measure do you want? (OR, RR, RD, IRR, mean diff)
Don’t default to odds ratios unless that’s your actual goal.
Use the log link for intuitive risk ratios in cohort studies and RCTs.
Use the identity link for absolute differences, which are useful for policy decisions.
🧪 Practice Challenge
Q: You run an RCT and want to estimate the risk ratio for infection in patients treated with Antibiotic A vs B. Infection is a yes/no outcome.
A: Your syntax is:
glm infected drug, fam(bin) link(log)
This tells Stata:
Family = binomial (binary outcome)
Link = log (for risk ratio)
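To check your answer, you can run it on simulated trial data and compare it with the crude risk ratio from a 2x2 table. In this sketch the variable names match the practice question, but the infection risks (10% and 15%) are assumptions. The cs command is Stata's cohort-study table command.

```stata
* Simulated RCT data for illustration only
clear
set seed 2024
set obs 800
generate byte drug = runiform() < 0.5
generate byte infected = runiform() < cond(drug == 1, 0.10, 0.15)

* Practice answer, with eform so the risk ratio is displayed directly
glm infected drug, fam(bin) link(log) eform

* Cross-check: crude risk ratio from the 2x2 table
cs infected drug
```

With a single binary predictor, the risk ratio from glm should agree with the one reported by cs.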