Age and Cancer Risk: From Log-Odds to Probability in Logistic Regression (Image)
- Mayta
🚦Three Panels, One Story: Modeling Binary Outcomes

Imagine we're studying how the probability of developing cancer increases with age. The question is: "How does age affect risk?" The data are simple: age (in years) and cancer status (yes/no).
Panel 1: Log-Odds (log(odds of cancer))
What you see: A perfectly straight, upward-sloping line.
Why: Logistic regression models the relationship between age and the log-odds of developing cancer.
Interpretation:
For each additional year of age, the log-odds of developing cancer increase by the same amount (the slope, β₁).
This is what the logit command in Stata is fitting.
But: Log-odds are hard for most people to interpret directly!
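To make the "same amount per year" idea concrete, here is a minimal sketch. The coefficients BETA0 and BETA1 are hypothetical values picked only for illustration; they are not estimated from any data in this post.

```python
# Hypothetical coefficients chosen only for illustration (not estimated from data).
BETA0 = -6.0   # intercept: log-odds at age 0
BETA1 = 0.08   # slope: change in log-odds per additional year of age


def log_odds(age):
    """Linear predictor of a logistic model: beta0 + beta1 * age."""
    return BETA0 + BETA1 * age


# The one-year step in log-odds is the same at every age.
step_young = log_odds(31) - log_odds(30)
step_old = log_odds(81) - log_odds(80)
print(round(step_young, 6), round(step_old, 6))
```

Both steps equal the slope, whether the person is 30 or 80. That constant step is what makes the line in Panel 1 straight.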
Panel 2: Odds of Cancer
What you see: A curve that starts flat, then rises very fast.
Why:
Odds are the exponentiated value of log-odds (odds = exp(log-odds)).
They always stay positive and can get very large.
Interpretation:
If odds = 1, chance is 50:50.
Odds >1 means it’s more likely to happen than not; odds <1 means less likely.
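Clinically: Odds are easier to grasp than log-odds, but they are still unintuitive for common outcomes, where odds and probability diverge (a probability of 0.8 corresponds to odds of 4).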
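A short sketch of the odds scale, again with hypothetical coefficients (BETA0, BETA1) that are assumptions for illustration only. It shows two things: each extra year multiplies the odds by the same factor, exp(β₁), and odds stay close to probability only when the outcome is rare.

```python
import math

# Hypothetical coefficients for illustration only.
BETA0 = -6.0
BETA1 = 0.08


def odds(age):
    """Odds are the exponentiated log-odds."""
    return math.exp(BETA0 + BETA1 * age)


# One extra year multiplies the odds by exp(BETA1) at any age.
odds_ratio_per_year = math.exp(BETA1)
ratio_young = odds(31) / odds(30)
ratio_old = odds(81) / odds(80)


def odds_from_probability(p):
    """Convert a probability (strictly between 0 and 1) to odds."""
    return p / (1.0 - p)


rare = odds_from_probability(0.01)    # close to the probability itself
common = odds_from_probability(0.80)  # far from the probability

print(f"odds ratio per year: {odds_ratio_per_year:.4f}")
print(f"ratio at 30->31: {ratio_young:.4f}, at 80->81: {ratio_old:.4f}")
print(f"p=0.01 -> odds {rare:.4f}; p=0.80 -> odds {common:.1f}")
```

The additive slope on the log-odds scale becomes a multiplicative factor on the odds scale, which is why the curve in Panel 2 bends upward.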
Panel 3: Probability of Cancer
What you see: The classic S-shaped ("sigmoid") curve.
Why:
Logistic regression uses the log-odds line to calculate the probability: probability = 1 / (1 + exp(-(β₀ + β₁ × age))), where β₀ is the intercept and β₁ is the slope.
This maps any value of age onto a probability between 0 and 1.
Interpretation:
When age is low, probability is near zero.
As age rises, probability increases rapidly in the middle years, then levels off as it approaches 1.
This is what most clinicians/patients care about: “Given this age, what is the chance of developing cancer?”
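The S-shape can be checked numerically with the standard logistic function, probability = 1 / (1 + exp(-log-odds)). This is a sketch under assumed coefficients (BETA0 and BETA1 are hypothetical, not fitted values).

```python
import math

# Hypothetical coefficients for illustration only.
BETA0 = -6.0
BETA1 = 0.08


def probability(age):
    """Logistic (sigmoid) transform of the linear log-odds."""
    z = BETA0 + BETA1 * age
    # Fine for realistic ages; extremely negative z could overflow math.exp.
    return 1.0 / (1.0 + math.exp(-z))


for age in (20, 50, 75, 100):
    print(age, round(probability(age), 3))
```

With these assumed values the log-odds are zero at age 75, so that is where the probability crosses 0.5. Below that age the curve hugs zero, and above it the curve flattens toward 1.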
Connecting the Panels:
Logistic regression fits a straight line to log-odds (Panel 1).
That line translates to a sharply rising curve for odds (Panel 2).
Which then transforms to a smooth S-shaped probability curve (Panel 3).
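The chain between the panels can be sketched in a few lines. The function name three_panels and the coefficients are hypothetical, used only to show that the three scales are the same information, and that taking the logit of the probability leads back to the straight line.

```python
import math

# Hypothetical coefficients for illustration only.
BETA0, BETA1 = -6.0, 0.08


def three_panels(age):
    """Return (log-odds, odds, probability) for one age."""
    log_odds = BETA0 + BETA1 * age   # Panel 1: straight line
    odds = math.exp(log_odds)        # Panel 2: exponential curve
    prob = odds / (1.0 + odds)       # Panel 3: S-shaped curve
    return log_odds, odds, prob


def logit(p):
    """Inverse of the sigmoid: probability back to log-odds."""
    return math.log(p / (1.0 - p))


for age in (40, 60, 80):
    lo, od, pr = three_panels(age)
    print(f"age={age}  log-odds={lo:.2f}  odds={od:.3f}  "
          f"probability={pr:.3f}  logit(p)={logit(pr):.2f}")
```

The last column matches the first: nothing is lost going from Panel 1 to Panel 3, only the scale changes.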
Why this matters in clinical research:
The model is linear only in log-odds. Because probability is a nonlinear transformation of log-odds, the risk curve is S-shaped even when the log-odds are perfectly linear.
It lets you make predictions for any age—even if nobody in your data was exactly 47.5 years old.
You can use this for any binary outcome: disease/no disease, event/no event, mortality, admission, etc.
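As a rough sketch of predicting at an age nobody in the data has, the example below fits a one-predictor logistic model by Newton's method and evaluates it at 47.5. Everything here is an assumption for illustration: the data are made up (not real patients), fit_logistic is a hypothetical helper, and a statistics package would normally do this step for you.

```python
import math

# Made-up toy data for illustration only (not real patients).
ages = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
cancer = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Hypothetical helper: maximum-likelihood fit via Newton's method."""
    mean_x = sum(xs) / len(xs)
    centered = [x - mean_x for x in xs]  # centering improves conditioning
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(centered, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Undo the centering so the intercept refers to age 0.
    return b0 - b1 * mean_x, b1


beta0, beta1 = fit_logistic(ages, cancer)


def predict(age):
    """Predicted probability of cancer at any age, observed or not."""
    return sigmoid(beta0 + beta1 * age)


print(f"slope per year: {beta1:.4f}")
print(f"predicted probability at 47.5: {predict(47.5):.3f}")
```

No one in the toy data is 47.5 years old, yet the fitted line gives a prediction there, and it falls between the predictions for ages 45 and 50.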
Summary: Logistic regression is the bridge between “linear world” (log-odds) and the “real world” (probabilities). You model in the first panel, but you interpret using the third.
Let me know if you want:
A real-data example with two groups (e.g., smokers vs non-smokers)
How to use these ideas in Stata code or a clinical paper
Or a deeper dive into odds ratios or margins plots!