Age and Cancer Risk: From Log-Odds to Probability in Logistic Regression (Image)

Mayta
Jun 25

🚦 Three Panels, One Story: Modeling Binary Outcomes

Imagine you're studying how the probability of developing cancer increases with age. You want to know: "How does age affect risk?" The actual data are: Age (years) + Cancer status (yes/no).

Panel 1: Log-Odds (log(odds of cancer))

What you see: A perfectly straight, upward-sloping line.
Why: Logistic regression models the relationship between age and the log-odds of developing cancer.
Interpretation:
- For each additional year of age, the log-odds of developing cancer increase by the same amount (the slope, β₁).
- This is what the logit command in Stata is fitting.
But: Log-odds are hard for most people to interpret directly!
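
To make the straight line concrete, here is a minimal sketch. The intercept and slope (B0 = -6.0, B1 = 0.1) are made-up values chosen for illustration, not estimates from real data, and log_odds is a hypothetical helper name.

```python
# Hypothetical coefficients, chosen only for illustration (not fitted to data).
B0 = -6.0  # intercept: log-odds at age 0
B1 = 0.1   # slope: change in log-odds per extra year of age

def log_odds(age, b0=B0, b1=B1):
    """Linear predictor: log-odds of cancer at a given age."""
    return b0 + b1 * age

for age in (40, 41, 60, 61):
    print(f"age {age}: log-odds = {log_odds(age):.2f}")

# The one-year step is the same size anywhere along the line.
step_at_40 = log_odds(41) - log_odds(40)
step_at_60 = log_odds(61) - log_odds(60)
print(f"step at 40: {step_at_40:.3f}, step at 60: {step_at_60:.3f}")
```

Whatever ages you pick, the one-year step equals the slope. That constant step is what "linear in log-odds" means.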

Panel 2: Odds of Cancer

What you see: A curve that starts flat, then rises very fast.
Why:
- Odds are the exponentiated value of log-odds (odds = exp(log-odds)).
- They always stay positive and can get very large.
Interpretation:
- If odds = 1, chance is 50:50.
- Odds >1 means it’s more likely to happen than not; odds <1 means less likely.
Clinically: Odds are easier to read than log-odds, but they are still unintuitive, and for common outcomes they can sit far from the probability.
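
A short sketch of the exponentiation step, again with made-up coefficients (B0 = -6.0, B1 = 0.1) and a hypothetical helper called odds. It also shows why a constant step in log-odds becomes a constant multiplier in odds: exp(β₁) is the odds ratio per year of age.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
B0 = -6.0
B1 = 0.1

def odds(age):
    """Odds of cancer: the exponentiated log-odds."""
    return math.exp(B0 + B1 * age)

# Each extra year multiplies the odds by the same factor: the odds ratio.
odds_ratio_per_year = math.exp(B1)

for age in (30, 60, 90):
    print(f"age {age}: odds = {odds(age):.3f}")

print(f"odds ratio per year = {odds_ratio_per_year:.3f}")
print(f"odds(61) / odds(60) = {odds(61) / odds(60):.3f}")
```

With these illustrative numbers the odds cross 1 (a 50:50 chance) at age 60, stay below 1 before it, and grow without an upper limit after it.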

Panel 3: Probability of Cancer

What you see: The classic S-shaped ("sigmoid") curve.
Why:
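- Logistic regression uses the log-odds line to calculate the probability:

probability = 1 / (1 + exp(-(β₀ + β₁ × age)))

Here β₀ is the intercept and β₁ is the slope of the log-odds line from Panel 1. This maps any value of age onto a probability between 0 and 1.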

Interpretation:
- When age is low, probability is near zero.
- As age rises, probability increases rapidly in the middle years, then levels off as it approaches 1.
This is what most clinicians/patients care about: “Given this age, what is the chance of developing cancer?”
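
The conversion from log-odds to probability can be sketched in a few lines. As before, the coefficients (B0 = -6.0, B1 = 0.1) are illustrative assumptions, and probability and odds_to_probability are hypothetical helper names.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
B0 = -6.0
B1 = 0.1

def probability(age):
    """Probability of cancer via the logistic (sigmoid) function."""
    z = B0 + B1 * age  # log-odds at this age
    return 1.0 / (1.0 + math.exp(-z))

def odds_to_probability(odds_value):
    """Same answer by the other route: odds / (1 + odds)."""
    return odds_value / (1.0 + odds_value)

for age in (20, 40, 60, 80, 100):
    print(f"age {age}: probability = {probability(age):.3f}")

# Both routes agree, shown here for age 70.
via_odds = odds_to_probability(math.exp(B0 + B1 * 70))
print(f"age 70 via odds: {via_odds:.3f}, via sigmoid: {probability(70):.3f}")
```

The printed values trace the S shape: close to 0 at the youngest ages, exactly one half where the log-odds are zero, and close to (but never reaching) 1 at the oldest ages.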

Connecting the Panels:

Logistic regression fits a straight line to log-odds (Panel 1).
That line translates to a sharply rising curve for odds (Panel 2).
Which then transforms to a smooth S-shaped probability curve (Panel 3).
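
One consequence of this chain is easy to miss: the same ten-year gap adds the same amount to the log-odds everywhere, but changes the probability by very different amounts depending on where you start. A small sketch, using made-up coefficients (B0 = -6.0, B1 = 0.1) and a hypothetical helper ten_year_change:

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
B0 = -6.0
B1 = 0.1

def log_odds(age):
    return B0 + B1 * age

def probability(age):
    return 1.0 / (1.0 + math.exp(-log_odds(age)))

def ten_year_change(start_age):
    """Return (change in log-odds, change in probability) over ten years."""
    end_age = start_age + 10
    return (log_odds(end_age) - log_odds(start_age),
            probability(end_age) - probability(start_age))

changes = {age: ten_year_change(age) for age in (20, 55, 90)}
for age, (d_log_odds, d_prob) in changes.items():
    print(f"from age {age}: log-odds +{d_log_odds:.2f}, probability +{d_prob:.3f}")
```

The log-odds column is identical in every row. The probability column is largest in the middle of the curve and small at both ends, which is the S shape seen from a different angle.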

Why this matters in clinical research:

The model is linear only on the log-odds scale, which is why the risk curve is S-shaped even though the log-odds line is perfectly straight.
It lets you make predictions for any age—even if nobody in your data was exactly 47.5 years old.
You can use this for any binary outcome: disease/no disease, event/no event, mortality, admission, etc.
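
To illustrate prediction at an age nobody in the data has, here is a sketch that fits the model to simulated data and then predicts at 47.5 years. Everything about the data is an assumption: the cohort is generated from a made-up true model (intercept -6.0, slope 0.1), and ages are whole years, so 47.5 never occurs. The fit uses Newton-Raphson maximum likelihood written in plain Python as a stand-in; it is not Stata's implementation, and fit_logistic is a hypothetical helper name.

```python
import math
import random

def sigmoid(z):
    """Convert log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Simulated cohort (assumption): whole-year ages 20-90, outcomes drawn from
# a made-up true model with intercept -6.0 and slope 0.1 per year.
random.seed(0)
ages = [random.randint(20, 90) for _ in range(2000)]
outcomes = [1 if random.random() < sigmoid(-6.0 + 0.1 * a) else 0 for a in ages]

def fit_logistic(ages, outcomes, iterations=25):
    """Fit log-odds = b0 + b1 * age by Newton-Raphson maximum likelihood."""
    mean_age = sum(ages) / len(ages)
    xs = [a - mean_age for a in ages]  # centre age for numerical stability
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * mean_age, b1  # undo the centring of age

b0_hat, b1_hat = fit_logistic(ages, outcomes)
risk_at_47_5 = sigmoid(b0_hat + b1_hat * 47.5)

print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
print(f"predicted probability at age 47.5: {risk_at_47_5:.3f}")
```

Because the model is a formula rather than a lookup table, any age can be plugged in. Predictions far outside the observed age range should still be treated with caution, since the straight log-odds line is only supported where there are data.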

Summary: Logistic regression is the bridge between “linear world” (log-odds) and the “real world” (probabilities). You model in the first panel, but you interpret using the third.
