Age and Cancer Risk: From Log-Odds to Probability in Logistic Regression (Image)

🚦 Three Panels, One Story: Modeling Binary Outcomes

Imagine you're studying how the probability of developing cancer increases with age. You want to know: "How does age affect risk?" The actual data are: age (years) and cancer status (yes/no).
Panel 1: Log-Odds (log(odds of cancer))
- What you see: A perfectly straight, upward-sloping line.
- Why: Logistic regression models the relationship between age and the log-odds of developing cancer.
- Interpretation:
- For each additional year of age, the log-odds of developing cancer increase by the same amount (the slope, β₁).
- This is what the logit command in Stata is fitting.
- But: Log-odds are hard for most people to interpret directly!
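
The constant per-year increment can be checked numerically. The sketch below uses Python rather than Stata, and the intercept and slope (`B0 = -7.0`, `B1 = 0.1`) are made-up illustrative values, not estimates from any real data.

```python
# Hypothetical coefficients chosen only for illustration (not fitted to real data).
B0 = -7.0   # intercept: log-odds of cancer at age 0
B1 = 0.1    # slope: change in log-odds per additional year of age

def log_odds(age):
    """Linear predictor of the logistic model: log-odds = B0 + B1 * age."""
    return B0 + B1 * age

# The one-year increase in log-odds is the same at every age.
for age in (30, 50, 70):
    step = log_odds(age + 1) - log_odds(age)
    print(f"age {age}: log-odds = {log_odds(age):+.2f}, one-year increase = {step:.2f}")
```

With these assumed values, the one-year increase equals `B1` at every age, which is exactly what a straight line in Panel 1 means.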
Panel 2: Odds of Cancer
- What you see: A curve that starts nearly flat, then rises faster and faster (exponential growth).
- Why:
- Odds are the exponentiated value of log-odds (odds = exp(log-odds)).
- They always stay positive and can get very large.
- Interpretation:
- If odds = 1, chance is 50:50.
- Odds >1 means it’s more likely to happen than not; odds <1 means less likely.
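- Clinically: Odds are easier to grasp than log-odds, but they are still unintuitive for common outcomes, where odds and probability diverge (a probability of 0.8 corresponds to odds of 4).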
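
A small Python sketch of the odds scale follows. The slope `B1 = 0.1` is a hypothetical value used only for illustration, and the helper names are invented for this example.

```python
import math

B1 = 0.1  # hypothetical slope on the log-odds scale (per year of age)

def odds_from_log_odds(log_odds):
    """Odds are the exponentiated log-odds."""
    return math.exp(log_odds)

def odds_from_probability(p):
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Inverse of the above: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

# A constant step in log-odds becomes a constant multiplier on the odds scale:
# each extra year multiplies the odds by exp(B1), ten years by exp(10 * B1).
odds_ratio_per_year = math.exp(B1)
odds_ratio_per_decade = math.exp(10 * B1)
print(f"odds ratio per year:   {odds_ratio_per_year:.3f}")
print(f"odds ratio per decade: {odds_ratio_per_decade:.3f}")

# Odds of 1 correspond to a 50:50 chance.
print(f"odds 1.0 -> probability {probability_from_odds(1.0):.2f}")

# Odds track probability closely for rare outcomes but not for common ones.
for p in (0.01, 0.10, 0.50, 0.80):
    print(f"probability {p:.2f} -> odds {odds_from_probability(p):.3f}")
```

The multiplier `exp(B1)` is the odds ratio per year of age. It stays the same at every age even though the odds themselves grow without bound.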
Panel 3: Probability of Cancer
- What you see: The classic S-shaped ("sigmoid") curve.
- Why:
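- Logistic regression uses the log-odds line to calculate the probability: p = 1 / (1 + exp(−(β₀ + β₁ × age))), where β₀ is the intercept and β₁ the slope of the line in Panel 1.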
- This maps any value of age onto a probability between 0 and 1.
- Interpretation:
- When age is low, probability is near zero.
- As age rises, probability increases rapidly in the middle years, then levels off as it approaches 1.
- This is what most clinicians/patients care about: “Given this age, what is the chance of developing cancer?”
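
The S-shape described above can be reproduced with a few lines of Python. This is a minimal sketch under assumed coefficients (`B0 = -7.0`, `B1 = 0.1`, both hypothetical), so the printed probabilities are illustrative and say nothing about real cancer risk.

```python
import math

# Hypothetical coefficients chosen only for illustration (not fitted to real data).
B0 = -7.0   # intercept on the log-odds scale
B1 = 0.1    # slope on the log-odds scale, per year of age

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def probability_of_cancer(age):
    """Map the log-odds line B0 + B1 * age onto a probability between 0 and 1."""
    return sigmoid(B0 + B1 * age)

# Near zero at low ages, steepest in the middle, levelling off towards 1.
for age in (20, 40, 60, 70, 80, 100):
    print(f"age {age:3d}: probability = {probability_of_cancer(age):.3f}")

# The curve crosses 0.5 where the log-odds are zero, i.e. at age -B0 / B1.
midpoint_age = -B0 / B1
print(f"probability is 0.5 at age {midpoint_age:.1f}")
```

The midpoint depends entirely on the assumed coefficients. A different intercept shifts the curve left or right, and a different slope makes it steeper or flatter.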
Connecting the Panels:
- Logistic regression fits a straight line to log-odds (Panel 1).
- That line translates to a sharply rising curve for odds (Panel 2).
- Which then transforms to a smooth S-shaped probability curve (Panel 3).
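
To make "fits a straight line to log-odds" concrete, here is a rough Python sketch that recovers the line from data by maximum likelihood using Newton-Raphson. Everything in it is an assumption made for illustration: the data are synthetic counts generated from hypothetical coefficients (`TRUE_B0`, `TRUE_B1`), and `fit_logistic` is a hand-written helper, not the algorithm of Stata or any other package.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Synthetic, noise-free data generated from hypothetical coefficients
# (illustration only; these are not real cancer rates).
TRUE_B0, TRUE_B1 = -7.0, 0.1
N_PER_AGE = 1000                      # people observed at each age
ages = list(range(20, 91))            # ages 20..90
cases = [round(N_PER_AGE * sigmoid(TRUE_B0 + TRUE_B1 * a)) for a in ages]

def fit_logistic(ages, cases, n_per_age, iterations=25):
    """Maximum-likelihood fit of log-odds = b0 + b1 * age by Newton-Raphson."""
    mean_age = sum(ages) / len(ages)
    xs = [a - mean_age for a in ages]  # centre age for numerical stability
    a0, b1 = 0.0, 0.0                  # a0 = log-odds at the mean age
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, cases):
            p = sigmoid(a0 + b1 * x)
            resid = y - n_per_age * p          # observed minus expected cases
            w = n_per_age * p * (1.0 - p)      # binomial variance weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system (X'WX) * step = X'(y - mu).
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return a0 - b1 * mean_age, b1      # convert back to an intercept at age 0

b0_hat, b1_hat = fit_logistic(ages, cases, N_PER_AGE)
print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.4f}")
```

Because the synthetic counts contain only rounding error, the estimates should land very close to the values used to generate them. Real data would add sampling noise, and a statistical package would also report standard errors.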
Why this matters in clinical research:
- The model is linear only on the log-odds scale, which is why the risk curve is S-shaped even though the log-odds line is perfectly straight.
- It lets you make predictions for any age—even if nobody in your data was exactly 47.5 years old.
- You can use this for any binary outcome: disease/no disease, event/no event, mortality, admission, etc.
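
The points above can be illustrated with a short Python sketch: a prediction at an age that need not appear in the data, and the change in risk per year, which varies with age even though the log-odds slope does not. The coefficients are hypothetical (`B0 = -7.0`, `B1 = 0.1`), and the slope formula used is the standard derivative of the logistic curve, dp/d(age) = β₁ × p × (1 − p).

```python
import math

# Hypothetical coefficients chosen only for illustration (not fitted to real data).
B0, B1 = -7.0, 0.1

def predicted_risk(age):
    """Predicted probability at any age, including ages absent from the data."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * age)))

def risk_slope(age):
    """Slope of the probability curve: dp/d(age) = B1 * p * (1 - p)."""
    p = predicted_risk(age)
    return B1 * p * (1.0 - p)

print(f"predicted risk at age 47.5: {predicted_risk(47.5):.3f}")

# The same one-year step in age changes the risk by different amounts,
# depending on where on the S-curve that age sits.
for age in (30, 47.5, 70, 95):
    one_year_change = predicted_risk(age + 1) - predicted_risk(age)
    print(f"age {age}: risk = {predicted_risk(age):.3f}, "
          f"change over next year = {one_year_change:.4f}, "
          f"slope = {risk_slope(age):.4f}")
```

Under these assumed values the risk changes fastest where the probability is 0.5 and slowest in the tails. This is why an effect that is constant on the log-odds scale cannot be summarised as a single change in risk per year.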
Summary: Logistic regression is the bridge between “linear world” (log-odds) and the “real world” (probabilities). You model in the first panel, but you interpret using the third.