Survival Analysis in Clinical Epidemiology with Stata Code: From Kaplan–Meier to Cox Regression (Non-parametric, Semi-parametric)
- Mayta
- 1 day ago
Introduction
Survival analysis is used when time matters.
We are interested not only in whether an event happens but also in when it happens, and we must correctly handle censoring (patients whose event is not observed because follow-up ended, they withdrew, or they were lost to follow-up before it occurred).
1. Non-parametric Survival Analysis
(Describe and compare survival: no assumed distribution for the survival times)
Step 1: Survival-time setting (stset)
Before doing anything, we must tell Stata:
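Which variable holds the follow-up time
Which variable (and which value) marks the event
Who is censored (everyone who does not have the event value)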
stset time, failure(event==1)
👉 This creates the survival data structure.
👉 Everything in survival analysis starts here.
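A minimal sketch of this step, assuming Stata's bundled cancer example dataset (`sysuse cancer`) as a stand-in for your own data. The variable names `studytime`, `died`, and `drug` come from that example dataset, not from this post.

```stata
* Load Stata's bundled example dataset (an assumption, for illustration only)
sysuse cancer, clear

* Declare the data as survival-time data:
*   studytime  = follow-up time
*   died == 1  = event; any other value is treated as censored
stset studytime, failure(died==1)

* Inspect what stset created: _t (exit time), _d (event indicator), _st (record is used)
list studytime died _t _d _st in 1/5

* Summarise follow-up time and the number of failures
stdescribe
stsum
```

Records with `_d == 0` are the censored patients. Every later `st` command reads these underscore variables, which is why `stset` comes first.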
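Step 2: Kaplan–Meier survival table (sts list)
sts list
sts list, by(group)
What this does:
Lists every distinct observed event or censoring time (it does not group time into fixed intervals; for a classic interval-based life table, Stata has ltable)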
Shows:
Number at risk
Number of events
Survival probability
✅ Good for:
Understanding risk sets
Teaching how censoring works
📌 Conceptually:
“How many patients are still under observation at each time?”
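As a sketch, again assuming the bundled cancer dataset rather than data from this post, the table can be printed in full, split by group, or restricted to chosen time points. `ltable` is included for contrast because it is the command that builds a table over fixed-width intervals; the interval width of 5 is an arbitrary choice.

```stata
sysuse cancer, clear
stset studytime, failure(died==1)

* Kaplan-Meier survivor function at every distinct observed time
sts list

* Same table, one block per treatment group
sts list, by(drug)

* Report only selected time points (here 0, 10, 20, 30 time units)
sts list, at(0 10 20 30) by(drug)

* Actuarial life table with intervals 5 time units wide, for comparison
ltable studytime died, intervals(5)
```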
Step 3: Kaplan–Meier curves (sts graph)
sts graph plots the Kaplan–Meier estimates that sts list tabulates.
Survival curve
sts graph, survival
sts graph, survival by(group)
Shows:
Probability of remaining event-free over time
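For reference, the curve is the standard Kaplan–Meier (product-limit) estimate. The notation is not from this post: the t_j are the distinct event times, d_j is the number of events at t_j, and n_j is the number still at risk just before t_j.

```latex
\hat{S}(t) = \prod_{j:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right),
\qquad
\hat{F}(t) = 1 - \hat{S}(t)
```

Censored patients drop out of n_j after their censoring time without ever counting in d_j, which is how the estimator uses their partial follow-up. The second expression is the complement that a failure curve plots.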
Failure curve (cumulative incidence, 1 − survival)
sts graph, failure
sts graph, failure by(group)
Shows:
Proportion who have had the event by time t
📌 Important:
These curves are data-driven
They correctly handle censoring
No covariate adjustment
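A short sketch of the plotting commands with a few commonly used options, assuming the bundled cancer dataset (its variables `studytime`, `died`, and `drug` are not from this post). The axis titles are illustrative.

```stata
sysuse cancer, clear
stset studytime, failure(died==1)

* Kaplan-Meier survival curves by group, with numbers at risk printed under the plot
sts graph, by(drug) risktable xtitle("Follow-up time") ytitle("Survival probability")

* Cumulative failure (1 - KM) curves by group
sts graph, failure by(drug)

* Median survival time with a 95% confidence interval for each group
stci, by(drug)
```

`stci` complements the picture with a single number per group, which is often what gets reported alongside the curve.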
Step 4: Log-rank test (sts test)
sts test group
What it does:
Compares two or more Kaplan–Meier curves across the whole follow-up period
Gives a p-value
Interpretation:
“Are the survival curves statistically different?”
✅ You can compare:
2 groups
3 groups
Many groups (one overall p-value)
❌ What it does NOT do:
Does not give effect size
Does not give hazard ratio
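A sketch of the test on a three-group variable, assuming the bundled cancer dataset (`drug` takes the values 1, 2, and 3 there; it is not a variable from this post).

```stata
sysuse cancer, clear
stset studytime, failure(died==1)

* Log-rank test across all three drug groups (one overall p-value)
sts test drug

* Wilcoxon (Breslow-Gehan) version, which gives more weight to early differences
sts test drug, wilcoxon

* Pairwise comparison: restrict the test to two groups with an if condition
sts test drug if inlist(drug, 1, 2)
```

A significant overall test says only that at least one curve differs. It does not say which one or by how much, which is the gap Cox regression fills.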
Summary: Non-parametric methods
Method | Purpose |
stset | Define time & event |
sts list | Kaplan–Meier table (numbers at risk) |
sts graph | Kaplan–Meier curves |
sts test | Compare curves (p-value only) |
2. Semi-parametric Survival Analysis
Cox Proportional Hazards Regression (stcox)
Now we move from description to effect estimation.
What Cox regression tells us
stcox exposure
Cox regression answers:
“How much higher or lower is the hazard (the instantaneous event rate) in one group compared with another?”
It reports a Hazard Ratio (HR).
Interpretation of Hazard Ratio
HR | Meaning |
HR = 1 | No difference in hazard |
HR > 1 | Higher hazard |
HR < 1 | Lower hazard (protective) |
Example interpretation:
“Cryosurgery reduced the hazard of recurrence by 52% (HR 0.48).”
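The percentage in that sentence follows directly from the model. As a sketch in standard notation (the symbols are not from this post): h_0(t) is the unspecified baseline hazard, x is a 0/1 exposure indicator, and β is the estimated coefficient.

```latex
h(t \mid x) = h_0(t)\, e^{\beta x},
\qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta},
\qquad
\text{percent change in hazard} = 100 \times (\mathrm{HR} - 1)
```

With HR = 0.48, the percent change is 100 × (0.48 − 1) = −52, that is, a 52% lower hazard. The baseline hazard cancels in the ratio, which is why Cox regression can estimate HR without specifying h_0(t).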
Adjusted Cox regression (control confounders)
stcox exposure age sex i.other
This answers:
“What is the effect of exposure after adjusting for other factors?”
📌 Key strength of Cox:
Handles censoring
Uses time-to-event
Adjusts for covariates
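A sketch of crude versus adjusted models, assuming the bundled cancer dataset with `drug` as the exposure and `age` as the only available confounder (neither is from this post). Because Cox regression assumes the hazard ratio stays constant over follow-up, the sketch ends with a check of that assumption based on Schoenfeld residuals.

```stata
sysuse cancer, clear
stset studytime, failure(died==1)

* Crude model: hazard ratios for drug 2 and drug 3 versus drug 1 (the reference)
stcox i.drug

* Adjusted model: same comparison, holding age constant
stcox i.drug age

* Test of the proportional-hazards assumption for the adjusted model
estat phtest, detail
```

Comparing the crude and adjusted hazard ratios shows how much of the apparent effect is explained by the confounder. A small p-value from `estat phtest` suggests the hazard ratio may change over time.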
Relationship to non-parametric methods
Method | Question answered |
Kaplan–Meier | What does survival look like? |
Log-rank | Are curves different? |
Cox | How large is the effect? |
3. Fully Parametric Survival Models
(Know they exist — no detail needed)
Examples:
Exponential
Weibull
Log-normal
Log-logistic
In Stata:
streg exposure, dist(weibull)
Key idea:
Assumes a specific shape for the hazard
More assumptions than Cox
📌 For beginners:
Just know that parametric models exist. You do not need the details at this stage.
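For orientation only, a sketch that fits two of the distributions listed above and compares them, assuming the bundled cancer dataset (`drug` and `age` are not from this post). The stored-estimate names `m_exp` and `m_weib` are arbitrary.

```stata
sysuse cancer, clear
stset studytime, failure(died==1)

* Exponential model: assumes a constant hazard over time
streg i.drug age, dist(exponential)
estimates store m_exp

* Weibull model: hazard may rise or fall monotonically over time
streg i.drug age, dist(weibull)
estimates store m_weib

* Compare fit using AIC and BIC (lower values indicate better fit)
estimates stats m_exp m_weib
```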
4. Flexible Parametric Survival Models
(Know they exist — no detail needed)
Examples:
Royston–Parmar models
Splines for baseline hazard
Purpose:
More flexible hazard shapes
Used in advanced research
📌 For now:
“These are advanced extensions of survival models.”
No need to go deeper for introductory learning.
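For orientation only, a sketch using `stpm2`, a community-contributed command (not part of official Stata) that fits Royston–Parmar models. The install lines, the binary variable `treated`, and the choice of 3 degrees of freedom are assumptions made for illustration, and the data are again the bundled cancer dataset.

```stata
* One-time installation from SSC (stpm2 relies on rcsgen to build the splines)
ssc install stpm2
ssc install rcsgen

sysuse cancer, clear
stset studytime, failure(died==1)

* Hypothetical binary exposure: any active drug (2 or 3) versus drug 1
generate byte treated = drug > 1

* Royston-Parmar model on the log cumulative hazard scale, with a
* 3-df restricted cubic spline for the baseline; eform displays hazard ratios
stpm2 treated age, scale(hazard) df(3) eform
```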
Big Picture Summary (Very Important)
Level | Method | What it does |
Non-parametric | sts | Describe & compare survival |
Semi-parametric | stcox | Estimate hazard ratios |
Parametric | streg | Model survival with assumptions |
Flexible parametric | Royston–Parmar models (advanced) | Complex hazard shapes |
One-sentence takeaway (exam-ready)
Kaplan–Meier describes survival, the log-rank test compares curves, Cox regression estimates hazard ratios, and parametric models add an assumed shape for the baseline hazard.




