Survival Analysis in Stata: stset and sts – Kaplan-Meier
1. Declaring Survival Data with stset
Purpose: The stset command declares the dataset in memory to be survival (time-to-event) data. It must be run before any other st command, such as sts or stcox, can be used.
Core Syntax:
stset timevar, failure(eventvar)
- timevar: The variable representing follow-up time (e.g., number of days until event or censoring).
- failure(eventvar): Indicator variable for the event (1 = event occurred, 0 = censored). Stata treats any nonzero, nonmissing value as an event.
Implanon Example:
stset day, failure(remove)
- day: Time from Implanon insertion to removal or last follow-up (censoring).
- remove: 1 if Implanon was removed, 0 if still in place at last follow-up.
What Stata Reports:
- Number of records used (and any excluded) and number of events
- Total analysis time at risk (person-time)
- Earliest observed entry and last observed exit times
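As a quick check after declaring the data, it can help to look at the system variables that stset creates (_t0, _t, _d, _st) and at the summary commands stdescribe and stsum. A minimal sketch, assuming the Implanon dataset with day and remove is already in memory and has at least five records:

```stata
// Declare the survival data (same call as in the Implanon example)
stset day, failure(remove)

// Inspect the variables stset adds:
//   _t0 = entry time, _t = exit time, _d = event indicator,
//   _st = 1 if the record is used in the analysis
list day remove _t0 _t _d _st in 1/5

// Summarize follow-up: subjects, records, time at risk, failures
stdescribe

// Incidence rate and survival-time percentiles
stsum
```

If _st is 0 for some records, stset has excluded them (for example, because of a missing time value), and later sts commands will ignore them.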
2. Key stset Options in Clinical Datasets
| Option | Function | Example |
| failure(var) | Identifies the event indicator | failure(remove) |
| id(var) | Unique subject identifier (useful for repeated records) | id(woman_id) |
| origin(time var) | Sets the starting point for time at risk | origin(time dob) |
| enter(time var) | Delays entry into the risk set (late entry) | enter(time enrollment) |
| scale(#) | Changes the unit of time (e.g., from days to years) | scale(365.25) |
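Scenario: To use age as the analysis time scale, measure time from date of birth (dob) but let subjects enter the risk set only at enrollment (enrollment). Because origin() and enter() refer to dates here, the time variable must also be a date (the date of removal or last follow-up), not a duration such as day:
stset exitdate, failure(remove) origin(time dob) enter(time enrollment)
Here exitdate is a hypothetical name for that date variable. Analysis time is then time since birth, and each participant contributes to the risk set only from her enrollment date onward (delayed entry, also called left truncation).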
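To illustrate scale() from the table, the following sketch re-declares the Implanon data so that analysis time is in years instead of days. It assumes day is recorded in days; the time points passed to at() are then read in years.

```stata
// Analysis time _t becomes day/365.25, i.e. years since insertion
stset day, failure(remove) scale(365.25)

// Survivor function at 0, 6 and 12 months, expressed in years
sts list, at(0 0.5 1)
```

Only the unit of the time axis changes; the Kaplan-Meier estimates themselves are the same as with time in days.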
3. Kaplan-Meier and Survival Analysis with sts
After using stset, you can perform nonparametric survival analyses and visualize results using the sts suite of commands.
Typical Tasks:
- Plot the survivor function (Kaplan-Meier curve): sts graph
- Plot cumulative incidence (1 – survival): sts graph, failure
- Display life-table details: sts list
- Visualize survival by subgroups (e.g., living with husband): sts graph, by(notlivingtogether)
- Add confidence intervals: sts graph, by(notlivingtogether) ci
- Compare groups using the log-rank test: sts test notlivingtogether
Example:
sts graph, by(notlivingtogether) ci
This command plots separate Kaplan-Meier curves for women living with and not living with their husbands, each with pointwise 95% confidence bands.
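The log-rank test listed among the typical tasks can be run right after the plot. A short sketch, assuming the data are already stset and notlivingtogether is the grouping variable; the wilcoxon option is shown only as one alternative weighting that sts test offers, not as something this analysis requires:

```stata
// Log-rank test of equal survivor functions across groups (the default)
sts test notlivingtogether

// Wilcoxon (Breslow) variant, which gives more weight to early events
sts test notlivingtogether, wilcoxon
```

A small p-value suggests the survivor functions differ between groups; the test does not say by how much, so it is best read alongside the curves.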
4. Sample Analysis Workflow
// 1. Set up survival data structure
stset day, failure(remove)
// 2. Plot the kernel-smoothed hazard estimate (instantaneous event rate)
sts graph, hazard
// 3. Plot cumulative incidence (probability Implanon was removed)
sts graph, failure
// 4. List survival probability at specific times, by living arrangement
sts list, surv at(0 90 180 365) by(notlivingtogether)
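A step that often follows this workflow is summarizing the median time to removal. The sketch below uses stci, which is not part of the workflow above and is added here only as an assumption about what a reader might want next; it relies on the data already being stset.

```stata
// Median time to removal with a 95% confidence interval, by living arrangement
stci, by(notlivingtogether)

// 25th percentile: the time by which 25% of women had the implant removed
stci, p(25) by(notlivingtogether)
```

If the survivor function in a group never falls to 0.5 during follow-up, the median cannot be estimated and is reported as missing.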
5. Interpretation Tips
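- Kaplan-Meier curves: A steeper drop over an interval indicates a higher event rate in that interval; flat stretches mean no events occurred.
- sts list output: Read the columns in order: number at risk at each time, number of events, net number lost (censored observations minus any late entries), survivor function, standard error, and confidence interval.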
- Group comparisons: Use by() to examine differences in survival between groups (e.g., exposure vs control).
6. Handling Censoring and Multiple Records
If each subject can have multiple records (such as with time-dependent covariates), always use the id() option to uniquely identify subjects:
stset day, id(woman_id) failure(remove)
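This tells Stata which rows belong to the same individual. In this layout, day holds the end time of each record, and remove is 1 only on the record in which the event occurs.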
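The multiple-record layout can be sketched with a tiny made-up dataset. The values below are invented purely for illustration (they are not from the Implanon study): woman 1 has the implant removed at day 200, and woman 2 is censored at day 365.

```stata
clear
// Toy data: two records per woman; day is the END of each record
input woman_id day remove
1  90 0
1 200 1
2 150 0
2 365 0
end

// id() links the records, so each one spans (_t0, _t]
stset day, id(woman_id) failure(remove)

// The second record of each woman should start where her first one ended
list woman_id day remove _t0 _t _d, sepby(woman_id)
```

Without id(), Stata would treat the four rows as four separate subjects, each at risk from time 0.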
7. Essentials for Practice and Exams
- Begin with stset to structure your data.
- Always specify time, event, and subject ID (if applicable).
- Plot and tabulate survival and failure functions with sts.
- Use the log-rank test (sts test) to statistically compare survival between groups.
- Incorporate origin(), enter(), or scale() for complex entry or time-scale scenarios.