Survival Analysis in Stata: stset and sts – Kaplan-Meier
- Mayta
1. Declaring Survival Data with stset
Purpose: The stset command declares your dataset to be survival (time-to-event) data. Stata's survival commands, including sts, rely on this declaration, so run stset before any time-to-event analysis.
Core Syntax:
stset timevar, failure(eventvar)
timevar: The variable representing follow-up time (e.g., number of days until event or censoring).
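failure(eventvar): Indicator variable for the event (1 = event occurred, 0 = censored). Stata treats any nonzero, nonmissing value as an event unless you name specific codes, as in failure(eventvar==1).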
Implanon Example:
stset day, failure(remove)
day: Time from Implanon insertion to removal or last follow-up (censoring).
remove: 1 if Implanon was removed, 0 if still in place at last follow-up.
What Stata Reports:
Number of records, number of events
Total person-time observed
Range of observed time
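As a minimal sketch of what the declaration does, the following uses cancer.dta, an example dataset that ships with Stata, as a stand-in for the Implanon data (which is not available here). The variables studytime and died belong to that stand-in dataset, not to the original example.

```stata
* Stand-in data: cancer.dta ships with Stata (this is NOT the Implanon dataset)
sysuse cancer, clear

* studytime = months to death or end of follow-up; died = 1 if died, 0 if censored
stset studytime, failure(died)

* stset adds system variables that the other st commands rely on:
*   _t0 = entry time, _t = exit time, _d = event indicator, _st = 1 if the record is used
list studytime died _t0 _t _d _st in 1/5

* Summaries of the declared survival data
stdescribe
stsum
```

Listing _t0, _t, and _d after stset is a quick way to confirm that time and event status were interpreted as intended.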
2. Key stset Options in Clinical Datasets
Option | Function | Example |
failure(var) | Identifies the event indicator | failure(remove) |
id(var) | Unique subject identifier (useful for repeated records) | id(woman_id) |
origin(time var) | Sets the starting point for time at risk | origin(time dob) |
enter(time var) | Delays entry into the risk set (late entry) | enter(time enrollment) |
scale(#) | Changes the unit of time (e.g., from days to years) | scale(365.25) |
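To illustrate scale(), here is a small sketch on Stata's shipped cancer.dta (a stand-in dataset, not the Implanon data). Its follow-up time is recorded in months, so scale(12) is used here to express analysis time in years, in the same way that scale(365.25) would convert days to years.

```stata
* Stand-in data shipped with Stata; studytime is in months
sysuse cancer, clear

* Analysis time _t becomes studytime / 12, that is, years
stset studytime, failure(died) scale(12)

* Compare the raw time with the rescaled analysis time
list studytime _t in 1/5
```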
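Scenario: Suppose you want analysis time measured from date of birth (dob), but subjects only come under observation at enrollment (enrollment). Because origin() and enter() are given as dates here, the time variable must also be a date on the same scale, not a duration such as day. With a hypothetical exit-date variable lastdate (date of removal or last follow-up), use:
stset lastdate, failure(remove) origin(time dob) enter(time enrollment)
This setup uses age as the time scale, but each participant joins the risk set only from enrollment onward (delayed entry).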
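A delayed-entry setup of this kind can be checked on toy data. The sketch below invents three records purely for illustration. The variable lastdate (date of removal or last follow-up) and all of the dates are assumptions, and the time variable is a date so that it is on the same scale as dob and enrollment.

```stata
clear
* Invented records for illustration only
input woman_id str9 dob_s str9 enroll_s str9 exit_s remove
1 "15mar1990" "01jan2020" "01jul2021" 1
2 "02aug1985" "10feb2020" "10feb2023" 0
3 "20dec1995" "05may2020" "05nov2020" 1
end

* Convert the strings to Stata daily dates
gen dob        = date(dob_s, "DMY")
gen enrollment = date(enroll_s, "DMY")
gen lastdate   = date(exit_s, "DMY")
format dob enrollment lastdate %td

* Age in years is the time scale; follow-up starts at enrollment
stset lastdate, failure(remove) origin(time dob) enter(time enrollment) scale(365.25)

* _t0 = age at enrollment, _t = age at removal or censoring
list woman_id _t0 _t _d
```

In the listing, _t0 should be greater than zero for every woman, which shows that no one is counted as at risk before she enrolled.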
3. Kaplan-Meier and Survival Analysis with sts
After using stset, you can perform nonparametric survival analyses and visualize results using the sts suite of commands.
Typical Tasks:
Plot the survivor function (Kaplan-Meier curve):
sts graph
Plot cumulative incidence (1 – survival):
sts graph, failure
Display life-table details:
sts list
Visualize survival by subgroups (e.g., living with husband):
sts graph, by(notlivingtogether)
Add confidence intervals:
sts graph, by(notlivingtogether) ci
Compare groups using the log-rank test:
sts test notlivingtogether
Example:
sts graph, by(notlivingtogether) ci
This command will display Kaplan-Meier curves for women living with vs not living with their husband, with confidence intervals.
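The same sequence can be tried on Stata's shipped cancer.dta, used here as a stand-in because the Implanon data is not available. The grouping variable drug belongs to that dataset and plays the role of notlivingtogether in this sketch.

```stata
* Stand-in data shipped with Stata (not the Implanon dataset)
sysuse cancer, clear
stset studytime, failure(died)

* Kaplan-Meier curves by treatment arm, with pointwise confidence bands
sts graph, by(drug) ci

* Log-rank test of equality of survivor functions across arms
sts test drug

* Wilcoxon (Breslow) variant, which gives more weight to early event times
sts test drug, wilcoxon
```

The log-rank test is the default for sts test. The Wilcoxon variant is shown only as one alternative that may be worth comparing when the curves differ mainly early in follow-up.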
4. Sample Analysis Workflow
// 1. Set up survival data structure
stset day, failure(remove)
// 2. Plot the smoothed hazard estimate (instantaneous event rate)
sts graph, hazard
// 3. Plot cumulative incidence (probability Implanon was removed)
sts graph, failure
// 4. List survival probability at specific times, by living arrangement
sts list, surv at(0 90 180 365) by(notlivingtogether)
5. Interpretation Tips
Kaplan-Meier curves: A steeper decline indicates a higher rate of the event over that interval.
sts list output: Read the columns in order: number at risk at the start of the interval, number of events, net number lost (mostly censored observations), the survivor function, its standard error, and the confidence interval.
Group comparisons: Use by() to examine differences in survival between groups (e.g., exposure vs control).
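Numerical summaries can complement the curves when interpreting group differences. The sketch below again uses Stata's shipped cancer.dta as a stand-in dataset. The time points 10, 20, and 30 (months in that dataset) are arbitrary choices for illustration.

```stata
* Stand-in data shipped with Stata (not the Implanon dataset)
sysuse cancer, clear
stset studytime, failure(died)

* Incidence rate and survival-time quartiles by group
stsum, by(drug)

* Median survival time with a confidence interval, by group
stci, by(drug)

* Survivor function at chosen times, shown side by side for the groups
sts list, at(10 20 30) by(drug) compare
```

A median that is reported as missing for a group usually means its survivor function never dropped to 0.5 during follow-up.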
6. Handling Censoring and Multiple Records
If each subject can have multiple records (such as with time-dependent covariates), always use the id() option to uniquely identify subjects:
stset day, id(woman_id) failure(remove)
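This tells Stata which rows belong to the same individual. Each record then covers the interval from the subject's previous record to its own time, and the event indicator should be 1 only on the record in which the event occurs.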
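A small invented dataset can show how id() links records. In this sketch every value is made up for illustration, and newpartner is a hypothetical time-dependent covariate. Each row's day is the time at which that record ends.

```stata
clear
* Invented multiple-record data: one row per interval of follow-up
input woman_id day remove newpartner
1  90 0 0
1 200 1 1
2 365 0 0
3 120 0 0
3 300 1 1
end

* id() links rows of the same woman; each row spans (_t0, _t]
stset day, id(woman_id) failure(remove)

* The second row for a woman starts where her first row ended
list woman_id day remove newpartner _t0 _t _d, sepby(woman_id)
```

Here woman 1 contributes the intervals (0, 90] and (90, 200], with the event recorded only at the end of the second interval.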
7. Essentials for Practice and Exams
Begin with stset to structure your data.
Always specify time, event, and subject ID (if applicable).
Plot and tabulate survival and failure functions with sts.
Use the log-rank test (sts test) to statistically compare survival between groups.
Incorporate origin(), enter(), or scale() for complex entry or time-scale scenarios.