
Survival Analysis in Stata: stset and sts – Kaplan-Meier

Categories: Clinical Epidemiology Research; Uniqcret doctor knowledges; Stata [Data Analytics]; Data Analytics or Statistics

1. Declaring Survival Data with stset

Purpose: The stset command declares your dataset to be survival (time-to-event) data. You must run it before any survival or time-to-event analysis. sts, stcox, streg and the other st commands will not run until the data have been stset.

Core Syntax:

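stset timevar, failure(eventvar)

Here timevar is the time at which each subject's follow-up ends, whether by the event or by censoring. eventvar is the event indicator: any nonzero, nonmissing value counts as an event, and 0 or missing marks a censored observation.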

Implanon Example:

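stset day, failure(remove)

Here day is the follow-up time in days, and remove marks whether the implant was removed (the event of interest).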

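What Stata Reports:

stset prints a summary of the declaration:
- how many observations were read and how many were excluded (for example, records with missing time or with an exit time that is not after entry)
- the number of failures
- the total analysis time at risk
- the earliest entry and last exit times

It also adds four variables to the dataset:
- _st: 1 if the observation is used in the analysis
- _d: 1 for a failure, 0 for censored
- _t0: analysis time at entry
- _t: analysis time at exit

Every later st command, including sts, reads these variables.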
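The sketch below shows these variables on a small simulated dataset. The data are hypothetical but reuse the example's variable names (day, remove). The removal rates, the 365-day follow-up cap and the seed are invented for illustration and are not Implanon study values.

```stata
* Hypothetical simulated data (NOT the Implanon study data)
clear
set seed 20240
set obs 200
generate woman_id = _n
generate notlivingtogether = runiform() < 0.4

* Assumed exponential removal times in days (at least 1 day)
generate t_remove = max(1, ceil(-ln(1 - runiform()) * cond(notlivingtogether, 400, 800)))

* Assumed administrative censoring at 365 days of follow-up
generate day = min(t_remove, 365)
generate remove = (t_remove <= 365)

* Declare the survival structure
stset day, failure(remove)

* Inspect the variables that stset adds
list woman_id day remove _st _d _t0 _t in 1/5
```

With no origin() or scale(), _t equals day and _t0 is 0 for every woman.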

2. Key stset Options in Clinical Datasets

Option | Function | Example
--- | --- | ---
failure(varname) | Identifies the event indicator | failure(remove)
id(varname) | Identifies subjects; required when a subject has multiple records | id(woman_id)
origin(time exp) | Defines time zero for analysis time (when the subject becomes at risk) | origin(time dob)
enter(time exp) | Delays entry into the risk set (late entry) | enter(time enrollment)
scale(#) | Rescales analysis time (e.g., from days to years) | scale(365.25)
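Here is a short sketch of id() and scale() together, again on hypothetical simulated data; the 1,095-day follow-up cap and the removal rate are assumptions. Dividing by 365.25 turns analysis time from days into years, while the stored time variable day is left unchanged.

```stata
* Hypothetical simulated data (NOT the Implanon study data)
clear
set seed 20240
set obs 200
generate woman_id = _n
generate t_remove = max(1, ceil(-ln(1 - runiform()) * 600))

* Assumed follow-up cap of 3 years (1,095 days)
generate day = min(t_remove, 1095)
generate remove = (t_remove <= 1095)

* id() names the subject; scale(365.25) reports analysis time in years
stset day, id(woman_id) failure(remove) scale(365.25)

* day is still in days, _t is now in years
summarize day _t
```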

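Scenario: Suppose you want analysis time measured from date of birth (dob), so that time is age, but women only come under observation when they enroll in the study (enrollment). Use:

stset exit_date, failure(remove) origin(time dob) enter(time enrollment)

The time variable must be on the same scale as dob and enrollment. It is therefore a date (exit_date is a hypothetical name for the date of removal or censoring), not the follow-up duration day.

Analysis time runs from birth, but each woman contributes time at risk only from enrollment onward. The years before enrollment are not counted (late entry, or left truncation).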
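origin() and enter() are read on the same scale as the time variable, so this setup needs a date variable for the end of follow-up. The sketch below simulates hypothetical Stata daily dates. exit_date (date of removal or censoring) is an illustrative name, and the date ranges and removal probability are invented. scale(365.25) is added so that _t0 and _t read as ages in years.

```stata
* Hypothetical simulated data (NOT the Implanon study data)
clear
set seed 20240
set obs 200
generate woman_id = _n

* Assumed birth dates spread over 1985-1999 (Stata daily dates)
generate dob = mdy(1, 1, 1985) + floor(runiform() * 5479)

* Assumed enrollment at roughly age 18 to 30
generate enrollment = dob + 6575 + floor(runiform() * 4383)

* Assumed removal or censoring 1 to 730 days after enrollment
generate exit_date = enrollment + 1 + floor(runiform() * 730)
generate remove = runiform() < 0.3
format dob enrollment exit_date %td

* Time zero is birth; each woman contributes time at risk only from enrollment
stset exit_date, failure(remove) origin(time dob) enter(time enrollment) scale(365.25)

* _t0 = age at enrollment, _t = age at exit (both in years)
list woman_id dob enrollment exit_date _t0 _t _d in 1/5
```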

3. Kaplan-Meier and Survival Analysis with sts

After using stset, you can perform nonparametric survival analyses and visualize results using the sts suite of commands.
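For reference, the Kaplan-Meier (product-limit) estimate that sts computes can be written as follows. The t_j are the distinct observed event times, d_j is the number of events at t_j, and n_j is the number of subjects still at risk just before t_j. Censored subjects leave the risk set without contributing an event, and the cumulative failure curve is the complement of the survivor curve.

```latex
\hat{S}(t) = \prod_{j:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right),
\qquad
\hat{F}(t) = 1 - \hat{S}(t)
```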

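Typical Tasks:

- sts graph: plot Kaplan-Meier survivor curves (or the cumulative failure or smoothed hazard)
- sts list: list the estimates at chosen times
- sts test: compare survivor functions across groups (log-rank test by default)
- sts generate: save the estimates as new variables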

Example:

sts graph, by(notlivingtogether) ci

This command plots Kaplan-Meier survivor curves (the probability that the implant is still in place) separately for women living with and not living with their husband, each with pointwise 95% confidence bands.
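The sketch below makes the estimator behind these curves concrete. It computes the product-limit estimate by hand within each group and compares it with the values from sts generate. It uses hypothetical simulated data (the rates, censoring cap and seed are assumptions). Note that collapse replaces the data in memory with one row per group and day.

```stata
* Hypothetical simulated data (NOT the Implanon study data)
clear
set seed 20240
set obs 200
generate woman_id = _n
generate notlivingtogether = runiform() < 0.4
generate t_remove = max(1, ceil(-ln(1 - runiform()) * cond(notlivingtogether, 400, 800)))
generate day = min(t_remove, 365)
generate remove = (t_remove <= 365)
stset day, failure(remove)

* Stata's Kaplan-Meier survivor estimate, separately for each group
sts generate s_km = s, by(notlivingtogether)

* One row per group and day: d_j = removals, c_j = records ending that day
collapse (sum) d_j = remove (count) c_j = remove (mean) s_km, by(notlivingtogether day)

* n_j = women still at risk on that day (follow-up lasting at least that long)
sort notlivingtogether day
by notlivingtogether: generate cum_c = sum(c_j)
by notlivingtogether: generate n_j = cum_c[_N] - cum_c + c_j

* Running product of (1 - d_j/n_j) within each group
by notlivingtogether: generate double s_hand = 1 - d_j / n_j if _n == 1
by notlivingtogether: replace s_hand = s_hand[_n-1] * (1 - d_j / n_j) if _n > 1

list notlivingtogether day d_j n_j s_hand s_km in 1/8
```

Days with only censored records contribute a factor of 1, so the curve steps down only on days when at least one removal occurs.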

4. Sample Analysis Workflow

// 1. Set up survival data structure
stset day, failure(remove)

// 2. Plot a kernel-smoothed estimate of the hazard function (instantaneous removal rate)
sts graph, hazard

// 3. Plot the cumulative failure function, 1 - S(t) (probability Implanon has been removed by day t)
sts graph, failure

// 4. List Kaplan-Meier survival estimates at specific days, by living arrangement
sts list, at(0 90 180 365) by(notlivingtogether)
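You can try the workflow end to end on hypothetical simulated data before applying it to the real file. In this sketch the simulated rates, the 365-day cap and the names s_km and f_km are assumptions. sts generate saves the survivor estimate, so the cumulative failure that sts graph, failure plots can be checked as 1 - S(t).

```stata
* Hypothetical simulated data (NOT the Implanon study data)
clear
set seed 20240
set obs 200
generate woman_id = _n
generate notlivingtogether = runiform() < 0.4
generate t_remove = max(1, ceil(-ln(1 - runiform()) * cond(notlivingtogether, 400, 800)))
generate day = min(t_remove, 365)
generate remove = (t_remove <= 365)

// 1. Set up survival data structure
stset day, failure(remove)

// 2-3. Smoothed hazard and cumulative failure (removal) curves
sts graph, hazard
sts graph, failure

// 4. Survivor estimates at chosen days, by living arrangement
sts list, at(0 90 180 365) by(notlivingtogether)

// Save S(t) for each woman and derive cumulative failure F(t) = 1 - S(t)
sts generate s_km = s
generate f_km = 1 - s_km
```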

5. Interpretation Tips

6. Handling Censoring and Multiple Records

If each subject can have multiple records (such as with time-dependent covariates), always use the id() option to uniquely identify subjects:

stset day, id(woman_id) failure(remove)

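This tells Stata which records belong to the same woman. Stata then counts subjects rather than records and checks that one woman's records do not overlap in time.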
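The sketch below shows how multiple records arise and how id() keeps them tied to one woman, using hypothetical simulated data. stsplit requires id() in the stset declaration. Here it splits each woman's follow-up at day 90, an arbitrary cut chosen for illustration, as one would before adding a covariate whose value changes over time. Each woman should still count as one subject with at most one removal.

```stata
* Hypothetical simulated data (NOT the Implanon study data)
clear
set seed 20240
set obs 200
generate woman_id = _n
generate t_remove = max(1, ceil(-ln(1 - runiform()) * 600))
generate day = min(t_remove, 365)
generate remove = (t_remove <= 365)

stset day, id(woman_id) failure(remove)

* Split follow-up at day 90: period = 0 for the first piece, 90 for time after day 90
stsplit period, at(90)

* Records now outnumber women, but stdescribe still reports 200 subjects
stdescribe
```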

7. Essentials for Practice and Exams