
Survival Analysis in Clinical Epidemiology with Stata Code: From Kaplan–Meier to Cox Regression (Non-parametric, Semi-parametric)

Introduction

Survival analysis is used when time matters.

We are interested not only in whether an event happens but also in when it happens, and we must handle censoring correctly (patients whose event time is not observed, because follow-up ended or they were lost to follow-up before the event occurred).

1. Non-parametric Survival Analysis

(Describe and compare survival without assuming any particular shape for the survival curve)

Step 1: Survival-time setting (stset)

Before doing anything, we must tell Stata:

  • What is time

  • What is event

  • Who is censored

stset time, failure(event==1)

👉 This creates the survival data structure (Stata adds the variables _st, _d, _t and _t0 that every st command uses).

👉 Everything in survival analysis starts here.
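The setup step can be tried on a real dataset. This is a minimal sketch, assuming Stata's bundled example dataset cancer.dta (loaded with sysuse cancer), in which studytime is follow-up in months and died is 1 for a death and 0 for a censored patient; these names stand in for the generic time and event above.

```stata
* Assumption: Stata's bundled example data cancer.dta
* (studytime = months of follow-up, died = 1 if the patient died, 0 if censored)
sysuse cancer, clear

* Time variable = studytime; event = died==1; everyone else is treated as censored
stset studytime, failure(died==1)

* Check the result: number of subjects, failures and total time at risk
stdescribe

* Time at risk, incidence rate and Kaplan–Meier-based survival-time quartiles
stsum
```

Running stdescribe right after stset is a cheap way to catch a wrongly coded event variable before any analysis.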

Step 2: Life table analysis (sts list)

sts list
sts list, by(group)

What this does:

  • Lists the Kaplan–Meier estimate at every observed event or censoring time (not in fixed intervals)

  • Shows:

    • Number at risk

    • Number of events

    • Survival probability

✅ Good for:

  • Understanding risk sets

  • Teaching how censoring works

📌 Conceptually:

“How many patients are still under observation at each time?”
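A sketch of such a table, again assuming cancer.dta with drug as the grouping variable. The at() option limits sts list to a few chosen times, and sts generate saves the number at risk at each subject's own follow-up time as a variable. For comparison, ltable (Stata's actuarial life-table command, which works on the raw time and event variables rather than on stset data) groups follow-up into fixed intervals.

```stata
* Assumption: Stata's bundled example data cancer.dta (drug = treatment group)
sysuse cancer, clear
stset studytime, failure(died==1)

* Kaplan–Meier table by drug, reported only at months 0, 10, 20 and 30
sts list, by(drug) at(0 10 20 30)

* Number at risk at each subject's own follow-up time, stored as a variable
sts generate atrisk = n
list studytime died atrisk in 1/5

* Actuarial life table with 5-month intervals (raw variables, not stset data)
ltable studytime died, by(drug) intervals(5)
```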

Step 3: Kaplan–Meier curves (sts graph)

sts graph plots the same Kaplan–Meier estimates that sts list prints, as a step curve over time.

Survival curve

sts graph, survival
sts graph, survival by(group)

Shows:

  • Probability of remaining event-free over time

Failure curve (cumulative incidence)

sts graph, failure
sts graph, failure by(group)

Shows:

  • Proportion who have had the event by time t

📌 Important:

  • These curves are estimated directly from the data, with no model

  • They correctly handle censoring

  • They do not adjust for covariates
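Both curves can be drawn, and the plotted numbers recovered, with a few lines. A minimal sketch, assuming cancer.dta: risktable adds the number at risk under the plot, and sts generate stores the group-specific Kaplan–Meier estimate S(t), whose complement 1 − S(t) is what the failure curve shows.

```stata
* Assumption: Stata's bundled example data cancer.dta (drug = treatment group)
sysuse cancer, clear
stset studytime, failure(died==1)

* Kaplan–Meier survival curves by drug, with a number-at-risk table below the plot
sts graph, survival by(drug) risktable

* Failure curves (1 - Kaplan–Meier) for the same groups
sts graph, failure by(drug)

* The plotted survival values, evaluated at each subject's own _t, within drug group
sts generate s_km = s, by(drug)
generate f_km = 1 - s_km
list drug _t _d s_km f_km in 1/5
```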

Step 4: Log-rank test (sts test)

sts test group

What it does:

  • Compares two or more Kaplan–Meier curves

  • Gives a chi-squared test statistic and a p-value

Interpretation:

“Are the survival curves statistically different?”

✅ You can compare:

  • 2 groups

  • 3 groups

  • More groups (one overall p-value, which does not say which groups differ)

❌ What it does NOT do:

  • Does not give effect size

  • Does not give hazard ratio
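A minimal sketch of the test, assuming cancer.dta with drug taking three values (assumed coding: 1 = placebo, 2 and 3 = active drugs). Because the overall test returns a single p-value, a pairwise test restricted with if is one simple way to see which groups differ; with several pairwise tests, the usual multiple-comparison caution applies.

```stata
* Assumption: Stata's bundled example data cancer.dta (drug: 1 = placebo, 2 and 3 = active)
sysuse cancer, clear
stset studytime, failure(died==1)

* Overall log-rank test across all three drug groups (logrank is also the default)
sts test drug, logrank

* The chi-squared statistic and its degrees of freedom are left in r()
display "chi2 = " %6.2f r(chi2) ", df = " r(df) ", p = " %6.4f chi2tail(r(df), r(chi2))

* Pairwise comparison: placebo versus drug 2 only
sts test drug if inlist(drug, 1, 2), logrank
```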

Summary: Non-parametric methods

Method | Purpose
--- | ---
stset | Define time & event
sts list | Life table (numbers at risk)
sts graph | Kaplan–Meier curves
sts test | Compare curves (p-value only)


2. Semi-parametric Survival Analysis

Cox Proportional Hazards Regression (stcox)

Now we move from description to effect estimation.

What Cox regression tells us

stcox exposure

Cox regression answers:

“How much higher or lower is the hazard (the instantaneous event rate) in one group than in another, at any point during follow-up?”

It reports a Hazard Ratio (HR).
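A crude Cox model can be sketched the same way, assuming cancer.dta with drug as the exposure. The i. prefix makes the lowest level (drug 1, assumed to be placebo) the reference group; stcox reports hazard ratios by default, and the nohr option shows the log-hazard coefficients b instead, with HR = exp(b).

```stata
* Assumption: Stata's bundled example data cancer.dta (drug 1 = placebo, the reference level)
sysuse cancer, clear
stset studytime, failure(died==1)

* Crude Cox model: hazard ratios for drug 2 and drug 3 versus placebo
stcox i.drug

* Same model on the log-hazard scale; exponentiating a coefficient gives its HR
stcox i.drug, nohr
display "HR, drug 2 vs placebo = " %5.3f exp(_b[2.drug])
```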

Interpretation of Hazard Ratio

HR | Meaning
--- | ---
HR = 1 | No difference in hazard
HR > 1 | Higher hazard (harmful)
HR < 1 | Lower hazard (protective)

Example interpretation:

“Cryosurgery reduced the hazard of recurrence by 52% (HR 0.48).”
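The arithmetic behind that sentence, written out. Cox regression estimates a log-hazard coefficient β, and the hazard ratio and the percentage change in hazard follow from it:

```latex
\begin{aligned}
\widehat{\mathrm{HR}} &= e^{\hat\beta} \\
\text{change in hazard} &= \left(\widehat{\mathrm{HR}} - 1\right) \times 100\% \\
\widehat{\mathrm{HR}} = 0.48 \;\Rightarrow\; \text{change in hazard} &= (0.48 - 1) \times 100\% = -52\% \\
\hat\beta &= \ln 0.48 \approx -0.734
\end{aligned}
```

The same logic applies when HR > 1: HR 1.30 means a 30% higher hazard. Note that this is a change in the hazard (a rate), not in the proportion of patients who have a recurrence.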

Adjusted Cox regression (control confounders)

stcox exposure age sex i.other

This answers:

“What is the effect of exposure after adjusting for other factors?”
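Putting crude and adjusted estimates side by side shows how much adjustment moves the hazard ratio. A sketch, assuming cancer.dta, where age at entry is the only available covariate (the sex and i.other variables above are not in that dataset); estimates table with eform prints hazard ratios rather than coefficients.

```stata
* Assumption: Stata's bundled example data cancer.dta (age = age at entry)
sysuse cancer, clear
stset studytime, failure(died==1)

* Crude model, stored for comparison
stcox i.drug
estimates store crude

* Adjusted model: drug effect after adjusting for age
stcox i.drug age
estimates store adjusted

* Side-by-side hazard ratios with standard errors
estimates table crude adjusted, eform b(%9.3f) se
```

A large shift between the two columns suggests that the crude estimate was confounded by the added covariate.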

📌 Key strength of Cox:

  • Handles censoring

  • Uses time-to-event

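  • Adjusts for covariates

  • Leaves the baseline hazard completely unspecified (the non-parametric part, hence "semi-parametric"); it only assumes that hazard ratios stay constant over time (proportional hazards)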


Relationship to non-parametric methods

Method | Question answered
--- | ---
Kaplan–Meier | What does survival look like?
Log-rank | Are curves different?
Cox | How large is the effect?


3. Full Parametric Survival Models

(Know they exist — no detail needed)

Examples:

  • Exponential

  • Weibull

  • Log-normal

  • Log-logistic

In Stata:

streg exposure, dist(weibull)

Key idea:

  • Assumes a specific shape for the hazard

  • More assumptions than Cox
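A sketch of fitting two parametric models to the same data, assuming cancer.dta. For the exponential and Weibull distributions, streg works in the proportional-hazards metric and reports hazard ratios by default, so the output reads like stcox; estimates stats lists AIC and BIC, one common way to compare the fitted distributions.

```stata
* Assumption: Stata's bundled example data cancer.dta
sysuse cancer, clear
stset studytime, failure(died==1)

* Weibull model: hazard may rise or fall monotonically over time
streg i.drug age, distribution(weibull)
estimates store weib

* Exponential model: constant hazard (the Weibull with shape p = 1)
streg i.drug age, distribution(exponential)
estimates store expo

* Information criteria for both models (lower AIC/BIC = better fit-complexity trade-off)
estimates stats weib expo
```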

📌 For beginners:

Just know that parametric models exist; you do not need details at this stage.

4. Flexible Parametric Survival Models

(Know they exist — no detail needed)

Examples:

  • Royston–Parmar models

  • Restricted cubic splines to model the baseline (log cumulative) hazard

Purpose:

  • More flexible hazard shapes

  • Used in advanced research

📌 For now:

“These are advanced extensions of survival models.”

No need to go deeper for introductory learning.
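For readers who want to see what one looks like, here is a sketch of a Royston–Parmar model. It assumes the user-written command stpm2 and its helper rcsgen, installed from SSC (this needs internet access; neither is part of official Stata), and cancer.dta as the data. The df(3) spline choice is purely illustrative, not a recommendation.

```stata
* Assumption: user-written commands from SSC (not official Stata); needs internet access
ssc install rcsgen, replace
ssc install stpm2, replace

* Assumption: Stata's bundled example data cancer.dta
sysuse cancer, clear
stset studytime, failure(died==1)

* Indicator variables for drug (drug_1 = placebo, left out as the reference)
tabulate drug, generate(drug_)

* Restricted cubic spline with 3 df for the log cumulative baseline hazard;
* eform reports the covariate effects as hazard ratios
stpm2 drug_2 drug_3 age, scale(hazard) df(3) eform
```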

Big Picture Summary (Very Important)

Level | Method | What it does
--- | --- | ---
Non-parametric | sts | Describe & compare survival
Semi-parametric | stcox | Estimate hazard ratios
Parametric | streg | Model survival with assumptions
Flexible parametric | Advanced | Complex hazard shapes


One-sentence takeaway (exam-ready)

Kaplan–Meier describes survival, log-rank tests differences, Cox regression estimates effects, and parametric models add assumptions.
