
Time-to-Event Outcomes in Clinical Epidemiology: Foundations of Survival Analysis


1. Why survival analysis exists (the core idea)

In many clinical studies, the outcome is not just whether an event happens, but when it happens.

Examples:

  • Time to cancer

  • Time to death

  • Time to relapse

  • Time to device failure

Two complications immediately arise:

  1. People are followed for different lengths of time

  2. Some people never experience the event during observation

These two facts break ordinary methods like:

  • proportions

  • t-tests

  • logistic regression

Survival analysis exists to answer one question correctly:

“How does the risk of an event unfold over time, accounting for incomplete follow-up?”

2. The three ingredients of survival data (always)

Every survival dataset has three conceptual components:

  1. Time: how long each individual is observed

  2. Event indicator: did the event occur during that time?

  3. Censoring: if the event did not occur, observation stopped anyway
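In a dataset, these three ingredients usually reduce to two columns per person: a follow-up time and an event indicator, with censoring encoded as event = 0. A minimal sketch with hypothetical values:

    * Hypothetical survival data: id, follow-up time (months), event indicator
    clear
    input id time event
    1 12 1
    2 24 0
    3  6 1
    4 18 0
    end
    * event = 1 : the event occurred at that follow-up time
    * event = 0 : censored, still event-free when observation stopped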

Censoring does not mean failure or success — it means:

“We simply don’t know what would have happened after this point.”

Survival analysis treats censored individuals as:

  • fully informative up to their censoring time

  • completely uninformative after

This is the philosophical shift most learners miss.

3. What stset really means (conceptually)

When we “declare” survival data in Stata, we are not running an analysis.

We are telling Stata:

“From now on, think in terms of risk sets over time, not rows of data.”

After this declaration:

  • Every time point matters

  • Who is still under observation matters

  • Events are interpreted relative to who was at risk at that moment

This is the foundation of everything that follows.
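In practice the declaration is one line; a minimal sketch with hypothetical variable names:

    * Declare the survival structure: follow-up time and event indicator
    stset time, failure(event)
    * From here on, the st commands (sts, stci, stcox, ...) work on
    * risk sets over time rather than raw rows of data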

4. Risk sets: the hidden engine of survival analysis

At any time t, there is a risk set:

The group of individuals who have not yet had the event and have not been censored.

All survival methods are built on this idea.

Every estimate answers some version of:

“Among those still at risk right now, what is happening?”

This is why survival analysis is dynamic, not static.
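After stset, you can watch the risk sets shrink over time; a minimal sketch:

    * Life-table style listing: at each event time, "Beg. Total" is the
    * risk set, i.e. those still event-free and not yet censored
    sts list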

5. Kaplan–Meier curves: what they really show

A Kaplan–Meier curve is not just a line plot.

Conceptually, it answers:

“What proportion of people remain event-free beyond each point in time?”

Key ideas:

  • The curve steps down only when events occur

  • Censoring does not cause drops

  • Each step is conditional on surviving up to that point

So when you read a KM curve, you are reading:

“Given survival up to time t, what fraction survives a little longer?”

This conditional thinking is essential.
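In Stata this is a single command once the data are declared (the grouping variable below is hypothetical):

    * Kaplan–Meier survival curves, overall and by group
    sts graph
    sts graph, by(group)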

6. Survival vs failure curves (a mental model)

Two equivalent perspectives:

  • Survival curve: “Who is still event-free?”

  • Failure (cumulative incidence) curve: “Who has experienced the event so far?”

They are mathematical complements, but psychologically different.

Clinically:

  • Survival curves emphasize protection

  • Failure curves emphasize burden

Both are descriptive. Neither is causal.
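Both views come from the same declared data; a minimal sketch:

    * Survival view: proportion still event-free
    sts graph, by(group)
    * Failure view: cumulative proportion who have experienced the event
    sts graph, failure by(group)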

7. Median survival: what it means and why it disappears

The median survival time is:

The time at which 50% of the population has experienced the event (equivalently, where the survival curve crosses 0.5)

Important consequences:

  • If fewer than 50% have events → median cannot be estimated

  • This is not an error

  • It reflects limited information

This teaches an important lesson:

Survival analysis reports what the data can support, not what we wish to see.
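A minimal sketch of how the median is requested in Stata (grouping variable hypothetical):

    * Median survival time, overall and by group
    stci
    stci, by(group)
    * If fewer than 50% of a group have had the event, the median is
    * reported as missing: limited information, not an error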

8. Comparing groups: what the log-rank test is really asking

The log-rank test does not compare proportions or medians.

Conceptually, it asks:

“At each event time, do the groups experience events in proportion to how many people are still at risk?”

At every failure time, Stata calculates how many events each group would be expected to have, assuming identical survival and given the current risk set.

Observed events are then compared with these expected counts across all event times.

So the log-rank test is:

  • global

  • time-weighted

  • sensitive to persistent differences, not single early events
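After stset, the comparison is one command (grouping variable hypothetical):

    * Log-rank test comparing survival curves across groups
    sts test group
    * Returns a chi-squared statistic and p-value, not an effect size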


9. Why log-rank gives a p-value but no effect size

The log-rank test answers only one question:

“Is there evidence that the survival curves differ?”

It deliberately avoids estimating how much they differ.

This separation is intentional:

  • Testing ≠ estimation

  • Evidence ≠ magnitude

Effect size belongs to regression models.

10. The hazard: the most misunderstood concept

The hazard is not a probability.

It is best understood as:

The instantaneous event rate among people who have survived up to that moment

Think of it as speed, not chance.
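Formally, using the standard definition (not specific to this post), the hazard is the limit

\[
h(t) \;=\; \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
\]

It is a rate, events per unit of time among those still at risk, so it can exceed 1 and is not a probability.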

Two groups may have:

  • the same overall incidence

  • but different hazards (one fails earlier)

Hazard focuses on timing, not totals.

11. Hazard ratio: what it tells us (precisely)

The hazard ratio compares hazards between groups:

“At any given moment, how much faster are events occurring in one group compared with another, among those still event-free?”
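In standard notation, with group 1 as exposed and group 0 as reference,

\[
\mathrm{HR}(t) \;=\; \frac{h_1(t)}{h_0(t)}
\]

and a single reported HR is a summary of this ratio across follow-up.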

Key interpretations:

  • HR = 1 → no difference in event rate over time

  • HR > 1 → events occur faster in the exposed group

  • HR < 1 → events occur more slowly (protective)

Crucially:

  • HR is relative

  • HR is time-conditional

  • HR is not a risk ratio


12. Why Cox regression is central

Cox regression exists to estimate the hazard ratio without specifying the baseline hazard.

Conceptually, it says:

“I don’t need to know the absolute risk over time — only how groups compare at each moment.”

This is why Cox regression is:

  • robust

  • flexible

  • dominant in clinical research
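A minimal Stata sketch, with hypothetical exposure and covariate names:

    * Cox proportional hazards model; output is reported as hazard ratios
    stcox exposure age
    * The baseline hazard is left unspecified; only the relative comparison
    * between covariate patterns is estimated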

But it comes with a price.

13. The proportional hazards assumption (concept, not test)

The Cox model assumes:

The ratio of hazards between groups is constant over time

This does not mean hazards are constant. It means their ratio is.

If this assumption fails:

  • a single HR is misleading

  • interpretation collapses

That is why assumption checking is not optional.
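In Stata, the usual check after fitting a Cox model is based on Schoenfeld residuals; a minimal sketch:

    * Test the proportional-hazards assumption after stcox
    estat phtest, detail
    * A small p-value for a covariate suggests its hazard ratio changes
    * over time, so a single HR may be misleading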

14. The full conceptual workflow (memory anchor)

  1. Define time, event, and censoring → What does “at risk” mean?

  2. Describe survival over time → Kaplan–Meier curves

  3. Test whether curves differ → Log-rank test

  4. Estimate how much they differ → Cox hazard ratio

  5. Validate assumptions → Proportional hazards

This order reflects thinking, not software.
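The same order, as one compact sketch with hypothetical variable names:

    * 1. Define time, event, censoring
    stset time, failure(event)
    * 2. Describe survival over time
    sts graph, by(group)
    * 3. Test whether curves differ
    sts test group
    * 4. Estimate how much they differ
    stcox group
    * 5. Validate assumptions
    estat phtest, detail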


15. Final conceptual takeaway

Survival analysis is about:

Risk evolving over time among people still under observation

Everything else (KM curves, log-rank tests, hazard ratios, Cox models) is a set of tools for answering that single question from different angles.

If you understand that, you understand survival analysis.
