Time-to-Event Outcomes in Clinical Epidemiology: Foundations of Survival Analysis
- Mayta

- 3 days ago

1. Why survival analysis exists (the core idea)
In many clinical studies, the outcome is not just whether an event happens, but when it happens.
Examples:
Time to cancer diagnosis
Time to death
Time to relapse
Time to device failure
Two complications immediately arise:
People are followed for different lengths of time
Some people never experience the event during observation
These two facts break ordinary methods like:
proportions
t-tests
logistic regression
Survival analysis exists to answer one question correctly:
“How does the risk of an event unfold over time, accounting for incomplete follow-up?”
2. The three ingredients of survival data (always)
Every survival dataset has three conceptual components:
Time: How long each individual is observed
Event indicator: Did the event occur during that time?
Censoring: If the event did not occur, observation stopped anyway
Censoring does not mean failure or success — it means:
“We simply don’t know what would have happened after this point.”
Survival analysis treats censored individuals as:
fully informative up to their censoring time
completely uninformative after
This is the philosophical shift most learners miss.
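As a minimal sketch of these three ingredients, each person can be stored as a (time, event) pair, where censoring is simply event = 0. The data are invented for illustration, and the names `subjects`, `n_events`, and `n_censored` are assumptions of this sketch, not anything Stata defines.

```python
# Hypothetical toy data: (follow-up time in months, event indicator).
# event = 1 means the event was observed at that time.
# event = 0 means the person was censored at that time (still event-free).
subjects = [(5, 1), (8, 0), (12, 1), (12, 0), (20, 0)]

for time, event in subjects:
    status = "event observed" if event == 1 else "censored, event-free so far"
    print(f"followed for {time:>2} months: {status}")

# Censored people are not dropped; they are counted separately from events.
n_events = sum(event for _, event in subjects)
n_censored = len(subjects) - n_events
```

Note that a censored person still contributes their full follow-up time; only their outcome after that time is unknown.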
3. What stset really means (conceptually)
When we “declare” survival data in Stata, we are not running an analysis.
We are telling Stata:
“From now on, think in terms of risk sets over time, not rows of data.”
After this declaration:
Every time point matters
Who is still under observation matters
Events are interpreted relative to who was at risk at that moment
This is the foundation of everything that follows.
4. Risk sets: the hidden engine of survival analysis
At any time t, there is a risk set:
The group of individuals who have not yet had the event and have not been censored.
All survival methods are built on this idea.
Every estimate answers some version of:
“Among those still at risk right now, what is happening?”
This is why survival analysis is dynamic, not static.
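The risk set idea can be sketched in a few lines. This is an illustration on invented data, with a hypothetical helper named `risk_set`; it assumes the usual convention that a person whose event or censoring falls exactly at time t still counts as at risk at t.

```python
# Hypothetical toy data: (follow-up time in months, event indicator).
subjects = [(5, 1), (8, 0), (12, 1), (12, 0), (20, 0)]

def risk_set(subjects, t):
    """Return everyone still under observation at time t.

    A person is at risk at t if their follow-up time is at least t,
    whether that follow-up later ends in an event or in censoring.
    """
    return [(time, event) for time, event in subjects if time >= t]

# The risk set shrinks as people have events or are censored.
for t in (5, 8, 12, 13):
    print(f"at risk at month {t:>2}: {len(risk_set(subjects, t))}")
```

The person censored at month 8 is in the risk set at months 5 and 8 and then leaves it without ever counting as an event.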
5. Kaplan–Meier curves: what they really show
A Kaplan–Meier curve is not just a line plot.
Conceptually, it answers:
“What proportion of people remain event-free beyond each point in time?”
Key ideas:
The curve steps down only when events occur
Censoring does not cause drops
Each step is conditional on surviving up to that point
So when you read a KM curve, each step is answering:
“Given survival up to time t, what fraction survive a little longer?”
The height of the curve at any time is the running product of those fractions.
This conditional thinking is essential.
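To make the conditional logic concrete, here is a minimal sketch of the Kaplan–Meier product on invented data. The function name `kaplan_meier` and the data are assumptions of this sketch; it is not how Stata implements the estimator internally.

```python
# Hypothetical toy data: (follow-up time in months, event indicator).
subjects = [(5, 1), (8, 0), (12, 1), (12, 0), (20, 0)]

def kaplan_meier(subjects):
    """Return [(event_time, estimated survival just after that time), ...]."""
    event_times = sorted({t for t, e in subjects if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Risk set: everyone whose follow-up reaches time t.
        at_risk = sum(1 for time, _ in subjects if time >= t)
        events = sum(1 for time, e in subjects if time == t and e == 1)
        # Conditional probability of getting past t, given being at risk at t.
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

curve = kaplan_meier(subjects)
for t, s in curve:
    print(f"month {t:>2}: estimated survival = {s:.3f}")
```

The censoring at month 8 produces no step of its own, but it reduces the denominator at month 12, which is why the second step is larger than the first.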
6. Survival vs failure curves (a mental model)
Two equivalent perspectives:
Survival curve: “Who is still event-free?”
Failure (cumulative incidence) curve: “Who has experienced the event so far?”
They are mathematical complements (failure = 1 − survival), but psychologically different.
Clinically:
Survival curves emphasize protection
Failure curves emphasize burden
Both are descriptive. Neither is causal.
7. Median survival: what it means and why it disappears
The median survival time is:
The earliest time at which the estimated survival probability falls to 50% or below
Important consequences:
If the Kaplan–Meier curve never falls to 50% (too few events during follow-up) → median cannot be estimated
This is not an error
It reflects limited information
This teaches an important lesson:
Survival analysis reports what the data can support, not what we wish to see.
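A short sketch shows why the median can be missing. It reads the median off a survival curve given as (time, survival) steps; the two curves and the helper name `median_survival` are hypothetical.

```python
def median_survival(curve):
    """Return the first time at which survival is 50% or lower, else None."""
    for t, survival in curve:
        if survival <= 0.5:
            return t
    return None  # the curve never reached 50%: the median is not estimable

# Hypothetical curves as (time in months, estimated survival) steps.
curve_reaches_half = [(3, 0.90), (7, 0.60), (10, 0.45)]
curve_stays_high = [(4, 0.90), (9, 0.70)]

print(median_survival(curve_reaches_half))
print(median_survival(curve_stays_high))
```

Returning `None` rather than a number is the honest answer for the second curve: the data stop before half of the group has had the event.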
8. Comparing groups: what the log-rank test is really asking
The log-rank test does not compare proportions or medians.
Conceptually, it asks:
“At each event time, do the groups experience events in proportion to how many people are still at risk?”
At every failure time:
Stata calculates how many events each group should have had
assuming identical survival
given the current risk set
Observed events are then compared to these expectations across all times.
So the log-rank test is:
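global (it pools evidence over the whole follow-up period)
equally weighted across event times, in its standard form
most sensitive to persistent differences, not single early events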
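The observed-versus-expected logic can be sketched for two groups as follows. This is an illustration of the standard two-group log-rank calculation on invented data, not Stata's implementation; the function name `logrank` and the group names are assumptions of this sketch.

```python
import math

# Hypothetical toy data per group: (follow-up time, event indicator).
group_a = [(2, 1), (4, 1), (6, 0)]
group_b = [(3, 0), (5, 1), (8, 1)]

def logrank(group_a, group_b):
    """Return (observed events in A, expected events in A, chi-square)."""
    pooled = group_a + group_b
    event_times = sorted({t for t, e in pooled if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in A
        n = sum(1 for time, _ in pooled if time >= t)     # at risk overall
        d_a = sum(1 for time, e in group_a if time == t and e == 1)
        d = sum(1 for time, e in pooled if time == t and e == 1)
        observed_a += d_a
        # Under identical survival, A's share of events matches its share at risk.
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    return observed_a, expected_a, chi_square

observed_a, expected_a, chi_square = logrank(group_a, group_b)
# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_square / 2))
print(observed_a, round(expected_a, 3), round(chi_square, 3), round(p_value, 3))
```

With only six people this example has almost no power; it is meant to show where the numbers come from, not to be a realistic analysis.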
9. Why log-rank gives a p-value but no effect size
The log-rank test answers only one question:
“Is there evidence that the survival curves differ?”
It deliberately avoids estimating how much they differ.
This separation is intentional:
Testing ≠ estimation
Evidence ≠ magnitude
Effect size belongs to regression models.
10. The hazard: the most misunderstood concept
The hazard is not a probability.
It is best understood as:
The instantaneous event rate among people who have survived up to that moment
Think of it as speed, not chance.
Two groups may have:
the same overall incidence
but different hazards (one fails earlier)
Hazard focuses on timing, not totals.
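A small sketch can separate totals from timing. It uses events per unit of person-time as a crude average rate, which matches the hazard only under the simplifying assumption of a constant hazard; the cohorts and function names are hypothetical.

```python
# Hypothetical cohorts: (follow-up time in months, event indicator).
# Both have 5 events among 10 people, but the events happen at different times.
early = [(2, 1)] * 5 + [(12, 0)] * 5
late = [(10, 1)] * 5 + [(12, 0)] * 5

def incidence_proportion(subjects):
    """Share of people who had the event at any point during follow-up."""
    return sum(e for _, e in subjects) / len(subjects)

def average_rate(subjects):
    """Events per person-month: a crude stand-in for an average hazard."""
    events = sum(e for _, e in subjects)
    person_time = sum(t for t, _ in subjects)
    return events / person_time

for name, cohort in [("early", early), ("late", late)]:
    print(name, incidence_proportion(cohort), round(average_rate(cohort), 4))
```

Both cohorts end with the same proportion of events, yet the rate is higher where events arrive sooner, because less event-free time accumulates.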
11. Hazard ratio: what it tells us (precisely)
The hazard ratio compares hazards between groups:
“At any given moment, how much faster are events occurring in one group compared with another, among those still event-free?”
Key interpretations:
HR = 1 → no difference in event rate over time
HR > 1 → events occur faster in the exposed group
HR < 1 → events occur more slowly (protective)
Crucially:
HR is relative
HR is time-conditional
HR is not a risk ratio
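To see why a hazard ratio is not a risk ratio, consider a sketch that assumes constant hazards in both groups, so that cumulative risk by time t is 1 − exp(−hazard × t). The baseline hazard of 0.1 per month and the HR of 2 are invented values.

```python
import math

baseline_hazard = 0.1  # hypothetical events per person-month in the unexposed
hazard_ratio = 2.0     # hypothetical HR, held constant over time

def risk_by(t, hazard):
    """Cumulative risk by time t under a constant hazard."""
    return 1 - math.exp(-hazard * t)

def risk_ratio(t):
    """Ratio of cumulative risks (exposed / unexposed) by time t."""
    exposed = risk_by(t, hazard_ratio * baseline_hazard)
    unexposed = risk_by(t, baseline_hazard)
    return exposed / unexposed

for t in (1, 5, 20):
    print(f"month {t:>2}: HR = {hazard_ratio}, risk ratio = {risk_ratio(t):.2f}")
```

The HR stays at 2 throughout, while the risk ratio starts just below 2 and drifts toward 1 as most people in both groups eventually have the event.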
12. Why Cox regression is central
Cox regression exists to estimate the hazard ratio without specifying the baseline hazard.
Conceptually, it says:
“I don’t need to know the absolute risk over time — only how groups compare at each moment.”
This is why Cox regression is:
robust
flexible
dominant in clinical research
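The idea of comparing groups "at each moment" without a baseline hazard can be sketched with the Cox partial likelihood for one binary covariate. This is a bare-bones illustration on invented data with no tied event times, solved by Newton-Raphson; the function names are hypothetical, and real software adds tie handling, standard errors, and convergence checks.

```python
import math

# Hypothetical toy data: (follow-up time, event indicator, exposure 0/1).
data = [
    (2, 1, 1), (4, 1, 1), (6, 0, 1), (7, 1, 1),
    (3, 1, 0), (5, 0, 0), (8, 1, 0), (9, 1, 0), (10, 0, 0),
]

def score_and_information(beta, data):
    """First derivative of the log partial likelihood and its negative second derivative."""
    score = info = 0.0
    for t_i, event_i, x_i in data:
        if event_i != 1:
            continue  # censored rows contribute only through the risk sets
        # Risk set at this event time, each member weighted by exp(beta * x).
        risk = [(x_j, math.exp(beta * x_j)) for t_j, _, x_j in data if t_j >= t_i]
        total_w = sum(w for _, w in risk)
        mean_x = sum(x * w for x, w in risk) / total_w
        mean_x2 = sum(x * x * w for x, w in risk) / total_w
        score += x_i - mean_x            # observed exposure minus expected
        info += mean_x2 - mean_x ** 2    # variance of exposure in the risk set
    return score, info

def fit_cox(data, iterations=25):
    """Newton-Raphson iterations starting from beta = 0 (HR = 1)."""
    beta = 0.0
    for _ in range(iterations):
        score, info = score_and_information(beta, data)
        if info == 0:
            break
        beta += score / info
    return beta

beta_hat = fit_cox(data)
hazard_ratio = math.exp(beta_hat)
print(f"log HR = {beta_hat:.3f}, HR = {hazard_ratio:.2f}")
```

Notice that the baseline hazard never appears: only the ordering of event times and the makeup of each risk set enter the calculation.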
But it comes with a price.
13. The proportional hazards assumption (concept, not test)
The Cox model assumes:
The ratio of hazards between groups is constant over time
This does not mean hazards are constant. It means their ratio is.
If this assumption fails:
a single HR is misleading
its interpretation as one constant relative rate collapses
That is why assumption checking is not optional.
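The distinction between constant hazards and a constant ratio can be illustrated with made-up hazard functions. All three functions below are hypothetical shapes chosen for the example, not estimates from any data.

```python
def baseline_hazard(t):
    """Hypothetical baseline hazard that rises with time (not constant)."""
    return 0.02 * t

def proportional_hazard(t):
    """Exposed-group hazard that is always twice the baseline."""
    return 2.0 * baseline_hazard(t)

def non_proportional_hazard(t):
    """Exposed-group hazard that starts high and falls over time."""
    return 0.30 - 0.02 * t

times = [1, 5, 10]
ph_ratios = [proportional_hazard(t) / baseline_hazard(t) for t in times]
non_ph_ratios = [non_proportional_hazard(t) / baseline_hazard(t) for t in times]

for t, ph, non_ph in zip(times, ph_ratios, non_ph_ratios):
    print(f"t = {t:>2}: proportional ratio = {ph:.2f}, non-proportional ratio = {non_ph:.2f}")
```

In the first comparison both hazards change while the ratio stays at 2; in the second the ratio moves from well above 1 to below 1, so no single HR could summarize it.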
14. The full conceptual workflow (memory anchor)
Define time, event, censoring → What does “at risk” mean?
Describe survival over time → Kaplan–Meier curves
Test whether curves differ → Log-rank test
Estimate how much they differ → Cox hazard ratio
Validate assumptions → Proportional hazards
This order reflects thinking, not software.
15. Final conceptual takeaway
Survival analysis is about:
Risk evolving over time among people still under observation
Everything else (KM curves, log-rank tests, hazard ratios, Cox models) is a tool for examining that single idea from different angles.
If you understand that, you understand survival analysis.





