Rates, Poisson Regression, and the Missing Lambda Backbone
- Mayta

- Jun 13, 2025
In clinical epidemiology, quantifying the frequency of events such as complications, readmissions, and adverse effects is foundational. When patients are observed for different lengths of time, comparing simple proportions can mislead. In these settings, rate-based analysis is preferred because it accounts for time at risk (person-time), enabling fair comparisons across groups and allowing covariate adjustment. This is the natural home of Poisson regression for rates.
Risk vs rate
Risk (cumulative incidence)
Risk is the probability of an event in a defined population over a specified time window. It is a proportion, bounded between 0 and 1.
Best when follow-up is uniform and complete (e.g., surgical site infection within 30 days).
Rate (incidence density)
Rate is the speed of occurrence, defined as events per unit person-time. It is a ratio and can exceed 1 when events are frequent.
Best when follow-up is incomplete or varies (e.g., bleeding episodes per 100 person-years on anticoagulation).
Design logic tie-in: rates are especially natural for dynamic (open) cohorts, where entry/exit and varying follow-up are expected.
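A minimal numeric sketch in R (made-up follow-up data, just to contrast the two measures):
# risk vs rate with unequal follow-up (hypothetical data)
events <- c(1, 0, 1, 0, 0)            # event indicator per patient
time   <- c(0.5, 2.0, 1.0, 2.0, 1.5)  # person-years at risk
sum(events) / length(events)          # risk: 2/5 = 0.40 over the window
sum(events) / sum(time)               # rate: 2 events / 7 person-years ≈ 0.29 per person-year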
The incidence rate framework (where λ comes first)
Before regression, start from the Poisson process idea:
λ (lambda) = event intensity = average number of events per unit time
If events follow a Poisson process, then the event count (Y) in a given observation window follows:
Y ∼ Poisson(λ)
with the defining property:
E(Y) = λ , Var(Y) = λ
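A quick simulation in R (a sketch with an arbitrary λ) makes the defining property concrete:
# counts from a Poisson process with intensity lambda = 2.5
set.seed(1)
y <- rpois(1e5, lambda = 2.5)
mean(y)   # ≈ 2.5, matching E(Y) = λ
var(y)    # ≈ 2.5, matching Var(Y) = λ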
Why μ appears in regression
In GLMs we typically write:
μ = E(Y)
For Poisson models:
μ ≡ λ
So μ is simply λ written as an expected value (notation changes; meaning doesn’t). This matters because it cleanly connects the rate concept to the count model.


From counts to rates via time-at-risk (the key decomposition)
In clinical rate data, each subject (or cluster) contributes a time-at-risk t_i. The Poisson intensity (expected count) decomposes as rate × time-at-risk:
μ_i = rate_i × t_i
On the log scale of the Poisson GLM this becomes:
log(μ_i) = log(t_i) + β0 + β1·x_i1 + … + βk·x_ik
so the linear predictor models the log rate, log(μ_i / t_i).
The offset (what it really does)
log(t_i) is an offset: its coefficient is fixed at 1 rather than estimated.
This forces the model to explain differences in rates, not just raw counts.
Interpretation
Each β is a log rate ratio.
exp(β) is an incidence rate ratio (IRR)—a multiplicative comparison of rates.
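As a worked example (illustrative numbers, not from a fitted model): a coefficient of β = 0.41 for a binary exposure gives exp(0.41) ≈ 1.51, i.e., roughly a 51% higher event rate per unit person-time in the exposed group, holding the other covariates fixed.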
How to compute the rate denominator (three common frameworks)
1) Exact person-time (preferred)
Use when the individual follow-up time is known.
Example: Norplant removal
Outcome: count of removals
Denominator: time from insertion to removal/censoring
Covariates: age, education
2) Approximated person-time (acceptable with justification)
Use when individual time is unknown or assumed uniform.
Example: neonatal mortality, assuming each infant contributes 28 days
Outcome: neonatal deaths
Denominator: 28 days per infant
Covariates: birth weight, delivery method, maternal age
3) Cluster-level approximation (ecologic rate)
Use aggregated exposure time (e.g., tube-days per hospital-month).
Example: extubation failure rates across hospitals
Outcome: failed extubations per month
Denominator: monthly tube-days
Covariates: staffing ratios, protocol adherence
(These are methodologically defensible if the time-at-risk definition matches the clinical process and measurement quality.)
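As an illustration of framework 2, here is a minimal R sketch (the data frame and variable names are hypothetical, not from the article); every infant is assigned the same 28 days at risk:
# approximated person-time: neonatal mortality with a uniform 28-day exposure
set.seed(42)
df <- data.frame(
  deaths       = rbinom(200, 1, 0.05),
  birthweight  = rnorm(200, 3000, 500),
  delivery     = factor(sample(c("vaginal", "cesarean"), 200, replace = TRUE)),
  maternal_age = rnorm(200, 28, 6),
  persontime   = 28                  # assumed 28 days at risk for every infant
)
fit28 <- glm(deaths ~ birthweight + delivery + maternal_age + offset(log(persontime)),
             family = poisson(link = "log"), data = df)
exp(coef(fit28))                     # IRRs; the baseline rate is per day of neonatal follow-up
Because every infant contributes the same 28 days, the offset only shifts the intercept; the covariate IRRs equal those from a plain count model, which is exactly why the uniform-time approximation is defensible here.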
Rate comparisons: ratio and difference
Rate Ratio (IRR/RR_rate): multiplicative comparison (Poisson regression default output).
Rate Difference (RD_rate): absolute excess events per person-time (often derived from model-based predicted rates or alternative links).
In causal/clinical reporting, pair relative and absolute metrics when possible to keep interpretation clinically grounded.
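For instance (illustrative numbers): an IRR of 1.5 against a baseline rate of 2 events per 100 person-years corresponds to a rate difference of about 1 extra event per 100 person-years (3 vs 2), often the more clinically tangible figure.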
Implementation (Poisson rate model)
Stata (rate model with exposure)
poisson y x1 x2, exposure(time) irr                            // exposure() supplies log(time) as the offset; irr reports rate ratios
glm y x1 x2, family(poisson) link(log) exposure(time) eform    // equivalent GLM; eform exponentiates coefficients to IRRs
poisson y x1##x2, exposure(time) irr                           // same rate model with an x1 × x2 interaction
R (GLM with offset)
fit <- glm(y ~ x1 + x2 + offset(log(time)), family = poisson(link = "log"), data = df)
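To report IRRs from the R fit (a usage sketch, assuming the model object is stored as fit as above):
exp(coef(fit))      # incidence rate ratios
exp(confint(fit))   # profile-likelihood 95% CIs on the IRR scale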
When Poisson “breaks”: overdispersion and why the Negative Binomial exists
Poisson assumes:
Var(Y_i ∣ X_i) = μ_i
In real clinical data, unobserved heterogeneity makes the true intensities vary: each subject effectively carries their own λ_i, shifted by unmeasured frailty beyond the covariates.
This typically yields:
Var(Y) > E(Y)
Negative Binomial (NB): “λ becomes random.”
A standard way to model heterogeneity is a gamma–Poisson mixture: Y_i ∣ λ_i ∼ Poisson(λ_i), with λ_i drawn from a Gamma distribution with mean μ_i.
NB keeps the same mean but allows extra variance:
Var(Y_i) = μ_i + α·μ_i²
Poisson is the special case α = 0.
Practical implication: if you ignore overdispersion, SEs can be too small and inference too optimistic.
(Clinically: whenever patients differ in baseline frailty/risk beyond measured covariates, overdispersion is more “default” than “rare”.)
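A minimal R sketch (using the same y, x1, x2, and time variables as the R model above; df is the same hypothetical data frame, and glm.nb comes from the MASS package):
# Negative Binomial rate model: same mean structure and offset, plus a dispersion parameter
library(MASS)
nb_fit <- glm.nb(y ~ x1 + x2 + offset(log(time)), data = df)
exp(coef(nb_fit))   # IRRs, with standard errors that allow for overdispersion (α = 1/theta)
# quick check on the Poisson fit: Pearson chi-square / residual df well above 1 suggests overdispersion
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)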
Conclusion
Poisson regression models the expected count μ, which is the Poisson intensity λ, and becomes a rate model once μ is decomposed as rate × time-at-risk and log(t) is included as an offset. The exponentiated coefficients yield interpretable incidence rate ratios, enabling covariate-adjusted comparisons across unequal follow-up. When unmeasured heterogeneity causes overdispersion, Negative Binomial regression provides a principled extension by allowing λ to vary across individuals.
Key takeaways
Risk = probability over fixed time; rate = events per person-time (handles unequal follow-up).
λ is the backbone: in regression, μ = E(Y) = λ = rate × time-at-risk.
Offset log(t) converts a count model into a rate model.
Overdispersion is common → consider Negative Binomial (or at least robust SEs).





