
Penalisation and Regularisation in Clinical Prediction Models (CPMs) Explained: Why “shrinkage”, not “swelling”


Introduction

Regularisation (also called penalisation or shrinkage) is a modelling strategy used when developing CPMs to reduce overfitting and improve performance in new patients. In practice, it does this by discouraging overly large coefficients—i.e., it makes “complexity” costly.

In CPM development roadmaps, penalisation is positioned within Model Derivation (model fitting) as an alternative to traditional variable selection approaches, particularly when the number of candidate predictors is large relative to sample size (high p/n).


The core math idea: “fit well” + “pay a complexity tax.”

1) Ordinary (unpenalised) model fitting

Model fitting means choosing coefficients \(\beta\) to minimise a loss function.

Linear regression (RSS loss):

\[ \min_{\beta} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \]

Logistic regression (negative log-likelihood loss) — common in CPMs with a binary outcome:

\[ \min_{\beta} \; -\sum_{i=1}^{n} \Big( y_i \log(\pi_i) + (1 - y_i)\log(1 - \pi_i) \Big) \]

where

\[ \pi_i = \Pr(Y_i = 1 \mid X_i), \quad \text{and typically} \quad \operatorname{logit}(\pi_i) = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j \]
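As a minimal sketch of the logistic loss above, the following computes the negative log-likelihood for a small model. The toy data, the coefficient values, and the function names are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def neg_log_likelihood(beta0, betas, X, y):
    """Negative log-likelihood of a logistic model (the unpenalised loss)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(betas, x_i))
        pi = sigmoid(eta)
        total -= y_i * math.log(pi) + (1 - y_i) * math.log(1 - pi)
    return total

# Hypothetical toy data: two predictors, four patients.
X = [[0.5, 1.0], [-1.2, 0.0], [0.3, 1.0], [-0.7, 0.0]]
y = [1, 0, 1, 0]

# All-zero coefficients predict 0.5 for everyone.
null_loss = neg_log_likelihood(0.0, [0.0, 0.0], X, y)
# Coefficients that point in the right direction give a lower loss.
fitted_loss = neg_log_likelihood(0.0, [1.0, 1.0], X, y)
```

With every predicted probability at 0.5, the loss is n times log 2; coefficients that separate the outcomes better drive it lower, which is exactly what unpenalised fitting rewards.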

2) Penalised (regularised) model fitting

Regularisation modifies the objective by adding a penalty term:

\[ \min_{\beta} \; \text{Loss}(\beta) + \lambda \, \text{Penalty}(\beta) \]

(Common convention: the intercept (β₀) is not penalised.)
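The structure of the penalised objective can be sketched in a few lines. This is an illustration only: the function names, the toy data, and the choice of a squared-error loss with an L2 penalty are assumptions, not part of any particular library.

```python
def penalised_objective(loss, penalty, lam, beta0, betas):
    """Loss(beta) + lambda * Penalty(beta); the intercept is not penalised."""
    return loss(beta0, betas) + lam * penalty(betas)

def squared_error_loss(beta0, betas):
    # Hypothetical data lying exactly on the line y = 1 + 2x.
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    return sum((y - beta0 - betas[0] * x) ** 2 for x, y in data)

def l2_penalty(betas):
    # Only the slope coefficients enter the penalty, never beta0.
    return sum(b * b for b in betas)

# With lambda = 0 the objective is just the loss (a perfect fit here).
unpenalised = penalised_objective(squared_error_loss, l2_penalty, 0.0, 1.0, [2.0])
# With lambda = 0.5 the same coefficients now cost 0.5 * 2^2 = 2.
penalised = penalised_objective(squared_error_loss, l2_penalty, 0.5, 1.0, [2.0])
```

The same coefficients that fit the data perfectly become more expensive once the penalty is switched on, so the optimiser is pushed toward smaller values.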


Continuous and categorical predictors: what “coefficients” really mean

Regularisation acts on coefficients, so it’s essential to understand how predictors create coefficients.

Continuous predictors

A continuous predictor usually contributes one coefficient if entered linearly (e.g., age). But CPM guidance warns against dichotomising continuous variables (e.g., “age ≥ 65”) because it throws away information; instead, consider flexible forms such as splines/polynomials when needed.

Categorical predictors (3+ levels)

A categorical predictor typically creates multiple coefficients through indicator (dummy) coding.

If \(Z\) has \(K\) categories, choosing one reference category produces \(K-1\) dummy variables, hence \(K-1\) coefficients for that single predictor. In CPM practice, rare categories should often be combined to avoid sparse data problems.

So: regularisation doesn’t penalise “a variable” in the abstract—it penalises the set of coefficients created by how that variable is encoded.

Suppose you have a categorical predictor Z with K categories, e.g.

\[ Z \in \{1, 2, \dots, K\} \]

To use it in regression, you encode it into dummy/indicator variables (one-hot encoding with a reference category).

Reference-category (most common)

Pick a reference level (say level 1). Create \(K-1\) dummies:

\[ x_{i2} = \mathbf{1}(Z_i = 2), \quad x_{i3} = \mathbf{1}(Z_i = 3), \quad \dots, \quad x_{iK} = \mathbf{1}(Z_i = K) \]

Then the linear predictor becomes:

\[ \eta_i = \beta_0 + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + (\text{other predictors}) \]

So one categorical predictor with K levels contributes K−1 coefficients.
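Reference-category coding can be sketched as follows. The helper name `dummy_code` and the smoking-status example are hypothetical and used only to make the encoding concrete.

```python
def dummy_code(values, levels, reference):
    """Reference-category coding: one 0/1 indicator per non-reference level."""
    non_ref = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in non_ref] for v in values]
    return non_ref, rows

# Hypothetical 3-level predictor observed in four patients.
levels = ["never", "former", "current"]
smoking = ["never", "current", "former", "never"]

columns, X = dummy_code(smoking, levels, reference="never")
# columns lists the K-1 = 2 indicator variables;
# patients in the reference category get a row of zeros.
```

Each column of `X` receives its own coefficient, so this one three-level predictor adds two coefficients to the model.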

Interpretation in logistic regression

For a binary-outcome CPM (logistic regression):

\[ \operatorname{logit}(\pi_i) = \eta_i \]

Each \(\beta_k\) is the log-odds difference for category \(k\) vs the reference category. The odds ratio for level \(k\) vs the reference is:

\[ \text{OR}_k = e^{\beta_k} \]
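A small numerical sketch shows how a coefficient translates into an odds ratio, and what shrinking that coefficient does on the odds-ratio scale. The coefficient values and the halving factor are hypothetical.

```python
import math

beta_unpenalised = math.log(4.0)      # hypothetical log-odds difference vs reference
beta_shrunk = 0.5 * beta_unpenalised  # hypothetical: a penalty halves the coefficient

or_unpenalised = math.exp(beta_unpenalised)  # about 4
or_shrunk = math.exp(beta_shrunk)            # about 2
or_reference = math.exp(0.0)                 # exactly 1: no difference from reference
```

Shrinking a coefficient toward 0 pulls the corresponding odds ratio toward 1, that is, toward "no difference from the reference category".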

Ridge (L2) regularisation

Objective function

\[ \min_{\beta} \; \text{Loss}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^2 \]

What L2 does conceptually
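One way to see the effect of an L2 penalty is the simplest possible case: linear regression with a single centred predictor, where minimising the squared-error loss plus λβ² has the closed-form slope Sxy / (Sxx + λ). The sketch below assumes that simplified setting and uses hypothetical data; a logistic CPM has no such closed form and would be fitted iteratively.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge slope for one centred predictor: Sxy / (Sxx + lambda)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / (sxx + lam)

# Hypothetical data with a true slope of roughly 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.7]

# Slopes for increasing penalty strength; lambda = 0 is ordinary least squares.
slopes = [ridge_slope(x, y, lam) for lam in (0.0, 1.0, 10.0, 100.0)]
```

In this setting the slope moves smoothly toward zero as λ grows but never reaches exactly zero: ridge shrinks coefficients without removing predictors.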


LASSO (L1) regularisation

Objective function

\[ \min_{\beta} \; \text{Loss}(\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \]

What L1 does conceptually

Important for categorical predictors: because a factor with \(K\) levels has \(K-1\) dummy coefficients, standard LASSO can set some level-coefficients to 0 while leaving others nonzero. That may be fine, but it means selection may occur at the level of individual categories rather than at the level of the whole variable.

Penalisation (including LASSO) is explicitly highlighted as an option for variable selection during model derivation in CPM workflows.
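How an L1 penalty produces exact zeros can be seen in the one-predictor linear case, where minimising the squared-error loss plus λ|β| gives a soft-thresholded slope. The sketch below assumes that simplified setting; the data and function names are hypothetical.

```python
def soft_threshold(z, t):
    """Move z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_slope(x, y, lam):
    """Minimiser of sum((y - b*x)^2) + lam*|b| for one centred predictor."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return soft_threshold(sxy, lam / 2.0) / sxx

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_strong = [-4.1, -1.9, 0.2, 2.1, 3.7]  # hypothetical strong association
y_weak = [0.3, -0.2, 0.0, 0.1, -0.2]    # hypothetical weak association

strong_slope = lasso_slope(x, y_strong, 10.0)  # shrunk but kept
weak_slope = lasso_slope(x, y_weak, 10.0)      # set exactly to zero
```

The strong association survives with a smaller slope, while the weak one is dropped entirely. Applied to dummy variables, the same mechanism can zero out some levels of a factor and keep others.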


Elastic Net (L1 + L2): best of both worlds

Objective function

A common parameterisation uses \(\alpha \in [0, 1]\):

\[ \min_{\beta} \; \text{Loss}(\beta) + \lambda \Big( \alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2 \Big) \]
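The role of α can be sketched by evaluating the penalty term on its own. The function name and coefficient values are hypothetical, and the formula follows the parameterisation written above; software packages may scale the two terms differently.

```python
def elastic_net_penalty(betas, alpha):
    """alpha * sum|b| + (1 - alpha) * sum b^2, excluding the intercept."""
    l1 = sum(abs(b) for b in betas)
    l2 = sum(b * b for b in betas)
    return alpha * l1 + (1.0 - alpha) * l2

betas = [0.5, -2.0, 0.0]  # hypothetical coefficients

lasso_like = elastic_net_penalty(betas, 1.0)  # pure L1: 0.5 + 2.0 = 2.5
ridge_like = elastic_net_penalty(betas, 0.0)  # pure L2: 0.25 + 4.0 = 4.25
mixed = elastic_net_penalty(betas, 0.5)       # halfway between the two: 3.375
```

Setting α = 1 recovers the LASSO penalty and α = 0 recovers the ridge penalty; values in between blend the two.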

Why it exists


Where this fits in the CPM methodology

In the CPM development roadmap:


Before we go


Penalisation / Regularization

Penalisation (also called regularization or shrinkage) means adding a penalty term to the model during fitting. Modern guidance on clinical prediction models often recommends this approach because it helps control overfitting and makes the model more stable and generalizable, even when you have many predictors.


Regularisation (Shrinkage) — what it does

Regularisation adds a penalty to the optimisation target, so the model is not rewarded for using very large coefficients.

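A tuning parameter λ (lambda) controls how strong the penalty is: λ = 0 gives the ordinary unpenalised fit, and larger values of λ shrink the coefficients more strongly toward zero.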

In practice, you choose λ using cross-validation (try multiple λ values and keep the one that predicts best on unseen folds).
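The cross-validation loop can be sketched in plain Python. For brevity this uses a one-predictor ridge linear model with a closed-form fit rather than a logistic CPM; the simulated data, the λ grid, and the function names are all hypothetical.

```python
import random

def ridge_fit(x, y, lam):
    """Intercept and ridge slope for one predictor (intercept unpenalised)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return my - slope * mx, slope

def cv_error(x, y, lam, k=5):
    """Mean squared prediction error over k held-out folds."""
    n = len(x)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        b0, b1 = ridge_fit([x[i] for i in train], [y[i] for i in train], lam)
        sq_err += sum((y[i] - b0 - b1 * x[i]) ** 2 for i in held_out)
    return sq_err / n

# Hypothetical simulated data: weak signal, noisy outcome, small sample.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [0.3 * xi + rng.gauss(0, 1) for xi in x]

grid = [0.0, 1.0, 5.0, 20.0, 100.0]
errors = {lam: cv_error(x, y, lam) for lam in grid}
best_lam = min(errors, key=errors.get)  # lambda with the lowest held-out error
```

Each candidate λ is judged only on patients that were not used to fit the model in that fold, which is what makes the chosen value a guard against overfitting rather than a reward for it.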


Three main regularisation methods (no equations)

1) Ridge Regression (L2 penalty)

2) LASSO Regression (L1 penalty)

3) Elastic Net


Football manager analogy (same style as before)

Imagine you are a football manager building a team from hundreds of players (your predictors). Your goal is not to win one friendly match (fit the training data), but to win the whole season (generalise to new patients).

Ordinary regression (no penalty)

“You can buy anyone at any price as long as you win today.”

Ridge (L2)

“You may sign everyone, but there’s a luxury tax that grows rapidly for expensive stars.”

LASSO (L1)

“Every player costs a fixed registration fee—same fee per person.”

Elastic Net

“You have both rules: a registration fee and a luxury tax.”


Why “shrinkage” (not “swelling”)?

Because in CPM development, the default problem is that coefficients already tend to be too large / too extreme when you fit a model on limited data with many predictors. That “coefficient inflation” is basically what overfitting looks like in regression.

Here’s the logic in plain terms:

The “bias–variance trade” in one sentence

Shrinkage works because it adds a bit of bias to cut variance a lot, which usually reduces overall prediction error on new patients.
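The trade can be made concrete with a stylised calculation. Assume a single coefficient estimate that is unbiased for the true value β with variance σ², and consider multiplying it by a factor c. The mean squared error is then (c − 1)²β² + c²σ², which is smallest at c = β² / (β² + σ²). The numbers below are hypothetical.

```python
def mse_of_scaled_estimate(c, beta, sigma2):
    """MSE of c * beta_hat when beta_hat is unbiased with variance sigma2."""
    bias_sq = ((c - 1.0) * beta) ** 2  # bias introduced by rescaling
    variance = c * c * sigma2          # variance after rescaling
    return bias_sq + variance

beta, sigma2 = 1.0, 1.0  # hypothetical true effect and sampling variance

c_opt = beta ** 2 / (beta ** 2 + sigma2)                 # best factor: 0.5
mse_unshrunk = mse_of_scaled_estimate(1.0, beta, sigma2)  # 1.0
mse_shrunk = mse_of_scaled_estimate(c_opt, beta, sigma2)  # 0.5
mse_swollen = mse_of_scaled_estimate(1.5, beta, sigma2)   # 2.5
```

Whenever σ² is positive the best factor is below 1, so some shrinkage beats none. A factor above 1 adds bias and variance at the same time, which is why "swelling" makes prediction error worse in this setting.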

Why not just “boost/increase” coefficients instead?

Because increasing coefficients would usually:

How do we decide how much to shrink?

You don’t pick it by feeling—you tune the penalty strength (λ) using internal validation, commonly cross-validation (or bootstrapping).

Bottom line: In CPMs, the usual enemy is inflated (swollen) coefficients from overfitting, so the fix is shrinkage, not swelling.

Penalisation and Regularisation in Clinical Prediction Models Explained (CPMs): Why “shrinkage” not “swelling.” — Uniqcret