Internal vs External Validation in Clinical Prediction Models: Split-Sampling, Cross-Validation, Bootstrapping, Temporal, Geographic, Domain

🎯 WHY VALIDATION?

When building a Clinical Prediction Model (CPM), the biggest trap is overestimating its true performance. This happens because the model tends to “memorize” patterns in the training dataset that don't generalize to new patients.

To prevent misleading optimism, we use two complementary checks: internal validation and external validation.

Let’s unpack them.


🔄 INTERNAL VALIDATION — Estimating Optimism

📌 DEFINITION

Internal validation means estimating how much your model’s apparent performance is inflated because it was tested on data it saw during development.

We simulate partially unseen data using only the original development dataset (no new data needed).


👇 THREE METHODS (and their pros/cons):


1️⃣ Split-Sampling (Holdout Validation)

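How it works:

  1. Randomly split the dataset into a development (training) part and a held-out test part
  2. Build the model on the development part only

Then:

  1. Evaluate the model on the development part → Apparent performance
  2. Evaluate the same model on the held-out part → Test performance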

Optimism = Apparent performance − Test performance
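
A minimal sketch of the split-sample calculation in plain Python. Everything here is an assumption made for illustration: the 70/30 split, the simulated cohort (one real predictor plus eight noise predictors), and the toy mean-difference scorer that stands in for a real regression model. The function names are hypothetical.

```python
import math
import random

def auroc(scores, labels):
    """Chance that a random event case scores higher than a random non-event case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit(rows):
    """Toy model (assumption): weight each predictor by its event vs non-event mean difference."""
    events = [x for x, y in rows if y == 1]
    others = [x for x, y in rows if y == 0]
    return [sum(e) / len(events) - sum(o) / len(others)
            for e, o in zip(zip(*events), zip(*others))]

def performance(weights, rows):
    """AUROC of the weighted-sum score on the given rows."""
    scores = [sum(w * v for w, v in zip(weights, x)) for x, _ in rows]
    return auroc(scores, [y for _, y in rows])

def simulate(n, seed, n_noise=8):
    """Hypothetical cohort: one real predictor plus pure-noise predictors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(1 + n_noise)]
        y = int(rng.random() < 1 / (1 + math.exp(-x[0])))
        rows.append((x, y))
    return rows

def split_sample_optimism(rows, train_frac=0.7, seed=0):
    """Fit on a random development part, then compare apparent and held-out AUROC."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    weights = fit(train)
    apparent = performance(weights, train)   # same data the model was built on
    held_out = performance(weights, test)    # data the model never saw
    return apparent, held_out, apparent - held_out

cohort = simulate(n=300, seed=1)
apparent, held_out, optimism = split_sample_optimism(cohort)
print(f"apparent={apparent:.3f} test={held_out:.3f} optimism={optimism:.3f}")
```

Changing `seed` in `split_sample_optimism` gives a different split of the same cohort, which is a quick way to see how much a single holdout estimate can move in a small sample.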

🔥 Weaknesses:

🧠 CECS Verdict [6]: Only useful if the dataset is large (>5,000 cases). Avoid in small samples.


2️⃣ Cross-Validation (CV)

e.g., k = 5 or 10 folds

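How it works:

  1. Split the dataset into k folds of roughly equal size
  2. For each fold:
    • Build the model on the other k − 1 folds
    • Evaluate on the held-out fold → Test performance
  3. Average the test performance across the k folds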

Optimism = Apparent performance − Mean test performance across folds
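
A rough sketch of k-fold cross-validation in plain Python. It assumes that "apparent performance" means the AUROC of a model fitted and evaluated on the full dataset; the simulated cohort, the toy mean-difference scorer (a stand-in for a real regression model), and all function names are hypothetical.

```python
import math
import random

def auroc(scores, labels):
    """Chance that a random event case scores higher than a random non-event case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit(rows):
    """Toy model (assumption): weight each predictor by its event vs non-event mean difference."""
    events = [x for x, y in rows if y == 1]
    others = [x for x, y in rows if y == 0]
    return [sum(e) / len(events) - sum(o) / len(others)
            for e, o in zip(zip(*events), zip(*others))]

def performance(weights, rows):
    """AUROC of the weighted-sum score on the given rows."""
    scores = [sum(w * v for w, v in zip(weights, x)) for x, _ in rows]
    return auroc(scores, [y for _, y in rows])

def simulate(n, seed, n_noise=8):
    """Hypothetical cohort: one real predictor plus pure-noise predictors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(1 + n_noise)]
        y = int(rng.random() < 1 / (1 + math.exp(-x[0])))
        rows.append((x, y))
    return rows

def kfold_optimism(rows, k=5, seed=0):
    """Apparent AUROC (full data) minus the mean held-out AUROC across k folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k disjoint index sets
    fold_scores = []
    for fold in folds:
        held = set(fold)
        train = [rows[i] for i in idx if i not in held]
        test = [rows[i] for i in fold]
        fold_scores.append(performance(fit(train), test))
    apparent = performance(fit(rows), rows)      # model built and tested on all data
    mean_test = sum(fold_scores) / k
    return apparent, mean_test, apparent - mean_test

cohort = simulate(n=200, seed=1)
apparent, mean_test, optimism = kfold_optimism(cohort, k=5)
print(f"apparent={apparent:.3f} mean test={mean_test:.3f} optimism={optimism:.3f}")
```

Every patient is used for testing exactly once and for model building k − 1 times, so no data are set aside permanently.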

✅ Strengths:

⚠️ Limitations:


3️⃣ Bootstrapping ✅ (Best Practice per CECS [6])

How it works:

  1. Draw B = 500–1000 bootstrap resamples, each of n patients sampled with replacement from the full dataset
  2. For each resample:
    • Build the model
    • Evaluate on the same resample → Apparent performance
    • Then evaluate on original dataset → Test performance
    • Compute: Optimism = Apparent − Test
  3. Average optimism over B iterations
  4. Apply correction:
     $$\text{Corrected Performance} = \text{Apparent (full model)} - \text{Average Optimism}$$
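
The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: the cohort is simulated, a toy mean-difference scorer stands in for the real model-building procedure, B defaults to 200 only to keep the demo fast, and the function names are hypothetical.

```python
import math
import random

def auroc(scores, labels):
    """Chance that a random event case scores higher than a random non-event case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit(rows):
    """Toy model (assumption): weight each predictor by its event vs non-event mean difference."""
    events = [x for x, y in rows if y == 1]
    others = [x for x, y in rows if y == 0]
    return [sum(e) / len(events) - sum(o) / len(others)
            for e, o in zip(zip(*events), zip(*others))]

def performance(weights, rows):
    """AUROC of the weighted-sum score on the given rows."""
    scores = [sum(w * v for w, v in zip(weights, x)) for x, _ in rows]
    return auroc(scores, [y for _, y in rows])

def simulate(n, seed, n_noise=8):
    """Hypothetical cohort: one real predictor plus pure-noise predictors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(1 + n_noise)]
        y = int(rng.random() < 1 / (1 + math.exp(-x[0])))
        rows.append((x, y))
    return rows

def bootstrap_corrected(rows, B=200, seed=0):
    """Return (apparent, average optimism, optimism-corrected) AUROC."""
    rng = random.Random(seed)
    apparent_full = performance(fit(rows), rows)      # full model on full data
    optimism = []
    for _ in range(B):
        boot = rng.choices(rows, k=len(rows))         # step 1: resample n with replacement
        weights = fit(boot)                           # step 2: build the model on the resample
        boot_apparent = performance(weights, boot)    #         evaluate on the same resample
        boot_test = performance(weights, rows)        #         evaluate on the original data
        optimism.append(boot_apparent - boot_test)
    mean_optimism = sum(optimism) / B                 # step 3: average over B resamples
    return apparent_full, mean_optimism, apparent_full - mean_optimism  # step 4

cohort = simulate(n=150, seed=1)
apparent, mean_optimism, corrected = bootstrap_corrected(cohort)
print(f"apparent={apparent:.3f} optimism={mean_optimism:.3f} corrected={corrected:.3f}")
```

In a real analysis, `fit` would need to repeat every data-driven modelling step (for example variable selection) inside each resample; otherwise the optimism estimate is likely to be too small.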

🧠 Why it works:

📌 Final Output:

Example:

| Metric | Value |
|---|---|
| Apparent AUROC (original) | 0.88 |
| Bootstrap optimism | 0.06 |
| Corrected AUROC | 0.82 |

🌍 EXTERNAL VALIDATION — Generalizability in the Wild

📌 DEFINITION

External validation tests the model on a truly unseen dataset, often from a different time period, hospital or region, or patient population.


🧪 Why it’s essential

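A model can look well tuned on internal validation (AUROC 0.85) and still collapse in an external setting (AUROC 0.62). Why? The new data can differ from the development data in time period, care setting, or patient mix.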


Types of External Validation

| Type | Dataset Source | Use Case |
|---|---|---|
| Temporal | Later time period in same hospital | Validates against time drift |
| Geographic | Different hospital or region | Validates against setting shift |
| Domain | Different disease spectrum or prevalence | Validates population transport |

🧠 What to Measure During External Validation

  1. Discrimination
    • AUROC or C-statistic
    • How well can the model rank high-risk patients above low-risk patients?
  2. Calibration
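    • Calibration slope = 1 → ideal; a slope below 1 means predictions are too extreme (a sign of overfitting)
    • Calibration intercept ≈ 0 → no systematic over- or underestimation of risk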
    • Plots of predicted vs observed risk
  3. Clinical Utility
    • Decision Curve Analysis (DCA)
    • Net Benefit at decision thresholds
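
The three families of measures above can be computed from nothing more than predicted risks and observed outcomes. The sketch below uses plain Python and makes several assumptions: the data are simulated, the "model" is deliberately miscalibrated by doubling the true log-odds, the 20% decision threshold is arbitrary, and the function names are hypothetical. The calibration slope comes from a logistic regression of the outcome on the logit of predicted risk; the intercept is estimated with the slope fixed at 1.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def logit(p):
    return math.log(p / (1 - p))

def auroc(risks, outcomes):
    """Discrimination: chance that a random event case gets a higher risk than a non-event case."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def calibration_slope(risks, outcomes, iters=25):
    """Slope b in: logit P(event) = a + b * logit(predicted risk), fitted by Newton-Raphson."""
    lp = [logit(r) for r in risks]
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(lp, outcomes):
            p = sigmoid(a + b * x)
            resid, var = y - p, p * (1 - p)
            g0 += resid
            g1 += resid * x
            h00 += var
            h01 += var * x
            h11 += var * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

def calibration_intercept(risks, outcomes, iters=25):
    """Intercept a in: logit P(event) = a + logit(predicted risk), i.e. slope fixed at 1."""
    lp = [logit(r) for r in risks]
    a = 0.0
    for _ in range(iters):
        probs = [sigmoid(a + x) for x in lp]
        grad = sum(y - p for y, p in zip(outcomes, probs))
        hess = sum(p * (1 - p) for p in probs)
        a += grad / hess
    return a

def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above the threshold."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical external cohort: outcomes follow the true log-odds,
# but the model's predictions are twice as extreme on the logit scale.
rng = random.Random(42)
true_lp = [rng.gauss(0, 1) for _ in range(2000)]
outcomes = [int(rng.random() < sigmoid(z)) for z in true_lp]
risks = [sigmoid(2 * z) for z in true_lp]

auc = auroc(risks, outcomes)
slope = calibration_slope(risks, outcomes)
intercept = calibration_intercept(risks, outcomes)
nb_model = net_benefit(risks, outcomes, 0.2)
nb_treat_all = net_benefit([1.0] * len(outcomes), outcomes, 0.2)
print(f"AUROC={auc:.3f} slope={slope:.3f} intercept={intercept:.3f}")
print(f"net benefit at 20%: model={nb_model:.3f} treat-all={nb_treat_all:.3f}")
```

Because the simulated predictions are too extreme, the calibration slope should come out well below 1 even though the ranking of patients, and therefore the AUROC, is unaffected. That is why discrimination alone is not enough when judging an external validation.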

🔁 Big Picture Summary Table

| Dimension | Internal Validation | External Validation |
|---|---|---|
| Data Source | Resampled development data | Completely independent dataset |
| Goal | Estimate optimism, control overfit | Test transportability and generalizability |
| Key Methods | Split-sample, k-fold CV, bootstrapping | Temporal, geographic, domain validation |
| Output Metric | Corrected AUROC, Calibration slope | External AUROC, Calibration slope, DCA |
| CECS Verdict | Bootstrap preferred [6] | Essential before clinical use [6] |

✅ TAKEAWAYS — What You Now Know
