
Optimism in Clinical Prediction Models (CPMs)

Apparent Performance = Test Performance + Optimism

🔍 Background

In the development of Clinical Prediction Models (CPMs), tools that estimate a patient's risk of future events from clinical features, researchers often report strong performance when the model is evaluated on the same dataset used to develop it. These metrics can be misleadingly optimistic: the gap between this apparent performance and the model's true predictive ability is known as optimism.

📊 What Is Optimism?

Optimism in CPMs refers to the inflation of performance metrics (e.g., AUROC, calibration slope) when models are tested on the same data used to develop them. This leads to an overestimation of the model’s ability to generalize to new, unseen cases. It stems primarily from overfitting—where the model captures noise or patterns specific to the training data that do not apply elsewhere.

The relationship is defined as:

Apparent Performance = Test Performance + Optimism

Optimism = Apparent Performance − Test Performance

Where:

  • Apparent Performance is the model's performance measured on the development dataset itself.

  • Test Performance approximates the model's predictive ability on external or future data.
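
For example (illustrative numbers only): if a model achieves an AUROC of 0.82 on its development data but its performance in new patients is estimated at 0.76, the optimism is 0.82 − 0.76 = 0.06.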

🧠 Why Does This Matter?

Over-optimism misleads clinicians, researchers, and policymakers by portraying a model as more accurate than it truly is. This can lead to inappropriate patient management decisions when the model is applied in real-world clinical practice.

🛠 How to Quantify Optimism

Since we rarely have immediate access to large external datasets for validation, we use internal validation methods to estimate and adjust for optimism:

🧪 1. Split-Sample Validation

  • The data are divided into development and test sets.

  • The model is trained on the development set and tested on the holdout set.

  • Optimism = Apparent Performance − Test Performance (a brief code sketch follows below).

⚠️ Limitations: In small datasets, this method reduces power and may yield unstable estimates.
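
The calculation itself is straightforward once the split is made. Below is a minimal, illustrative sketch assuming a pandas DataFrame loaded from a hypothetical file cpm_data.csv with a binary outcome column named "event", scikit-learn installed, and logistic regression as the prediction model; all names here are assumptions for illustration, not part of any published CPM.

```python
# Illustrative split-sample validation sketch (hypothetical data and names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("cpm_data.csv")                   # hypothetical development dataset
X, y = df.drop(columns="event"), df["event"]       # predictors and binary outcome

# Split into development and holdout (test) sets
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit the model on the development set only
model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

# Apparent performance (development set) vs. test performance (holdout set)
apparent_auc = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

optimism = apparent_auc - test_auc                 # Optimism = Apparent − Test
print(f"Apparent: {apparent_auc:.3f}  Test: {test_auc:.3f}  Optimism: {optimism:.3f}")
```

In small datasets, the test AUROC from a single split can vary considerably with the random seed, which is exactly the instability warned about above.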

🔁 2. Bootstrapping (Recommended)

  • Generate multiple bootstrap samples by sampling with replacement from the original dataset.

  • For each sample:

    • Fit a model on the bootstrap sample and measure Apparent Performance on that same sample.

    • Apply the model to the original dataset to estimate Test Performance.

    • Calculate optimism = apparent − test.

  • Repeat (e.g., 500 times), then average the optimism estimates.

Bootstrapping is preferred in CPM research with sample sizes <1,000 due to its efficiency and precision in estimating model optimism; a minimal sketch of the procedure follows.
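
The sketch below assumes NumPy arrays X (features) and y (binary outcome) and logistic regression as the model being validated; these names and the model choice are illustrative assumptions, and in practice the entire modelling procedure (including any variable selection) should be repeated inside each bootstrap loop.

```python
# Illustrative bootstrap-optimism sketch (Harrell-style internal validation).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def bootstrap_optimism(X, y, n_boot=500, random_state=0):
    """Average optimism in AUROC over n_boot bootstrap replicates."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    optimisms = []
    for _ in range(n_boot):
        # 1. Draw a bootstrap sample of the same size, with replacement
        idx = rng.integers(0, n, size=n)
        X_boot, y_boot = X[idx], y[idx]
        if len(np.unique(y_boot)) < 2:              # skip degenerate resamples
            continue
        # 2. Build the model on the bootstrap sample
        model = LogisticRegression(max_iter=1000).fit(X_boot, y_boot)
        # 3. Apparent performance: evaluate on the bootstrap sample itself
        apparent = roc_auc_score(y_boot, model.predict_proba(X_boot)[:, 1])
        # 4. Test performance: apply the same model to the original dataset
        test = roc_auc_score(y, model.predict_proba(X)[:, 1])
        # 5. Optimism for this replicate
        optimisms.append(apparent - test)
    # 6. Average the optimism estimates over all replicates
    return float(np.mean(optimisms))
```

The returned average optimism is then subtracted from the apparent performance of the model fitted to the full original dataset, as described in the next section.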

🧮 Adjusted Model Performance

To report a model's likely performance in real-world clinical use, subtract the estimated optimism from the apparent performance:

Adjusted Performance (Test) = Apparent Performance − Optimism
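
Continuing the illustrative sketch above (X, y, and bootstrap_optimism are the hypothetical names defined there), the adjustment is a single subtraction:

```python
# Illustrative continuation of the bootstrap sketch above.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Fit the final model on the full original dataset
final_model = LogisticRegression(max_iter=1000).fit(X, y)

# Apparent performance of the final model
apparent_auc = roc_auc_score(y, final_model.predict_proba(X)[:, 1])

# Subtract the bootstrap-estimated optimism to get the adjusted estimate
adjusted_auc = apparent_auc - bootstrap_optimism(X, y, n_boot=500)
print(f"Apparent AUROC: {apparent_auc:.3f}  Adjusted AUROC: {adjusted_auc:.3f}")
```

The adjusted value is the figure to report as the model's expected performance in new patients.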

⚠️ Interpreting Large Optimism

If optimism is large (e.g., the AUROC drops by more than 0.05 on validation), it suggests overfitting and that:

  • There may be too many predictors for the sample size.

  • The model is capturing spurious associations.

  • Recalibration or model simplification may be needed.

✅ Summary Points

  • Optimism quantifies the difference between overfitted performance and realistic performance.

  • Internal validation using bootstrapping is the gold standard for estimating optimism in CPMs.

  • Always report optimism-adjusted performance in manuscripts.

  • A high optimism estimate flags the risk of overfitting and undermines model reliability.

  • This concept is central to both prognostic and diagnostic CPM work, particularly during internal validation.

