Overview

Building a prediction model requires separating three distinct stages, each answering a different methodological question:

Step	Goal	Key Question
1. Hyperparameter tuning	Model selection	Which model generalizes best?
2. Final model fitting	Model estimation	What is the final model?
3. Internal validation	Performance estimation	How much am I overestimating performance?

Failing to separate these steps leads to biased and non-reproducible results.

1. Hyperparameter Tuning

Objective

Select the model configuration that maximizes performance on unseen data:

$$\text{Best model} = \arg\max_{\lambda} \, [\text{Cross-validated performance}]$$

Here λ indexes the candidate hyperparameter configurations.

Recommended Method: Cross-Validation

Mechanism

Split data into K folds
Train on K−1 folds
Test on the remaining fold
Repeat across folds
Average performance
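The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the k-nearest-neighbour regressor, R² as the performance measure, the synthetic data, and the function names (`k_fold_indices`, `knn_predict`, `r_squared`, `cross_validate`) are all chosen here for illustration and do not come from the original text.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict y at x as the mean outcome of the closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination; higher is better, 1.0 is a perfect fit."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def cross_validate(xs, ys, n_neighbors, k=5, seed=0):
    """Return the mean held-out R^2 across k folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]          # train on K-1 folds
        train_y = [ys[i] for i in train_idx]
        preds = [knn_predict(train_x, train_y, xs[i], n_neighbors)
                 for i in test_idx]                   # test on the remaining fold
        scores.append(r_squared([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores)                  # average performance

# Synthetic data (illustrative only): a noisy linear signal.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 2.0) for x in xs]

cv_score = cross_validate(xs, ys, n_neighbors=5)
print(f"5-fold cross-validated R^2: {cv_score:.3f}")
```

Calling `cross_validate` once per candidate configuration and keeping the configuration with the highest score is the selection step described in the Objective.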

Interpretation

Property	Meaning
Train/test separation	Mimics external validation
Bias	Slightly pessimistic (each model is trained on only part of the data)
Advantage	Reduces the risk of selecting an overfitted configuration

Why This Matters

Hyperparameter tuning is a selection problem, not a final performance estimate.

The goal is:

“Which model will perform best on new patients?”

Cross-validation directly estimates this.

This aligns with prediction modeling principles emphasizing generalizability during development.

What Should NOT Be Done

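Do not tune on bootstrap performance measured on data that overlap the bootstrap training sample
Do not tune on apparent (training) performance

Reason:

Both score the model on data it has already seen, so they are optimistically biased
They overestimate model performance and tend to favor the most flexible configuration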
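A small sketch can show why apparent performance is a poor tuning criterion. The setup is an assumption made for illustration only: a k-nearest-neighbour regressor on synthetic data, scored with R², with a hypothetical grid of neighbour counts. With one neighbour, every training point is its own nearest neighbour, so the apparent score is perfect regardless of how the model behaves on new data.

```python
import random

def knn_predict(train_x, train_y, x, n_neighbors):
    """Mean outcome of the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination; higher is better."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def apparent_score(xs, ys, n_neighbors):
    """Score the model on the same data it was trained on."""
    preds = [knn_predict(xs, ys, x, n_neighbors) for x in xs]
    return r_squared(ys, preds)

def cv_score(xs, ys, n_neighbors, n_folds=5, seed=0):
    """Mean held-out R^2 over n_folds folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(n_folds):
        test_idx = idx[f::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        preds = [knn_predict(train_x, train_y, xs[i], n_neighbors)
                 for i in test_idx]
        scores.append(r_squared([ys[i] for i in test_idx], preds))
    return sum(scores) / n_folds

# Synthetic data (illustrative only).
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 3.0) for x in xs]

grid = [1, 3, 5, 10, 20]  # hypothetical candidate neighbour counts
apparent = {m: apparent_score(xs, ys, m) for m in grid}
cross_validated = {m: cv_score(xs, ys, m) for m in grid}

best_by_apparent = max(grid, key=apparent.get)
best_by_cv = max(grid, key=cross_validated.get)

for m in grid:
    print(f"n_neighbors={m:2d}  apparent={apparent[m]:.3f}  "
          f"cv={cross_validated[m]:.3f}")
print("selected by apparent performance:", best_by_apparent)
print("selected by cross-validation:   ", best_by_cv)
```

Selecting on apparent performance always returns the one-neighbour model here, because that model reproduces its training outcomes exactly; the cross-validated column is the one that reflects performance on unseen data.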

2. Fit Final Model

Objective

After selecting optimal hyperparameters:

Fit the final model using the entire dataset

Why Full Data is Used

Approach	Consequence
Use full dataset	All available information is used; most precise estimates
Use subset (e.g., CV folds)	Loss of information

Conceptual Role

This step defines your final prediction model:

Final coefficients (if regression-based)
Final ensemble of trees (if random forest)
Final prediction function
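As a minimal sketch of this step for a regression-based model, the example below fits a one-predictor ridge regression on the full dataset and exposes the result as a prediction function. The model, the synthetic data, the function names, and the value of `selected_lam` are assumptions for illustration; in practice the penalty would be whatever Step 1 selected.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Fit y = intercept + slope * x with a ridge penalty lam on the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)            # penalty shrinks the slope toward 0
    intercept = mean_y - slope * mean_x  # intercept is not penalised
    return intercept, slope

# Synthetic data (illustrative only).
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]

selected_lam = 1.0  # hypothetical: assumed to be the value chosen in Step 1

# Fit once, on every available observation.
intercept, slope = fit_ridge_1d(xs, ys, selected_lam)

def predict(x):
    """The final prediction function; its coefficients no longer change."""
    return intercept + slope * x

print(f"final coefficients: intercept={intercept:.3f}, slope={slope:.3f}")
print(f"prediction at x=4.0: {predict(4.0):.3f}")
```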

Important Clarification

This model is not yet validated.

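Its performance on the data used to fit it is apparent performance, which still includes optimism:

$$\text{Apparent performance} = \text{True performance} + \text{Optimism}$$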

3. Internal Validation

Objective

Estimate and correct for overfitting:

$$\text{True performance} = \text{Apparent performance} - \text{Optimism}$$

Two Valid Approaches

Option A: Cross-Validation

Mechanism

Refit the model, with the selected hyperparameters, on K−1 folds
Evaluate performance on the held-out fold
Average results across folds
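A sketch of cross-validation used for validation rather than tuning: the hyperparameter is held fixed, the model is refitted in every fold, and the averaged held-out score is reported next to the apparent score. The k-nearest-neighbour model, R² metric, synthetic data, and the value of `selected_n_neighbors` are assumptions for illustration.

```python
import random

def knn_predict(train_x, train_y, x, n_neighbors):
    """Mean outcome of the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination; higher is better."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def cv_fold_scores(xs, ys, n_neighbors, n_folds=5, seed=0):
    """Return the held-out R^2 of each fold, refitting the model each time."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(n_folds):
        test_idx = idx[f::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        preds = [knn_predict(train_x, train_y, xs[i], n_neighbors)
                 for i in test_idx]
        scores.append(r_squared([ys[i] for i in test_idx], preds))
    return scores

# Synthetic data (illustrative only).
rng = random.Random(11)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 3.0) for x in xs]

selected_n_neighbors = 5  # hypothetical: assumed to come from Step 1

fold_scores = cv_fold_scores(xs, ys, selected_n_neighbors)
cv_estimate = sum(fold_scores) / len(fold_scores)
apparent = r_squared(
    ys, [knn_predict(xs, ys, x, selected_n_neighbors) for x in xs]
)

print("per-fold R^2:", [round(s, 3) for s in fold_scores])
print(f"cross-validated R^2: {cv_estimate:.3f}")
print(f"apparent R^2:        {apparent:.3f}")
```

Note that each fold's model sees only part of the data, which is the source of the slight pessimism listed in the Properties table.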

Properties

Property	Interpretation
Bias	Slightly pessimistic
Data usage	Less efficient (not full data per model)
Simplicity	Easy to implement

Option B: Bootstrap (Preferred for Clinical Prediction Models, CPM)

Mechanism (Optimism Correction)

Fit model on full dataset → Apparent performance
Draw a bootstrap sample (same size as the original data, sampled with replacement)
Fit model on bootstrap sample
Evaluate:
- On bootstrap sample (training)
- On original dataset (testing)
Compute optimism:

$$\text{Optimism} = \text{Performance}_{\text{train}} - \text{Performance}_{\text{test}}$$

Repeat many times
Correct:

$$\text{Corrected performance} = \text{Apparent performance} - \text{Mean optimism}$$
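The optimism-correction loop above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the k-nearest-neighbour model, the R² metric, the synthetic data, the number of bootstrap samples, and the names `score` and `optimism_corrected` are all chosen here. The hyperparameter is held fixed inside the loop; a stricter variant would repeat the tuning step within each bootstrap sample.

```python
import random

def knn_predict(train_x, train_y, x, n_neighbors):
    """Mean outcome of the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination; higher is better."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def score(train_x, train_y, eval_x, eval_y, n_neighbors):
    """Train on (train_x, train_y) and return R^2 on (eval_x, eval_y)."""
    preds = [knn_predict(train_x, train_y, x, n_neighbors) for x in eval_x]
    return r_squared(eval_y, preds)

def optimism_corrected(xs, ys, n_neighbors, n_boot=100, seed=0):
    """Return (apparent, mean optimism, corrected) performance."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = score(xs, ys, xs, ys, n_neighbors)   # fit and score on full data
    optimisms = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        perf_train = score(boot_x, boot_y, boot_x, boot_y, n_neighbors)
        perf_test = score(boot_x, boot_y, xs, ys, n_neighbors)
        optimisms.append(perf_train - perf_test)
    mean_optimism = sum(optimisms) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism

# Synthetic data (illustrative only).
rng = random.Random(5)
xs = [rng.uniform(0, 10) for _ in range(80)]
ys = [2.0 * x + rng.gauss(0, 3.0) for x in xs]

apparent, mean_optimism, corrected = optimism_corrected(xs, ys, n_neighbors=3)
print(f"apparent R^2:  {apparent:.3f}")
print(f"mean optimism: {mean_optimism:.3f}")
print(f"corrected R^2: {corrected:.3f}")
```

Every model in the loop is fitted on a sample of the full size, which is why this approach is described as using the full dataset.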

Properties

Property	Interpretation
Data usage	Uses full dataset
Bias correction	Directly estimates optimism
Output	Optimism-corrected performance

Why Bootstrap is Strong

Bootstrap directly answers:

“How much am I overfitting my dataset?”

This follows the core modeling principle:

Separate signal from bias and random error

Putting It All Together

Complete Pipeline

Step 1 — Hyperparameter tuning

Use cross-validation
Select best model configuration

Step 2 — Fit final model

Train model on full dataset
Freeze the fitted parameters; this is the model that is reported and used

Step 3 — Internal validation

Use bootstrap (preferred) or cross-validation
Report:
- Apparent performance
- Corrected performance
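The three steps can be chained in one short script. As before, this is only a sketch under assumptions: a k-nearest-neighbour regressor, R² as the metric, synthetic data, a hypothetical grid, and illustrative function names. The bootstrap here keeps the selected hyperparameter fixed; repeating the tuning inside each bootstrap sample would be a stricter variant.

```python
import random

def knn_predict(train_x, train_y, x, n_neighbors):
    """Mean outcome of the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination; higher is better."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def score(train_x, train_y, eval_x, eval_y, n_neighbors):
    """Train on one dataset and return R^2 on another."""
    preds = [knn_predict(train_x, train_y, x, n_neighbors) for x in eval_x]
    return r_squared(eval_y, preds)

def cv_score(xs, ys, n_neighbors, n_folds=5, seed=0):
    """Mean held-out R^2 over n_folds folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(n_folds):
        test_idx = idx[f::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        total += score([xs[i] for i in train_idx], [ys[i] for i in train_idx],
                       [xs[i] for i in test_idx], [ys[i] for i in test_idx],
                       n_neighbors)
    return total / n_folds

def bootstrap_optimism(xs, ys, n_neighbors, n_boot=100, seed=0):
    """Mean of (bootstrap-sample score minus original-data score)."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        total += (score(bx, by, bx, by, n_neighbors)
                  - score(bx, by, xs, ys, n_neighbors))
    return total / n_boot

# Synthetic data (illustrative only).
rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(80)]
ys = [2.0 * x + rng.gauss(0, 3.0) for x in xs]

# Step 1: hyperparameter tuning by cross-validation.
grid = [1, 3, 5, 10]  # hypothetical candidates
best_n = max(grid, key=lambda m: cv_score(xs, ys, m))

# Step 2: final model fitted on the full dataset.
def final_model(x):
    return knn_predict(xs, ys, x, best_n)

# Step 3: internal validation by bootstrap optimism correction.
apparent = score(xs, ys, xs, ys, best_n)
mean_optimism = bootstrap_optimism(xs, ys, best_n)
corrected = apparent - mean_optimism

print("selected n_neighbors:", best_n)
print(f"apparent R^2:  {apparent:.3f}")
print(f"corrected R^2: {corrected:.3f}")
```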

Conceptual Separation (Critical Insight)

Stage	What is being estimated
Tuning	Relative generalization performance of candidate configurations
Final model	The prediction function itself (fitted parameters)
Validation	Bias (optimism) in performance

Key Insight

If these steps are not separated:

Model selection and validation become entangled
Performance is overestimated
Results are not reproducible

Clinical Interpretation

Step	Clinical meaning
Tuning	“Which model works best for new patients?”
Final model	“This is the model I will use”
Validation	“How much am I overestimating its performance?”

Key Takeaways

Hyperparameter tuning, model fitting, and validation answer different questions
Cross-validation is the recommended method for model selection
The final model must be trained on the full dataset
Internal validation must correct for optimism
Bootstrap is preferred for estimating optimism in clinical prediction models
Proper separation of steps is essential for valid and publishable results

Machine Learning Model Development Pipeline: Tuning, Final Model, and Internal Validation
