
Machine Learning Model Development Pipeline: Tuning, Final Model, and Internal Validation

Overview

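Building a prediction model requires separating three distinct stages, each answering a different methodological question:

  • Hyperparameter tuning: which model configuration will perform best on new patients?

  • Fitting the final model: what is the final prediction function?

  • Internal validation: how much is the model overfitting the development dataset?

Failure to separate these steps leads to biased and non-reproducible results.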


1. Hyperparameter Tuning

Objective

Select the model configuration that maximizes performance on unseen data.


Recommended Method: Cross-Validation

Mechanism

  • Split data into K folds

  • Train on K−1 folds

  • Test on the remaining fold

  • Repeat across folds

  • Average performance
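The mechanism above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (k_fold_splits, cross_validate), the simulated outcome y, and the deliberately trivial "model" that predicts the training mean are all assumptions made for the example.

```python
import random
from statistics import mean

def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for K-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # K folds of roughly equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def fit_mean(y_train):
    """Toy placeholder model: predict the training mean for everyone."""
    return mean(y_train)

def mse(prediction, y_true):
    """Mean squared error of a constant prediction."""
    return mean((prediction - yi) ** 2 for yi in y_true)

def cross_validate(y, k=5, seed=0):
    scores = []
    for train, test in k_fold_splits(len(y), k, seed):
        model = fit_mean([y[j] for j in train])          # train on K-1 folds
        scores.append(mse(model, [y[j] for j in test]))  # test on the remaining fold
    return mean(scores), scores                          # average across folds

# Simulated outcome values (hypothetical data for illustration only)
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(50)]

cv_mse, fold_mse = cross_validate(y, k=5)
print(f"Cross-validated MSE: {cv_mse:.3f} over {len(fold_mse)} folds")
```

Every observation is used for testing exactly once and is never in the training set of the fold that tests it, which is what makes the averaged score an estimate of performance on unseen data.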


Interpretation


Why This Matters

Hyperparameter tuning is a selection problem, not a final performance estimate.

The goal is:

“Which model will perform best on new patients?”

Cross-validation directly estimates this.

This aligns with prediction modeling principles emphasizing generalizability during development.
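As a rough sketch of tuning as a selection problem, the example below compares a small grid of penalty values by cross-validated error and picks the one with the lowest error. The one-feature ridge regression, the grid values, and the simulated data are assumptions chosen to keep the example short; in practice a library routine would normally do this.

```python
import random
from statistics import mean

def fit_ridge(x, y, lam):
    """One-feature ridge regression; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)  # a larger lam shrinks the slope toward zero
    return my - slope * mx, slope

def mse(model, x, y):
    intercept, slope = model
    return mean((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def cv_mse(x, y, lam, k=5, seed=0):
    """K-fold cross-validated MSE for one candidate value of lam."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)  # same seed, so every candidate sees the same folds
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit_ridge([x[j] for j in train], [y[j] for j in train], lam)
        scores.append(mse(model, [x[j] for j in test], [y[j] for j in test]))
    return mean(scores)

# Simulated data (hypothetical): a weak linear signal plus noise
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(40)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

grid = [0.0, 1.0, 10.0, 100.0]                      # candidate hyperparameter values
results = {lam: cv_mse(x, y, lam) for lam in grid}  # score each candidate by CV
best_lam = min(results, key=results.get)            # select the lowest CV error
print(results)
print("Selected lam:", best_lam)
```

The cross-validated error of the winning candidate is used only to choose among configurations. It is not reported as the final performance of the model.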


What Should NOT Be Done

  • Do not use bootstrap for tuning

  • Do not use apparent (training) performance

Reason:

  • These methods are optimistically biased

  • They overestimate model performance
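To make the optimism of apparent performance concrete, the sketch below fits a 1-nearest-neighbour classifier to labels that are pure noise, so no model can truly do better than chance. It illustrates only the apparent-performance point above, and the classifier, data, and helper names are assumptions for the example.

```python
import random

def predict_1nn(x_train, y_train, x_new):
    """Return the label of the single nearest training point."""
    nearest = min(range(len(x_train)), key=lambda j: abs(x_train[j] - x_new))
    return y_train[nearest]

def accuracy(x_train, y_train, x_eval, y_eval):
    hits = [predict_1nn(x_train, y_train, xi) == yi
            for xi, yi in zip(x_eval, y_eval)]
    return sum(hits) / len(hits)

# Simulated data (hypothetical): the labels are unrelated to x
rng = random.Random(0)
n = 200
x = [rng.random() for _ in range(n)]
y = [rng.randint(0, 1) for _ in range(n)]

# Apparent performance: evaluate on the same data used for training.
# Each point is its own nearest neighbour, so every label is "predicted" correctly.
apparent = accuracy(x, y, x, y)

# Cross-validated performance: evaluate only on held-out folds.
k = 5
idx = list(range(n))
rng.shuffle(idx)
folds = [idx[i::k] for i in range(k)]
cv_scores = []
for i, test in enumerate(folds):
    train = [j for f, fold in enumerate(folds) if f != i for j in fold]
    cv_scores.append(accuracy([x[j] for j in train], [y[j] for j in train],
                              [x[j] for j in test], [y[j] for j in test]))
cv_acc = sum(cv_scores) / k

print(f"Apparent accuracy:        {apparent:.2f}")
print(f"Cross-validated accuracy: {cv_acc:.2f}")
```

The apparent accuracy looks perfect even though the labels carry no signal, while the held-out estimate stays far below it. Selecting a configuration on apparent performance would therefore favour the most overfitted candidate.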


2. Fit Final Model

Objective

After selecting the optimal hyperparameters, fit the final model using the entire dataset.


Why Full Data is Used


Conceptual Role

This step defines your final prediction model:

  • Final coefficients (if regression-based)

  • Final tree structures (if random forest)

  • Final prediction function
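A minimal sketch of this step, assuming a one-feature ridge regression and a penalty value best_lam that was already chosen during tuning (both are placeholders for illustration): the model is refit once on every row, and the resulting coefficients are frozen into a prediction function.

```python
import random
from statistics import mean

def fit_ridge(x, y, lam):
    """One-feature ridge regression; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)
    return my - slope * mx, slope

def make_predictor(intercept, slope):
    """Freeze the fitted coefficients into a prediction function."""
    def predict(x_new):
        return intercept + slope * x_new
    return predict

# Simulated development dataset (hypothetical)
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(60)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

best_lam = 1.0  # assumed to be the value selected by cross-validation in step 1

# Fit once on the entire dataset: no folds, no held-out rows
intercept, slope = fit_ridge(x, y, best_lam)
predict = make_predictor(intercept, slope)

print(f"Final coefficients: intercept={intercept:.3f}, slope={slope:.3f}")
print(f"Prediction for x=1.5: {predict(1.5):.3f}")
```

Any performance computed by applying predict back to x and y would be apparent performance, so it is not yet a valid estimate for new patients.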


Important Clarification

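This model is not yet validated.

Its performance is still apparent performance: it is measured on the same data used to fit the model and is therefore optimistically biased.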


3. Internal Validation

Objective

Estimate and correct for overfitting.


Two Valid Approaches


Option A: Cross-Validation

Mechanism

  • Refit model across folds

  • Evaluate performance on held-out data

  • Average results


Properties


Option B: Bootstrap (Preferred for Clinical Prediction Models, CPM)

Mechanism (Optimism Correction)

  1. Fit model on full dataset → Apparent performance

  2. Draw bootstrap sample

  3. Fit model on bootstrap sample

  4. Evaluate:

    • On bootstrap sample (training)

    • On original dataset (testing)

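  5. Compute optimism:

     optimism = performance on bootstrap sample − performance on original dataset

  6. Repeat many times and average the optimism estimates

  7. Correct:

     corrected performance = apparent performance − mean optimism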
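The procedure above can be sketched as follows, taking optimism in each replicate as performance on the bootstrap sample minus performance on the original dataset, and the corrected estimate as apparent performance minus the mean optimism. The one-feature least-squares model, R² as the performance measure, the number of replicates, and the simulated data are assumptions made for illustration.

```python
import random
from statistics import mean

def fit_ols(x, y):
    """One-feature least-squares regression; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def r_squared(model, x, y):
    """Proportion of outcome variance explained by the model on (x, y)."""
    intercept, slope = model
    my = mean(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst

def optimism_corrected(x, y, n_boot=200, seed=0):
    rng = random.Random(seed)
    n = len(x)
    # Apparent performance: fit and evaluate on the full dataset
    apparent = r_squared(fit_ols(x, y), x, y)
    optimisms = []
    for _ in range(n_boot):
        # Draw a bootstrap sample: n rows sampled with replacement
        rows = [rng.randrange(n) for _ in range(n)]
        xb, yb = [x[j] for j in rows], [y[j] for j in rows]
        model_b = fit_ols(xb, yb)                 # fit on the bootstrap sample
        train_perf = r_squared(model_b, xb, yb)   # evaluate on the bootstrap sample
        test_perf = r_squared(model_b, x, y)      # evaluate on the original dataset
        optimisms.append(train_perf - test_perf)  # optimism for this replicate
    mean_optimism = mean(optimisms)
    return apparent, mean_optimism, apparent - mean_optimism

# Simulated development dataset (hypothetical)
rng = random.Random(11)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [1.0 * xi + rng.gauss(0, 1) for xi in x]

apparent, optimism, corrected = optimism_corrected(x, y)
print(f"Apparent R^2:  {apparent:.3f}")
print(f"Mean optimism: {optimism:.3f}")
print(f"Corrected R^2: {corrected:.3f}")
```

Both the apparent and the corrected value would be reported. The gap between them is the estimated amount of overfitting in the development dataset.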


Properties


Why Bootstrap is Strong

Bootstrap directly answers:

“How much am I overfitting my dataset?”

This follows the core modeling principle:

Separate signal from bias and random error


Putting It All Together

Complete Pipeline

Step 1 — Hyperparameter tuning

  • Use cross-validation

  • Select best model configuration


Step 2 — Fit final model

  • Train model on full dataset

  • Fix model parameters


Step 3 — Internal validation

  • Use bootstrap (preferred) or cross-validation

  • Report:

    • Apparent performance

    • Corrected performance



Conceptual Separation (Critical Insight)


Key Insight

If these steps are not separated:

  • Model selection and validation become entangled

  • Performance is overestimated

  • Results are not reproducible


Clinical Interpretation


Key Takeaways

  • Hyperparameter tuning, model fitting, and validation answer different questions

  • Cross-validation is required for model selection

  • Final model must be trained on the full dataset

  • Internal validation must correct for optimism

  • Bootstrap is preferred for estimating optimism in clinical prediction models

  • Proper separation of steps is essential for valid and publishable results
