Machine Learning Model Development Pipeline: Tuning, Final Model, and Internal Validation
- Mayta

- Mar 27
Overview
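Building a prediction model requires separating three distinct stages, each answering a different methodological question: hyperparameter tuning (which configuration should be used?), fitting the final model (what are the final parameters?), and internal validation (how much does the apparent performance overestimate performance on new data?).
Failure to separate these steps leads to biased and non-reproducible results.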


1. Hyperparameter Tuning
Objective
Select the model configuration (hyperparameter values) that maximizes expected performance on unseen data.
Recommended Method: Cross-Validation
Mechanism
Split data into K folds
Train on K−1 folds
Test on the remaining fold
Repeat across folds
Average performance
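The fold loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the model is a hypothetical one-predictor ridge regression (`fit_ridge_1d`, with only the slope penalized by `lam`), the candidate grid and simulated data are made up, and mean squared error is the score. Every candidate is scored on the same folds so the averages are comparable.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge_1d(xs, ys, lam):
    """Hypothetical model: y = a + b*x, with the slope b shrunk by penalty lam."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared prediction error of an (intercept, slope) pair."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_error(xs, ys, lam, k=5, seed=0):
    """Train on k-1 folds, test on the remaining fold, average over all k folds."""
    errors = []
    for test in kfold_indices(len(xs), k, seed):  # same seed -> same folds per lam
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(errors) / k

def tune(xs, ys, grid, k=5):
    """Return the grid value with the lowest cross-validated error, plus all scores."""
    scores = {lam: cv_error(xs, ys, lam, k) for lam in grid}
    return min(scores, key=scores.get), scores

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 3) for x in xs]
best_lam, scores = tune(xs, ys, grid=[0.0, 1.0, 10.0, 100.0])
print(best_lam, scores[best_lam])
```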
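Interpretation
The averaged held-out score estimates how each candidate configuration would perform on new data; the configuration with the best average is selected.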
Why This Matters
Hyperparameter tuning is a selection problem, not a final performance estimate.
The goal is:
“Which model will perform best on new patients?”
Cross-validation directly estimates this.
This aligns with prediction modeling principles emphasizing generalizability during development.
What Should NOT Be Done
Do not use bootstrap for tuning
Do not use apparent (training) performance
Reason:
These methods are optimistically biased
They overestimate model performance
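A small sketch of why apparent performance cannot guide tuning, assuming a 1-nearest-neighbour regressor and simulated data (both illustrative choices, not from the original). Every training point is its own nearest neighbour, so the apparent error is exactly zero, while the cross-validated error exposes the noise the model has memorized.

```python
import random

def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour prediction: copy the y of the closest training x."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

def mse_1nn(train_x, train_y, test_x, test_y):
    """Mean squared error of 1-NN predictions on the test points."""
    preds = [predict_1nn(train_x, train_y, x) for x in test_x]
    return sum((p - y) ** 2 for p, y in zip(preds, test_y)) / len(test_y)

def cv_mse_1nn(xs, ys, k=5, seed=0):
    """k-fold cross-validated MSE: each point is predicted without itself."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        errs.append(mse_1nn([xs[i] for i in train], [ys[i] for i in train],
                            [xs[i] for i in test], [ys[i] for i in test]))
    return sum(errs) / k

rng = random.Random(2)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 3) for x in xs]

apparent = mse_1nn(xs, ys, xs, ys)  # exactly 0.0: each point predicts itself
cross_validated = cv_mse_1nn(xs, ys)
print(apparent, cross_validated)
```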

2. Fit Final Model
Objective
After selecting optimal hyperparameters:
Fit the final model using the entire dataset
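Why Full Data Is Used
Using every observation gives the most stable parameter estimates; holding data out at this stage would only discard information.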
Conceptual Role
This step defines your final prediction model:
Final coefficients (if regression-based)
Final tree structure (if Random Forest)
Final prediction function
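A minimal sketch of this step, assuming a hypothetical one-predictor ridge model and a hyperparameter value (`BEST_LAM`) taken as given from tuning. The model is fitted once on all observations and its parameters are frozen in an immutable record, so the prediction function cannot drift afterwards.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FinalModel:
    """Final prediction model; frozen=True blocks reassignment of its parameters."""
    intercept: float
    slope: float
    lam: float

    def predict(self, x: float) -> float:
        return self.intercept + self.slope * x

def fit_final_model(xs, ys, lam):
    """Fit once on the entire dataset with the hyperparameter chosen by CV."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)  # ridge penalty shrinks the slope only
    return FinalModel(intercept=my - slope * mx, slope=slope, lam=lam)

BEST_LAM = 1.0  # hypothetical output of the tuning step
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]  # noise-free toy data
final = fit_final_model(xs, ys, BEST_LAM)
print(final, final.predict(5.0))
```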
Important Clarification
This model is not yet validated.
Its performance so far is only the apparent (training) performance, which is optimistically biased.

3. Internal Validation
Objective
Estimate the amount of overfitting (optimism) and correct the apparent performance for it.
Two Valid Approaches
Option A: Cross-Validation
Mechanism
Refit model across folds
Evaluate performance on held-out data
Average results
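A sketch of cross-validation used for internal validation, under illustrative assumptions (a hypothetical `fit_ridge_1d` model, simulated data, and a fixed `LAM` assumed to come from tuning). Nothing is selected here: the model is refitted on each training part, scored on the held-out fold, and the average is reported next to the apparent error.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Hypothetical model: y = a + b*x, with the slope b shrunk by penalty lam."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared prediction error of an (intercept, slope) pair."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_internal_validation(xs, ys, lam, k=5, seed=0):
    """Refit the fixed-hyperparameter model per fold; score only held-out data."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(fold_errors) / k, fold_errors

rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 3) for x in xs]
LAM = 1.0  # hypothetical value assumed to have been chosen during tuning

apparent = mse(fit_ridge_1d(xs, ys, LAM), xs, ys)
cv_estimate, per_fold = cv_internal_validation(xs, ys, LAM)
print(f"apparent MSE={apparent:.2f}  cross-validated MSE={cv_estimate:.2f}")
```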
Properties
Option B: Bootstrap (Preferred for Clinical Prediction Models)
Mechanism (Optimism Correction)
Fit model on full dataset → Apparent performance
Draw bootstrap sample
Fit model on bootstrap sample
Evaluate:
On bootstrap sample (training)
On original dataset (testing)
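Compute optimism: $O_b = P_{\text{boot},b} - P_{\text{orig},b}$, the performance of the bootstrap model on its own bootstrap sample minus its performance on the original dataset (for measures where higher is better)
Repeat many times ($b = 1, \dots, B$)
Correct: $P_{\text{corrected}} = P_{\text{apparent}} - \frac{1}{B}\sum_{b=1}^{B} O_b$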
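The optimism-correction loop can be sketched as follows, assuming ordinary least squares on a single predictor and R² as the performance measure P (higher is better, so each O_b is usually positive). The helper names, B = 200, and the simulated data are illustrative assumptions.

```python
import random

def fit_ols_1d(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(model, xs, ys):
    """R^2 of a fitted (a, b) on the given data (higher is better)."""
    a, b = model
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

def bootstrap_optimism(xs, ys, fit, perf, B=200, seed=0):
    """Return (apparent, list of O_b, corrected) for the given fit/perf pair."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = perf(fit(xs, ys), xs, ys)  # full-data model on full data
    optimism = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        model_b = fit(bx, by)
        p_boot = perf(model_b, bx, by)  # on its own bootstrap sample
        p_orig = perf(model_b, xs, ys)  # on the original dataset
        optimism.append(p_boot - p_orig)  # O_b
    corrected = apparent - sum(optimism) / B
    return apparent, optimism, corrected

rng = random.Random(5)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 4) for x in xs]
apparent, optimism, corrected = bootstrap_optimism(xs, ys, fit_ols_1d, r_squared)
print(f"apparent R2={apparent:.3f}  corrected R2={corrected:.3f}")
```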
Properties
Why Bootstrap Is Strong
Bootstrap directly answers:
“How much am I overfitting my dataset?”
This follows the core modeling principle:
Separate signal from bias and random error

Putting It All Together
Complete Pipeline
Step 1 — Hyperparameter tuning
Use cross-validation
Select best model configuration
Step 2 — Fit final model
Train model on full dataset
Fix model parameters
Step 3 — Internal validation
Use bootstrap (preferred) or cross-validation
Report:
Apparent performance
Corrected performance
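The three steps can be chained in a single sketch, assuming a hypothetical one-predictor ridge model, R² as the reported measure, a made-up penalty grid, and simulated data. One further assumption, labelled as such: each bootstrap replicate reruns the whole development procedure (tuning and fitting), so the optimism estimate also covers the hyperparameter selection.

```python
import random

def fit(xs, ys, lam):
    """Hypothetical model: y = a + b*x, with only the slope penalized by lam."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)
    return my - b * mx, b

def r2(model, xs, ys):
    """R^2 of (a, b) on the given data."""
    a, b = model
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

def tune(xs, ys, grid, k=5, seed=0):
    """Step 1: choose lam by k-fold cross-validated squared error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    def cv_err(lam):
        total = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            a, b = fit([xs[i] for i in train], [ys[i] for i in train], lam)
            total += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
        return total / len(xs)

    return min(grid, key=cv_err)

def develop(xs, ys, grid):
    """Steps 1 + 2: tune on the data given, then refit on all of it."""
    lam = tune(xs, ys, grid)
    return fit(xs, ys, lam), lam

def validate(xs, ys, grid, B=50, seed=0):
    """Step 3: bootstrap optimism correction, rerunning develop() per replicate."""
    rng = random.Random(seed)
    final_model, lam = develop(xs, ys, grid)
    apparent = r2(final_model, xs, ys)
    optimism = []
    for _ in range(B):
        idx = [rng.randrange(len(xs)) for _ in xs]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        model_b, _ = develop(bx, by, grid)
        optimism.append(r2(model_b, bx, by) - r2(model_b, xs, ys))
    return final_model, lam, apparent, apparent - sum(optimism) / B

GRID = [0.0, 1.0, 10.0, 100.0]  # hypothetical candidate penalties
rng = random.Random(4)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 3) for x in xs]
model, lam, apparent, corrected = validate(xs, ys, GRID)
print(f"lambda={lam}  apparent R2={apparent:.3f}  corrected R2={corrected:.3f}")
```

The model that is reported is the one returned by develop() on the full data; the bootstrap replicates are used only to estimate how optimistic its apparent performance is.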

Conceptual Separation (Critical Insight)
Key Insight
If these steps are not separated:
Model selection and validation become entangled
Performance is overestimated
Results are not reproducible
Clinical Interpretation
Key Takeaways
Hyperparameter tuning, model fitting, and validation answer different questions
Cross-validation is required for model selection
Final model must be trained on the full dataset
Internal validation must correct for optimism
Bootstrap is preferred for estimating optimism in clinical prediction models
Proper separation of steps is essential for valid and publishable results


