Modified MAPE: Mean Absolute Prediction Error as a Prediction Instability Metric
- Mayta

- Mar 27
Bootstrap-Based Assessment of Prediction Reproducibility


1. Concept Overview (Important Reset)
In this framework, MAPE is NOT a performance metric.
It does not compare prediction vs outcome.
Instead, it measures:
How much predictions change when the model is re-fitted on different bootstrap samples

2. Core Idea
You have:
Final model → Original predictions for all patients
Bootstrap models (B = 500) → Bootstrap predictions
MAPE quantifies:
The absolute difference between the original prediction and the bootstrap prediction for the same patient.

3. Step-by-Step Definition
Step 1: Fit Final Model
Fit model on full dataset (n = 3,134)
Generate the original predictions \(\hat{p}_i^{\text{orig}}\) for all patients i = 1, ..., n
Step 2: Bootstrap Loop (b = 1 to 500)
For each bootstrap iteration:
2.1 Fit Bootstrap Model
Sample with replacement
Fit model on bootstrap sample
2.2 Predict on Original Data
Use bootstrap model to predict on original dataset
2.3 Identify Overlapping Patients
Because bootstrap sampling is with replacement:
Some patients appear in bootstrap sample
Some do not
Let S_b denote the set of patients included in bootstrap sample b.
Typically ~63% of patients appear at least once (about 1,980 of the 3,134 patients)
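The ~63% figure follows from the chance that a given patient is drawn at least once in n draws with replacement, 1 - (1 - 1/n)^n, which approaches 1 - 1/e. A minimal sketch that checks this for the n = 3,134 stated in Step 1; the function names are illustrative and not part of the original workflow:

```python
import random

def expected_unique_fraction(n: int) -> float:
    """Chance that a given patient appears at least once in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

def simulated_unique_fraction(n: int, seed: int = 0) -> float:
    """Draw one bootstrap sample of size n and return the share of distinct patients in it."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]  # sample indices with replacement
    return len(set(sample)) / n

n = 3134  # dataset size stated in Step 1
frac = expected_unique_fraction(n)
print(round(frac, 3), round(frac * n))         # analytic share and expected size of S_b
print(round(simulated_unique_fraction(n), 3))  # one simulated draw, close to the analytic share
```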
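2.4 Compute MAPE for bootstrap b
Only for patients in \(S_b\):
\[ \text{MAPE}_b = \frac{1}{|S_b|} \sum_{i \in S_b} \left| \hat{p}_i^{\text{orig}} - \hat{p}_i^{(b)} \right| \]
where \(\hat{p}_i^{\text{orig}}\) is the final-model prediction for patient i and \(\hat{p}_i^{(b)}\) is the prediction for the same patient from bootstrap model b.

4. Final MAPE
After 500 iterations:
\[ \text{MAPE} = \frac{1}{B} \sum_{b=1}^{B} \text{MAPE}_b, \quad B = 500 \]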
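The procedure in Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-predictor least-squares line (`fit_line`) stands in for whatever model is actually used, the data are simulated, and all function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. A stand-in for the real prediction model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def predict(model, xs):
    a, b = model
    return [a + b * x for x in xs]

def bootstrap_mape(xs, ys, n_boot=500, seed=0):
    """Average absolute difference between original and bootstrap predictions."""
    rng = random.Random(seed)
    n = len(xs)
    p_orig = predict(fit_line(xs, ys), xs)            # Step 1: final model, original predictions
    mape_per_boot = []
    for _ in range(n_boot):                           # Step 2: bootstrap loop
        idx = [rng.randrange(n) for _ in range(n)]    # 2.1 sample with replacement
        model_b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        p_boot = predict(model_b, xs)                 # 2.2 predict on the original data
        s_b = set(idx)                                # 2.3 patients included in this sample
        diffs = [abs(p_orig[i] - p_boot[i]) for i in s_b]
        mape_per_boot.append(sum(diffs) / len(diffs)) # 2.4 MAPE for bootstrap b
    return sum(mape_per_boot) / n_boot                # Step 4: average over all bootstraps

# Simulated toy data (an assumption, not the dataset described in the text)
rng = random.Random(42)
xs = [rng.random() for _ in range(200)]
ys = [0.2 + 0.5 * x + rng.gauss(0, 0.1) for x in xs]
mape = bootstrap_mape(xs, ys, n_boot=500, seed=1)
print(f"Instability MAPE: {mape:.4f}")
```

The outcome `ys` is used only to fit the models. The MAPE itself is computed purely from the two sets of predictions.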

5. Interpretation
This MAPE reflects:
Prediction instability across resampled datasets
Meaning:
Low MAPE → Predictions are stable → Model is robust
High MAPE → Predictions change a lot → Model is unstable / sensitive to sampling
6. Why Only Use Overlapping Patients (~63%)?
Because:
Those patients were used to train that bootstrap model
You are comparing:
“Prediction from full-data model” vs “Prediction from bootstrap model trained on (partly) same patients”
👉 This isolates model variability, not extrapolation

7. Why This is NOT Standard MAPE
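Standard MAPE (prediction vs observed outcome \(y_i\)):
\[ \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{p}_i \right| \]
Your MAPE (original prediction vs bootstrap prediction, no outcome involved):
\[ \frac{1}{B} \sum_{b=1}^{B} \frac{1}{|S_b|} \sum_{i \in S_b} \left| \hat{p}_i^{\text{orig}} - \hat{p}_i^{(b)} \right| \]
👉 Therefore: this MAPE measures prediction instability, not predictive performance.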
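The contrast can be made concrete with a small example. The sketch below uses made-up numbers and hypothetical function names; the point is that the first function needs the observed outcome, while the second takes only two sets of predictions.

```python
def mae_vs_outcome(y_obs, p_hat):
    """Performance-style error: prediction compared with the observed outcome."""
    return sum(abs(y - p) for y, p in zip(y_obs, p_hat)) / len(y_obs)

def instability_mape(p_orig, p_boot):
    """Instability-style error: prediction compared with another model's prediction."""
    return sum(abs(a - b) for a, b in zip(p_orig, p_boot)) / len(p_orig)

# Made-up values for four patients
y_obs = [0, 1, 1, 0]                 # observed outcomes
p_orig = [0.20, 0.70, 0.60, 0.10]    # predictions from the final model
p_boot = [0.25, 0.65, 0.60, 0.20]    # predictions from one bootstrap model

print(round(mae_vs_outcome(y_obs, p_orig), 3))     # 0.25, depends on the outcome
print(round(instability_mape(p_orig, p_boot), 3))  # 0.05, no outcome involved
```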

8. Why Apparent MAPE Does NOT Exist Here
❗ “You cannot define apparent MAPE.”
Because:
There is no “truth” (no outcome involved)
Only comparing two models
So:
❌ No apparent
❌ No optimism correction
❌ No test vs train
This is purely:
model-to-model variability metric

9. Relation to Clinical Modeling
This metric evaluates:
Model reproducibility
Prediction stability
Sensitivity to sampling variation
It complements:
10. Key Insight (PhD-level)
This MAPE is essentially:
Bootstrap-based L1 distance between prediction functions
It answers:
“If I rebuild the model on slightly different data, how much do predictions change?”
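One way to write this down, as a sketch with notation that is an assumption rather than the source's own: let \(\hat{f}\) be the prediction function of the final model, \(\hat{f}^{(b)}\) the prediction function refitted on bootstrap sample b, and \(x_i\) the predictors of patient i. Then

\[ \text{MAPE} = \frac{1}{B} \sum_{b=1}^{B} \frac{1}{|S_b|} \sum_{i \in S_b} \left| \hat{f}(x_i) - \hat{f}^{(b)}(x_i) \right| \]

The inner average is an empirical L1 distance between \(\hat{f}\) and \(\hat{f}^{(b)}\), evaluated at the patients in \(S_b\). The outer average summarizes that distance across the B refits.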
11. Suggested Reporting Statement
“Prediction stability was assessed using a bootstrap-based Mean Absolute Prediction Error (MAPE), defined as the average absolute difference between predictions from the final model and bootstrap-refitted models across overlapping individuals. Lower values indicate greater model stability.”
Final Takeaway
Your school’s MAPE = prediction instability metric
It is not MAE vs outcome
It has:
❌ no apparent version
❌ no optimism correction
It measures:
✅ robustness
✅ reproducibility


