Modified MAPE: Mean Absolute Prediction Error as a Prediction Instability Metric

Bootstrap-Based Assessment of Prediction Reproducibility

1. Concept Overview (Important Reset)

In this framework, MAPE is NOT a performance metric.

It does not compare predictions with observed outcomes.

Instead, it measures:

How much predictions change when the model is re-fitted on different bootstrap samples

2. Core Idea

You have:

Final model → Original predictions for all patients
Bootstrap models (B = 500) → Bootstrap predictions

MAPE quantifies:

The mean absolute difference between the original prediction and the bootstrap prediction for the same patient, averaged over patients and then over bootstrap replicates.

3. Step-by-Step Definition

Step 1: Fit Final Model

Fit model on full dataset (n = 3,134)
Generate:

\hat{p}_i^{\mathrm{orig}}

for all patients (i = 1, ..., n)
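Step 1 can be sketched in Python. This is a minimal sketch assuming the model is a logistic regression; `fit_logistic` and `predict` are hypothetical helpers (plain batch gradient descent, no regularization), not the procedure used in the original analysis, where a statistical package would normally be used.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, n_iter: int = 500) -> List[float]:
    """Hypothetical fitter: batch gradient descent on the mean log-loss.
    Returns [intercept, w_1, ..., w_k]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            r = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += r
            for j, xj in enumerate(xi):
                grad[j + 1] += r * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    # Predicted risk for every row of X
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

# Step 1 on toy data (the real analysis uses n = 3,134 patients)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w_final = fit_logistic(X, y)
p_orig = predict(w_final, X)  # original predictions for all patients
```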

Step 2: Bootstrap Loop (b = 1 to 500)

For each bootstrap iteration:

2.1 Fit Bootstrap Model

Draw n = 3,134 patients with replacement from the original dataset
Refit the model, using the same modeling procedure, on the bootstrap sample

2.2 Predict on Original Data

Use the bootstrap model to predict for every patient in the original dataset:

\hat{p}_i^{\mathrm{boot}(b)}
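Steps 2.1 and 2.2 can be sketched as a loop that resamples patient indices, refits, and predicts for every original patient. In this sketch `fit` and `predict` are passed in as callables (a hypothetical interface), the fixed `seed` is an assumption for reproducibility, and the intercept-only model in the usage example is only a stand-in for the real model.

```python
import random
from typing import Callable, List, Sequence, Tuple

def bootstrap_predictions(X: Sequence, y: Sequence[int],
                          fit: Callable, predict: Callable,
                          B: int = 500, seed: int = 2024
                          ) -> List[Tuple[List[int], List[float]]]:
    """For b = 1..B: draw n indices with replacement, refit on that
    bootstrap sample, and predict for ALL n original patients.
    Returns (sampled indices, predictions on original data) per replicate."""
    rng = random.Random(seed)
    n = len(X)
    results = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]              # 2.1 resample
        model = fit([X[i] for i in idx], [y[i] for i in idx])   # 2.1 refit
        results.append((idx, predict(model, X)))                # 2.2 predict on original data
    return results

# Stand-in model for illustration only: intercept-only (predicts the event rate)
def fit_mean(Xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, Xs):
    return [model] * len(Xs)

X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 1, 1, 1]
boot = bootstrap_predictions(X, y, fit_mean, predict_mean, B=20)
```

Passing the fitter as a callable keeps the loop independent of the model type, so the same loop works for whatever modeling procedure produced the final model.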

2.3 Identify Overlapping Patients

Because bootstrap sampling is with replacement:

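Some patients appear in the bootstrap sample (possibly more than once)
Some do not appear at all

Let S_b denote the set of distinct patients included in bootstrap sample b.

Typically ~63.2% of patients are included, since the probability that a given patient is drawn at least once is 1 - (1 - 1/n)^n ≈ 1 - e^{-1} ≈ 0.632. With n = 3,134 this is about 1,981 patients.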
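The ~63% figure follows from the probability that a given patient is drawn at least once in n draws with replacement. A small standard-library sketch (the seed is arbitrary) computes that expected fraction and checks it against one simulated bootstrap sample.

```python
import random

def expected_in_bag_fraction(n: int) -> float:
    # P(a given patient is drawn at least once in n draws with replacement)
    return 1.0 - (1.0 - 1.0 / n) ** n

def in_bag_set(idx):
    # S_b: distinct patients in bootstrap sample b (duplicates counted once)
    return set(idx)

n = 3134
frac = expected_in_bag_fraction(n)
print(f"expected fraction {frac:.4f}, about {round(frac * n)} patients")
# → expected fraction 0.6322, about 1981 patients

rng = random.Random(1)
S_b = in_bag_set(rng.randrange(n) for _ in range(n))
print(f"simulated: {len(S_b)} distinct patients")
```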

2.4 Compute MAPE for bootstrap b

Only for patients in S_b:

\mathrm{MAPE}(b) = \frac{1}{|S_b|} \sum_{i \in S_b} \left| \hat{p}_i^{\mathrm{orig}} - \hat{p}_i^{\mathrm{boot}(b)} \right|
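A direct translation of MAPE(b), assuming predictions are stored as lists indexed by patient and that |S_b| counts distinct patients (a patient drawn twice contributes once). The numbers in the worked example are made up purely to show the arithmetic.

```python
from typing import Iterable, Sequence

def mape_b(p_orig: Sequence[float], p_boot_b: Sequence[float],
           idx_b: Iterable[int]) -> float:
    """MAPE(b): mean |p_orig_i - p_boot_i| over distinct patients in S_b."""
    S_b = set(idx_b)
    return sum(abs(p_orig[i] - p_boot_b[i]) for i in S_b) / len(S_b)

# Illustrative numbers only: patient 0 is drawn twice, patient 3 not at all
p_orig   = [0.10, 0.40, 0.80, 0.50]
p_boot_b = [0.20, 0.40, 0.60, 0.90]
idx_b    = [0, 0, 2, 1]
print(round(mape_b(p_orig, p_boot_b, idx_b), 4))  # (0.1 + 0.0 + 0.2) / 3 → 0.1
```

Patient 3 has the largest disagreement (0.4) but is excluded because it is not in S_b.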

4. Final MAPE

After all B = 500 iterations:

\mathrm{MAPE}_{\mathrm{final}} = \frac{1}{B} \sum_{b=1}^{B} \mathrm{MAPE}(b)
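Putting 2.4 and 4 together: a sketch that takes per-replicate results in the form (indices drawn, predictions on the original data), computes MAPE(b) for each replicate, and averages over B. The input layout is an assumption of this sketch, and the two hand-made replicates are illustrative only.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

def final_mape(p_orig: Sequence[float],
               boot_results: Sequence[Tuple[Sequence[int], Sequence[float]]]
               ) -> Tuple[float, List[float]]:
    """Return (final MAPE, list of per-replicate MAPE(b))."""
    per_b = []
    for idx_b, p_boot_b in boot_results:
        S_b = set(idx_b)  # distinct in-sample patients
        per_b.append(sum(abs(p_orig[i] - p_boot_b[i]) for i in S_b) / len(S_b))
    return fmean(per_b), per_b  # (1/B) * sum over b of MAPE(b)

p_orig = [0.10, 0.40, 0.80, 0.50]
boot_results = [
    ([0, 0, 2, 1], [0.20, 0.40, 0.60, 0.90]),  # MAPE(1) = 0.1
    ([3, 3, 3, 3], [0.10, 0.40, 0.80, 0.30]),  # MAPE(2) = |0.5 - 0.3| = 0.2
]
overall, per_b = final_mape(p_orig, boot_results)
print(round(overall, 4))  # → 0.15
```

Returning the per-replicate values as well makes it easy to inspect their spread, not only the average.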

5. Interpretation

This MAPE reflects:

Prediction instability across resampled datasets

Meaning:

Low MAPE → Predictions are stable → Model is robust to sampling variation
High MAPE → Predictions change substantially → Model is unstable / sensitive to sampling

6. Why Only Use Overlapping Patients (~63%)?

Because:

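These patients contributed to fitting both the final model and bootstrap model b
You are comparing:

“Prediction from the full-data model” vs “Prediction from a bootstrap model trained on a resample that contains the same patient”

👉 This isolates variability due to refitting on resampled data, not variability due to predicting for patients the bootstrap model never saw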

7. Why This is NOT Standard MAPE

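Standard error metrics compare predictions with observed outcomes y. The mean absolute error (MAE) averages

| \hat{y}_i - y_i |

and MAPE in its usual sense (mean absolute percentage error) averages | \hat{y}_i - y_i | / | y_i |.

Your MAPE:

| \hat{p}_i^{\mathrm{orig}} - \hat{p}_i^{\mathrm{boot}(b)} |

👉 Therefore:

Type	Measures
Standard MAE / MAPE	prediction vs observed outcome
This MAPE	prediction vs prediction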
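The contrast can be made concrete with illustrative numbers. In this sketch, a model whose predictions are reproduced exactly by the refit has a prediction-vs-prediction value of 0, while its absolute error against the outcomes is not 0; the stability function never reads y. Both function names are hypothetical.

```python
from typing import Iterable, Optional, Sequence

def mae_vs_outcome(y_hat: Sequence[float], y: Sequence[int]) -> float:
    # Conventional absolute error: prediction vs observed outcome
    return sum(abs(a - b) for a, b in zip(y_hat, y)) / len(y)

def pred_vs_pred(p_orig: Sequence[float], p_boot: Sequence[float],
                 subset: Optional[Iterable[int]] = None) -> float:
    # Stability metric: prediction vs prediction (no outcome involved)
    S = set(subset) if subset is not None else set(range(len(p_orig)))
    return sum(abs(p_orig[i] - p_boot[i]) for i in S) / len(S)

y      = [0, 1, 1, 0]
p_orig = [0.20, 0.70, 0.60, 0.40]
p_boot = [0.20, 0.70, 0.60, 0.40]   # refit reproduces the same predictions

print(pred_vs_pred(p_orig, p_boot))         # → 0.0
print(round(mae_vs_outcome(p_orig, y), 3))  # → 0.325
```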

8. Why Apparent MAPE Does NOT Exist Here

❗ “You cannot define apparent MAPE.”

Because:

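There is no “truth” (no outcome is involved)
Only the predictions of two models are compared

Apparent performance and optimism correction are defined by scoring predictions against observed outcomes in the training data and in new data. Without an outcome, neither quantity exists.

So:

❌ No apparent estimate
❌ No optimism correction
❌ No train vs test comparison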

This is purely:

model-to-model variability metric

9. Relation to Clinical Modeling

This metric evaluates:

Model reproducibility
Prediction stability
Sensitivity to sampling variation

It complements:

Domain	Metric
Discrimination	AUROC
Calibration	calibration slope / intercept
Overall accuracy	Brier score
Stability	MAPE (this definition)

10. Key Insight (PhD-level)

This MAPE is essentially:

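A bootstrap estimate of the L1 distance between the final model's prediction function and the refitted models' prediction functions, evaluated over the patients in each bootstrap sample and averaged across samples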

It answers:

“If I rebuild the model on slightly different data, how much do predictions change?”
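One way to make the L1 reading precise, under notation introduced here for illustration: write \hat{f} for the final model's prediction function and \hat{f}^{(b)} for bootstrap model b, so that \hat{p}_i^{\mathrm{orig}} = \hat{f}(x_i) and \hat{p}_i^{\mathrm{boot}(b)} = \hat{f}^{(b)}(x_i), and let \hat{P}_{S_b} be the empirical distribution that puts weight 1/|S_b| on each distinct patient in S_b.

```latex
\mathrm{MAPE}(b)
  = \frac{1}{|S_b|} \sum_{i \in S_b} \left| \hat{f}(x_i) - \hat{f}^{(b)}(x_i) \right|
  = \int \left| \hat{f}(x) - \hat{f}^{(b)}(x) \right| \, d\hat{P}_{S_b}(x)
  = \left\| \hat{f} - \hat{f}^{(b)} \right\|_{L^1(\hat{P}_{S_b})}

\mathrm{MAPE}_{\mathrm{final}}
  = \frac{1}{B} \sum_{b=1}^{B} \left\| \hat{f} - \hat{f}^{(b)} \right\|_{L^1(\hat{P}_{S_b})}
```

Each replicate is therefore an L1 distance between two prediction functions, measured on that replicate's own in-sample patients, and the final value averages these distances.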

11. Suggested Reporting Statement

“Prediction stability was assessed using a bootstrap-based Mean Absolute Prediction Error (MAPE), defined as the mean absolute difference between predictions from the final model and from models refitted in each of 500 bootstrap samples, computed over the individuals included in each bootstrap sample and averaged across samples. Lower values indicate greater model stability.”

Final Takeaway

Your school’s MAPE = prediction instability metric
It is not an MAE against observed outcomes
It has:
- ❌ no apparent version
- ❌ no optimism correction
It measures:
- ✅ robustness to sampling variation
- ✅ reproducibility