Classical MAPE internal validation vs Modified MAPE instability for Prediction Instability in Clinical Models

Abstract

The term Mean Absolute Prediction Error (MAPE) is used across different scientific domains, often with distinct meanings. In classical statistical and forecasting literature, MAPE quantifies the discrepancy between predicted and observed outcomes. In contrast, emerging approaches in clinical prediction modeling use a modified form of MAPE to assess prediction stability across resampled datasets. This article clarifies the conceptual, mathematical, and methodological differences between these two definitions, emphasizing their distinct purposes and appropriate contexts of use.

1. Introduction

Evaluation of predictive models requires careful selection of performance metrics aligned with the scientific question of interest. In clinical epidemiology, commonly used metrics include:

Discrimination (e.g., AUROC)
Calibration (e.g., slope and intercept)
Overall accuracy (e.g., Brier score)

The term MAPE appears in this landscape but may refer to fundamentally different constructs depending on context. Failure to distinguish these meanings can lead to conceptual confusion and misinterpretation of results.

This article distinguishes between:

Classical MAPE – a measure of prediction accuracy relative to observed outcomes
Modified MAPE – a measure of prediction instability across bootstrap-refitted models

2. Classical MAPE: Accuracy-Based Definition

2.1 Definition

Classical MAPE is defined as:

MAPE = \frac{1}{n} \sum_{i = 1}^{n} | \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} |

where:

yi is the observed outcome
y^i is the predicted value

2.2 Conceptual Meaning

Classical MAPE answers the question:

How far are predictions from the true observed values?

It is therefore a measure of prediction accuracy.

2.3 Domain of Use

Time-series forecasting
Econometrics
Engineering and business analytics

2.4 Limitations in Clinical Prediction

In clinical models with binary outcomes ( y\in{0,1} ):

The denominator yi becomes problematic when yi=0
The metric becomes undefined or unstable

For this reason, classical MAPE is rarely used in clinical prediction modeling.

3. Modified MAPE: Prediction Instability Definition

3.1 Definition

In the modified framework, MAPE is defined as:

MAPE = \frac{1}{B} \sum_{b = 1}^{B} (\frac{1}{| S_{b} |} \sum_{i \in S_{b}} | {\hat{p}}_{i}^{o r i g} - {\hat{p}}_{i}^{b o o t (b)} |)

where:

p^iorig = prediction from the final model
p^iboot(b) = prediction from bootstrap model b
Sb = set of individuals included in bootstrap sample b
B = number of bootstrap iterations

3.2 Conceptual Meaning

This modified MAPE answers a fundamentally different question:

How much do predictions change when the model is re-estimated on resampled data?

Thus, it is a measure of:

Prediction stability
Model reproducibility
Sensitivity to sampling variation

3.3 Interpretation

Low MAPE → stable predictions across bootstrap samples
High MAPE → predictions vary substantially; model may be unstable

3.4 Key Properties

Does not require observed outcomes
Does not assess accuracy
Does not have apparent or optimism-corrected versions
Relies fundamentally on bootstrap resampling

4. Conceptual Differences

4.1 What is being compared

Metric	Comparison
Classical MAPE	Prediction vs observed outcome
Modified MAPE	Prediction vs prediction (between models)

4.2 Scientific Question

Metric	Question
Classical MAPE	“Is the model correct?”
Modified MAPE	“Is the model stable?”

4.3 Role in Model Evaluation

Aspect of model quality	Metric
Discrimination	AUROC
Calibration	Slope, intercept
Accuracy	Brier score
Stability	Modified MAPE

5. Why the Same Name?

The shared terminology arises from a common mathematical structure:

mean (| A - B |)

However, the interpretation depends entirely on:

What A represents
What B represents

In classical MAPE:

A=y , B=y^

In modified MAPE:

A=p^orig , B=p^boot

Thus, the same mathematical form is applied to different scientific objects, leading to different meanings.

6. Implications for Clinical Research

The modified MAPE provides information not captured by traditional metrics:

It evaluates robustness of predictions, not correctness
It is particularly useful in assessing overfitting indirectly
It complements, rather than replaces, standard performance metrics

However, due to non-standard terminology, it is essential to define the metric explicitly in any report or publication.

7. Recommended Reporting

To avoid ambiguity, the following wording is recommended:

“Prediction stability was assessed using a bootstrap-based Mean Absolute Prediction Error (MAPE), defined as the mean absolute difference between predicted probabilities from the final model and those from bootstrap-refitted models across overlapping individuals.”

8. Conclusion

Although sharing the same name, classical MAPE and modified MAPE represent fundamentally different concepts. Classical MAPE evaluates accuracy relative to observed outcomes, while modified MAPE evaluates stability of predictions across resampled models. Recognizing this distinction is essential for correct interpretation and appropriate application in clinical prediction research.