
Reporting Performance and Stability in TRIPOD+AI and Riley-Framework Clinical Prediction Models: A Stata-Centered Code and Framework Guide

  • Writer: Mayta

This article integrates the TRIPOD+AI reporting standards with the latest Riley/Collins/Ensor stability framework. It shifts the focus from just "average" performance to "individual" reliability using the pm-suite in Stata.

Introduction

In modern clinical prediction, showing that a model is "accurate on average" is no longer enough. Under TRIPOD+AI, you must report both Performance (how well the model works for the population) and Stability (how much an individual’s risk estimate changes if the training data were slightly different).

1. The Core Distinction

  • Performance (Average): "On average, how close are we to the truth?" (AUC, Calibration, Net Benefit).

  • Stability (Reliability): "If I re-ran this study with a different sample, would this specific patient get the same risk score?"


2. The 8 Required Outputs

To be fully compliant with the Riley/Collins logic, your results section should include the following eight outputs, organized into three pillars:

Pillar A: Performance (Population-Average)

  1. ROC Curve / C-statistic: Can the model separate cases from non-cases?

  2. Calibration Plot: Does the predicted risk match observed risk across the spectrum?

  3. Decision Curve Analysis (DCA): Does the model provide higher Net Benefit than "treat all" or "treat none" at clinical thresholds?
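
For reference, the quantity plotted by DCA at a chosen threshold probability $p_t$ is the net benefit, $\text{NB}(p_t) = \frac{TP}{n} - \frac{FP}{n}\times\frac{p_t}{1-p_t}$; a model is clinically useful at $p_t$ only if this exceeds the net benefit of both "treat all" and "treat none".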

Pillar B: Stability (Individual & Decision Reliability)

  1. Prediction Instability Plot: Visualizes the "wiggle" of individual risks across bootstrap re-fits.

  2. Average MAPE (Stability Index): The mean absolute difference between original and bootstrap risks, averaged over individuals. Target: < 0.02 (context-dependent). A worked sketch of this calculation appears after this list.

  3. 95% Uncertainty Interval (UI): The range (2.5th to 97.5th percentile) of risk for a single patient across re-fits.

  4. Classification Instability Plot: Shows "threshold flipping": how often a patient moves from "low risk" to "high risk" across the bootstrap re-developments.

Pillar C: Stability (Population-Level)

  1. Calibration Instability Plot: A "spaghetti plot" of calibration curves from bootstrap re-fits to show if the model's reliability is volatile.
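
To make the Pillar B quantities concrete, here is a minimal hand-rolled sketch of how the individual MAPE and 95% UI could be computed, assuming the original predictions are stored in p_app and the predictions from 200 bootstrap re-fits in p_boot1 to p_boot200 (illustrative variable names; in practice pmstabilityplots, introduced below, reports these for you):

Stata

* Sketch only: assumes p_app and p_boot1-p_boot200 already exist
* Per-person 95% uncertainty interval across the bootstrap re-fits
egen double ui_lo = rowpctile(p_boot*), p(2.5)
egen double ui_hi = rowpctile(p_boot*), p(97.5)

* Per-person mean absolute prediction error (MAPE)
gen double ape_sum = 0
forvalues b = 1/200 {
    quietly replace ape_sum = ape_sum + abs(p_boot`b' - p_app)
}
gen double mape_i = ape_sum / 200
summarize mape_i    // the mean of mape_i is the "average MAPE" stability index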


3. The Stata Toolchain: pm-suite

The authoritative tools for this workflow are maintained by Joie Ensor and the Riley/Collins team.

Installation

Stata

* Performance & Utilities
ssc install pmcalplot, replace
net install dca, from("https://raw.github.com/ddsjoberg/dca.stata/master/") replace

* The Stability Suite (Riley/Ensor)
net from https://joieensor.github.io/pm-suite/
net install pmstabilityplots, replace
net install pmstabilityss, replace  // For sample size planning
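
If you want to confirm the packages landed on the ado-path, a quick optional check (not part of the original instructions) is:

Stata

* Optional: confirm each command is now findable
which pmcalplot
which dca
which pmstabilityplots
which pmstabilityss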

Mapping Requirements to Commands

Requirement              Stata Command        Key Output
Calibration              pmcalplot            Observed vs. Predicted
Clinical Utility         dca                  Net Benefit
Individual Stability     pmstabilityplots     Prediction Instability Plot & MAPE
Decision Stability       pmstabilityplots     Classification Instability
Calibration Stability    pmstabilityplots     Spaghetti Calibration Curves


4. Implementation Workflow

Step 1: Fit and Assess Performance

Stata

* Fit the development model and store the apparent predicted risks
logistic outcome x1 x2 x3
predict p_app, pr

* Standard performance: discrimination, calibration, clinical utility
lroc
pmcalplot p_app outcome, count
dca outcome p_app, xstop(0.5)
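
TRIPOD+AI also expects uncertainty around performance estimates. One optional way to obtain the c-statistic with a 95% confidence interval (an addition to the workflow above, not part of it) is:

Stata

* Optional: c-statistic (ROC area) with a 95% confidence interval
roctab outcome p_app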

Step 2: Assess Stability (The Riley Method)

The pmstabilityplots command automates the bootstrap re-development process: it re-estimates the model parameters many times and records how much each individual's prediction changes (a manual sketch of this loop is shown after the figure list below).

Stata

* Stability Assessment (e.g., 200 bootstrap reps)
* 'threshold' defines the point for Classification Instability
pmstabilityplots outcome x1 x2 x3, reps(200) threshold(0.2)

This command generates the three critical figures:

  1. Prediction Instability Plot: Highlighting the MAPE and 95% UIs.

  2. Classification Instability: Visualizing how many patients cross the 20% risk threshold.

  3. Calibration Instability: Showing the variation in the calibration intercept and slope.
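
For intuition, the re-development loop that pmstabilityplots automates might look roughly like the following hand-rolled sketch (an illustrative assumption, not the package's own code; p_boot1, p_boot2, ... are made-up variable names):

Stata

* Illustrative sketch of bootstrap re-development (not the package's code)
set seed 2024
forvalues b = 1/200 {
    preserve
    bsample                               // draw a bootstrap sample
    quietly logistic outcome x1 x2 x3     // re-develop the model
    restore                               // return to the ORIGINAL individuals
    quietly predict double p_boot`b', pr  // score them with the re-fitted model
}

Each individual then has 200 re-developed risk estimates alongside the original p_app, which is exactly the raw material for the prediction, classification, and calibration instability plots.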


5. Reporting Template (Methods Section)

"Model performance was evaluated via discrimination (C-statistic), calibration (calibration plots), and clinical utility (Decision Curve Analysis). To ensure the reliability of individual-level predictions, we performed a stability analysis according to the Riley/Collins framework. We quantified prediction instability using Mean Absolute Prediction Error (MAPE) and individual 95% uncertainty intervals (UI). Decision stability was assessed via classification instability plots at a clinical threshold of [X%]. All stability analyses were performed in Stata using the pmstabilityplots package (pm-suite), involving [200] bootstrap re-development cycles."

6. R Crosswalk: pminternal

If you are collaborating with R users, the pminternal package provides an equivalent implementation of the same stability framework:

Item                      R Function (pminternal)
Prediction Instability    prediction_stability()
Stability Index (MAPE)    mape_stability()
Decision Stability        dcurve_stability()
Calibration Stability     calibration_stability()

