
Reporting Performance and Stability in TRIPOD+AI & Riley Framework Clinical Prediction Models: A Stata-Centered Code+Framework


This article integrates the TRIPOD+AI reporting standards with the latest Riley/Collins/Ensor stability framework. It shifts the focus from just "average" performance to "individual" reliability using the pm-suite in Stata.


Introduction

In modern clinical prediction, showing that a model is "accurate on average" is no longer enough. Under TRIPOD+AI, you must report both Performance (how well the model works for the population) and Stability (how much an individual’s risk estimate would change if the model had been developed on a slightly different training sample).


1. The Core Distinction


2. The 8 Required Outputs

To be fully compliant with the Riley/Collins logic, your results section should include the following eight outputs, grouped into three pillars:

Pillar A: Performance (Population-Average)

  1. ROC Curve / C-statistic: Can the model separate cases from non-cases?
  2. Calibration Plot: Does the predicted risk match observed risk across the spectrum?
  3. Decision Curve Analysis (DCA): Does the model provide higher Net Benefit than "treat all" or "treat none" at clinical thresholds?
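
To make the Net Benefit comparison concrete, the sketch below computes it by hand at a single threshold using only built-in Stata commands. It runs on simulated data, and the variable names (x1, outcome, p_app), the 20% threshold, and the scalar names are illustrative assumptions rather than part of any package; the dca command reports the same quantity across a range of thresholds.

```stata
* Sketch only: simulated data, illustrative names and threshold
clear
set seed 12345
set obs 1000
generate x1 = rnormal()
generate outcome = runiform() < invlogit(-1.5 + x1)

logistic outcome x1
predict p_app, pr

* Net benefit at threshold pt: NB = TP/n - (FP/n) * pt/(1 - pt)
local pt = 0.20
quietly count
local n = r(N)
quietly count if p_app >= `pt' & outcome == 1
local tp = r(N)
quietly count if p_app >= `pt' & outcome == 0
local fp = r(N)
scalar nb_model = `tp'/`n' - (`fp'/`n') * `pt'/(1 - `pt')

* "Treat all" flags everyone as high risk; "treat none" has net benefit 0
quietly summarize outcome
scalar prev = r(mean)
scalar nb_all = prev - (1 - prev) * `pt'/(1 - `pt')

display "NB model = " %6.4f nb_model "   NB treat-all = " %6.4f nb_all "   NB treat-none = 0"
```

A model is clinically useful at that threshold only if its net benefit exceeds both the treat-all value and zero.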

Pillar B: Stability (Individual & Decision Reliability)

  1. Prediction Instability Plot: Visualizes the "wiggle" of individual risks across bootstrap re-fits.
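  2. Average MAPE (Stability Index): For each individual, the mean absolute prediction error (MAPE) is the mean absolute difference between the original model's predicted risk and the risks predicted by the bootstrap models; the average MAPE is the mean of these values across all individuals. Target: $< 0.02$ (context-dependent).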
  3. 95% Uncertainty Interval (UI): The range (2.5th to 97.5th percentile) of risk for a single patient across re-fits.
  4. Classification Instability Plot: Shows "threshold flipping"—how often a patient moves from "low risk" to "high risk" across model developments.
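
The individual-level quantities in Pillar B can be written down compactly. The notation below is an assumption introduced here for illustration, not taken from the source papers: $\hat{p}_i$ is the original model's predicted risk for individual $i$, $\hat{p}_i^{(b)}$ is the prediction for the same individual from the model re-fitted on bootstrap sample $b = 1, \dots, B$, $n$ is the number of individuals, $Q_q$ is the empirical $q$-quantile, and $t$ is the classification threshold.

```latex
\begin{align*}
\mathrm{MAPE}_i &= \frac{1}{B}\sum_{b=1}^{B}\left|\hat{p}_i^{(b)} - \hat{p}_i\right| \\
\overline{\mathrm{MAPE}} &= \frac{1}{n}\sum_{i=1}^{n}\mathrm{MAPE}_i \\
\mathrm{UI}_i &= \left[\,Q_{0.025}\!\left(\hat{p}_i^{(1)},\dots,\hat{p}_i^{(B)}\right),\;
                 Q_{0.975}\!\left(\hat{p}_i^{(1)},\dots,\hat{p}_i^{(B)}\right)\right] \\
\mathrm{CII}_i(t) &= \frac{1}{B}\sum_{b=1}^{B}
   \mathbf{1}\!\left[\mathbf{1}\{\hat{p}_i^{(b)} \ge t\} \ne \mathbf{1}\{\hat{p}_i \ge t\}\right]
\end{align*}
```

Here $\mathrm{CII}_i(t)$ (a label chosen for this sketch) is the proportion of bootstrap models that classify individual $i$ differently from the original model at threshold $t$, which is the quantity the classification instability plot displays.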

Pillar C: Stability (Population-Level)

  1. Calibration Instability Plot: A "spaghetti plot" of calibration curves from bootstrap re-fits to show if the model's reliability is volatile.

3. The Stata Toolchain: pm-suite

The authoritative tools for this workflow are maintained by Joie Ensor and the Riley/Collins team.

Installation

Stata

* Performance & Utilities
ssc install pmcalplot, replace
net install dca, from("https://raw.github.com/ddsjoberg/dca.stata/master/") replace

* The Stability Suite (Riley/Ensor)
net from https://joieensor.github.io/pm-suite/
net install pmstabilityplots, replace
net install pmstabilityss, replace  // For sample size planning
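
After installation, a quick check that each command is visible to Stata can save a failed run later. This is a minimal sketch using the built-in which command; the list of command names simply mirrors the install lines above.

```stata
* Sketch: confirm each command can be found before running the workflow
foreach cmd in pmcalplot dca pmstabilityplots pmstabilityss {
    capture which `cmd'
    if _rc {
        display as error "`cmd' not found; install it before continuing"
    }
    else {
        display as text "`cmd' found"
    }
}
```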

Mapping Requirements to Commands

| Requirement | Stata Command | Key Output |
|---|---|---|
| Calibration | pmcalplot | Observed vs. Predicted |
| Clinical Utility | dca | Net Benefit |
| Individual Stability | pmstabilityplots | Prediction Instability Plot & MAPE |
| Decision Stability | pmstabilityplots | Classification Instability |
| Calib. Stability | pmstabilityplots | Spaghetti Calibration Curves |

4. Implementation Workflow

Step 1: Fit and Assess Performance

Stata

logistic outcome x1 x2 x3
predict p_app, pr

* Standard Performance
lroc
pmcalplot p_app outcome, count
dca outcome p_app, xstop(0.5)
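
Alongside the plots, it can help to extract the headline numbers as scalars for a results table. The sketch below does this with built-in commands on simulated data; the data-generating values and the names lp, cstat, cal_slope, and citl are illustrative assumptions. In the development data the calibration slope is 1 and calibration-in-the-large is 0 by construction, so these two values only become informative in validation data or after optimism correction.

```stata
* Sketch only: simulated data, illustrative names
clear
set seed 2024
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate outcome = runiform() < invlogit(-1 + 0.8*x1 + 0.5*x2)

logistic outcome x1 x2
predict double p_app, pr

* C-statistic (area under the ROC curve) without drawing the graph
lroc, nograph
scalar cstat = r(area)

* Linear predictor on the logit scale
generate double lp = logit(p_app)

* Calibration slope: coefficient on the linear predictor
quietly logit outcome lp
scalar cal_slope = _b[lp]

* Calibration-in-the-large: intercept with the linear predictor as offset
quietly logit outcome, offset(lp)
scalar citl = _b[_cons]

display "C = " %5.3f cstat "   slope = " %5.3f cal_slope "   CITL = " %6.3f citl
```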

Step 2: Assess Stability (The Riley Method)

Using pmstabilityplots automates the bootstrap re-development process. It re-estimates the model parameters multiple times to see how much the individual predictions change.

Stata

* Stability Assessment (e.g., 200 bootstrap reps)
* 'threshold' defines the point for Classification Instability
pmstabilityplots outcome x1 x2 x3, reps(200) threshold(0.2)

This command generates the three critical figures:

  1. Prediction Instability Plot: Highlighting the MAPE and 95% UIs.
  2. Classification Instability: Visualizing how many patients cross the 20% risk threshold.
  3. Calibration Instability: Overlaying the calibration curves of the bootstrap models to show how much calibration varies across re-fits.
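
To show what the bootstrap re-development does under the hood, here is a hand-rolled sketch using only built-in Stata commands on simulated data. It is not the pmstabilityplots implementation, and every name in it (pb1 to pb200, mape_i, cii_i, ui_lo, ui_hi, avg_mape) is an illustrative assumption. Each bootstrap model is fitted on a resampled dataset and then used to predict for the original individuals.

```stata
* Sketch only: manual bootstrap re-development on simulated data
clear
set seed 2025
set obs 500
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate outcome = runiform() < invlogit(-1.5 + 0.8*x1 + 0.5*x2)

* Original model and its predictions
quietly logistic outcome x1 x2 x3
predict double p_orig, pr

local B = 200                     // bootstrap re-fits
local t = 0.20                    // classification threshold
generate double sum_abs = 0       // running sum of |p_boot - p_orig|
generate double n_flip  = 0       // re-fits that classify differently

forvalues b = 1/`B' {
    preserve
    bsample                       // resample individuals with replacement
    quietly logistic outcome x1 x2 x3
    restore                       // original data back; estimates are kept
    quietly predict double pb`b', pr
    quietly replace sum_abs = sum_abs + abs(pb`b' - p_orig)
    quietly replace n_flip  = n_flip + ((pb`b' >= `t') != (p_orig >= `t'))
}

* Individual MAPE and classification instability
generate double mape_i = sum_abs / `B'
generate double cii_i  = n_flip / `B'

* Individual 95% uncertainty interval across re-fits
egen double ui_lo = rowpctile(pb*), p(2.5)
egen double ui_hi = rowpctile(pb*), p(97.5)

* Average MAPE across individuals
quietly summarize mape_i
scalar avg_mape = r(mean)
display "Average MAPE = " %6.4f avg_mape
```

Plotting ui_lo and ui_hi against p_orig (for example with twoway rspike) gives a rough version of the prediction instability plot, and cii_i against p_orig gives a rough classification instability plot.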


5. Reporting Template (Methods Section)

"Model performance was evaluated via discrimination (C-statistic), calibration (calibration plots), and clinical utility (Decision Curve Analysis). To ensure the reliability of individual-level predictions, we performed a stability analysis according to the Riley/Collins framework. We quantified prediction instability using Mean Absolute Prediction Error (MAPE) and individual 95% uncertainty intervals (UI). Decision stability was assessed via classification instability plots at a clinical threshold of [X%]. All stability analyses were performed in Stata using the pmstabilityplots package (pm-suite), involving [200] bootstrap re-development cycles."


6. R Crosswalk: pminternal

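If collaborating with R users, the pminternal package implements the same stability framework:

| Item | R Function (pminternal) |
|---|---|
| Prediction Instability | prediction_stability() |
| Stability Index (MAPE) | mape_stability() |
| Decision Stability (classification instability) | classification_stability() |
| Decision Curve Stability (net benefit) | dcurve_stability() |
| Calibration Stability | calibration_stability() |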

