Calibration Plot in Clinical Prediction Models [Calibration-in-the-Large (CITL), Calibration Slope]
- Mayta
Abstract
Calibration is a fundamental property of clinical prediction models (CPMs), reflecting how well predicted probabilities agree with actual observed outcomes. Unlike discrimination—how well a model distinguishes between individuals with and without an event—calibration evaluates absolute accuracy. Poor calibration can mislead clinical decision-making even when discrimination appears acceptable. This article explains the conceptual foundation, metrics, and practical interpretation of calibration, including calibration-in-the-large, calibration slope, calibration plots, and recalibration strategies.
1. Introduction
Clinical prediction models are increasingly used to estimate individual risks of outcomes such as mortality, sepsis, stroke, or readmission. For a CPM to be clinically trustworthy, two performance domains must be demonstrated:
Discrimination – Can the model distinguish high-risk from low-risk patients?
Calibration – Are the predicted probabilities numerically correct?
Discrimination often receives more attention, but calibration is equally essential because even a highly discriminative model can give incorrect absolute risks that misguide treatment thresholds, triage, and counselling.
2. What is Calibration?
Calibration refers to the agreement between predicted probabilities and observed outcome rates.
If a model predicts 10% mortality for a group of patients, then roughly 10% should die.
Perfect calibration means that, among patients given any predicted risk p, the observed event rate is also p: E[Y | p̂ = p] = p for every value of p.
Calibration answers the question "Are the numbers correct?", not just "Is the ranking correct?"
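As a quick sanity check of the idea, here is a minimal Stata sketch (the variable names p_hat and outcome are hypothetical) that simulates a perfectly calibrated model; by construction, the mean predicted risk and the observed event rate should agree closely:
clear
set seed 2024
set obs 10000
gen p_hat = 0.05 + 0.25*runiform()    // hypothetical predicted risks between 5% and 30%
gen outcome = runiform() < p_hat      // events occur at exactly the predicted rate
summarize p_hat outcome               // the two means should nearly coincide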
3. Two Fundamental Calibration Metrics
3.1 Calibration-in-the-Large (CITL)
CITL assesses whether predictions are systematically too high or too low on average.
This is estimated as the intercept α of a logistic regression in which the model's linear predictor, logit(p̂), enters as a fixed offset (slope constrained to 1):
logit{P(Y = 1)} = α + logit(p̂)
Ideal: CITL = 0
CITL < 0 → model overpredicts risk
CITL > 0 → model underpredicts risk
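A stylized illustration (hypothetical numbers, not from any study): suppose the average predicted risk is 20% but only 10% of patients actually experience the event. If predictions vary little, α will land near logit(0.10) − logit(0.20) ≈ −2.20 − (−1.39) = −0.81, a clearly negative CITL signalling systematic overprediction.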
CITL reflects overall bias. It does not describe how predictions behave across the entire risk spectrum—that is the role of the slope.
3.2 Calibration Slope
The calibration slope β quantifies whether predictions are too extreme or too moderate.
Estimated as the coefficient β on the linear predictor in:
logit{P(Y = 1)} = α + β · logit(p̂)
Slope = 1 → perfect spread
Slope < 1 → predictions too extreme (model overfitting)
Slope > 1 → predictions too modest (model underfitting)
Key insight:
CITL = mean shift
Slope = spread distortion
These two metrics diagnose different forms of miscalibration.
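A stylized example of spread distortion: with α = 0 and slope β = 0.5, a predicted risk of 80% (logit ≈ 1.39) corresponds to an observed risk of invlogit(0.5 × 1.39) ≈ 67%, while a predicted risk of 50% stays at 50%. High and low predictions are pulled back toward the centre, the classic footprint of overfitting.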
4. Visual Assessment: The Calibration Plot
Calibration plots display observed vs predicted risks, commonly across deciles of predicted probability. The ideal line is a 45° diagonal.
A typical plot shows:
Vertical shift → CITL problem
Flattened or steep curve → slope problem
Nonlinear deviations → model misspecification (e.g., missing interactions, wrong functional form)
Modern tools also use smooth loess curves to visualize continuous agreement.
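A decile-based version of the plot is easy to build by hand; a minimal Stata sketch, assuming predicted probabilities in p_hat and a binary outcome variable:
preserve
xtile risk_group = p_hat, nq(10)      // deciles of predicted risk
collapse (mean) observed=outcome predicted=p_hat, by(risk_group)
twoway (scatter observed predicted) (function y = x, range(0 1)), ///
    xtitle("Predicted risk") ytitle("Observed risk") legend(off)
restore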
5. Why Does Calibration Fail?
Overfitting in model development
Case-mix differences between development and validation settings
Changes in disease prevalence or treatment patterns
Incorrect functional form (e.g., missing nonlinear terms)
Measurement differences (e.g., lab assay changes, coding differences)
Small sample size
Thus, calibration must be checked not only during development but whenever a model is applied to a new population.
6. Recalibration Approaches
If a model is miscalibrated but discriminatory ability is preserved, recalibration can correct it.
6.1 Intercept-Only Recalibration
Re-estimate the intercept while keeping the slope fixed at 1 (the same offset model used for CITL). This corrects a uniform over- or underprediction and is the gentlest possible update.
6.2 Intercept + Slope Recalibration
Re-estimate both the intercept α and the slope β (logistic recalibration). This additionally corrects predictions that are too extreme or too modest, while leaving the ranking of patients unchanged.
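In Stata, these two recalibration forms reuse the same regressions shown in Section 8; a minimal sketch, assuming xb holds the original model's linear predictor in the validation data:
* 6.1 intercept-only: slope fixed at 1 through the offset
logit outcome, offset(xb)
gen p_int = invlogit(_b[_cons] + xb)
* 6.2 intercept + slope (logistic recalibration)
logit outcome xb
gen p_full = invlogit(_b[_cons] + _b[xb]*xb)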
6.3 Full Model Revision
Rebuild or update the model (e.g., by adding predictors, nonlinearities, or interactions) when calibration problems reflect structural issues.
7. Calibration vs Discrimination: Complementary but Independent
A model can:
Have excellent discrimination but poor calibration
Be well calibrated yet poorly discriminative
Thus, calibration always needs explicit evaluation. Clinical risk thresholds (e.g., treat if predicted risk > 15%) depend entirely on calibration, not discrimination: a model that overpredicts twofold would push a patient whose true risk is 10% over that threshold even if it ranks patients perfectly.
8. Practical Stata Implementation
Calculate CITL
* xb = the existing model's linear predictor, evaluated in the validation data
predict xb, xb
logit outcome, offset(xb)      // slope fixed at 1; the intercept _cons is the CITL
Calculate Calibration Slope
logit outcome xb               // the coefficient on xb is the calibration slope
Plot Calibration (pmcalplot command)
ssc install pmcalplot
gen predicted = invlogit(xb)   // pmcalplot expects predicted probabilities
pmcalplot predicted outcome
9. Conclusion
Calibration is essential for the clinical reliability of prediction models. It ensures that risk estimates are numerically accurate and clinically actionable. Whereas discrimination ranks patients by risk, calibration confirms whether the actual predicted risks are trustworthy.
A well-functioning CPM must demonstrate:
CITL ≈ 0 (average accuracy)
Slope ≈ 1 (correct spread)
Strong calibration plot alignment
Without calibration, a prediction model—even one with strong discrimination—may be unsafe for clinical use.