
Risk Score Calibration Plot vs Decile Calibration Plot (pmcalplot)

  • Writer: Mayta

When reading clinical prediction papers, many people recognize the word calibration but still feel confused when the graphs look very different. Some papers show the x-axis as a clinical score such as 0–5, while others show the x-axis as predicted probability and divide the data into 10 groups. Both are calibration plots, but they are not exactly the same type.

This article explains the difference between a risk score calibration plot and a decile calibration plot, why they are used, how they are constructed, and how to interpret them correctly.


What is calibration?

Calibration asks a simple question:

If a model predicts a certain risk, does that predicted risk match what actually happened?

This is different from discrimination.

  • Discrimination asks whether the model can separate people with disease from people without disease.

  • Calibration asks whether the predicted probabilities are numerically accurate.

A model can rank patients correctly and still be poorly calibrated. For example, it may correctly label some patients as higher risk than others, but predict 80% when the real observed risk is only 50%.
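
To make the distinction concrete, here is a minimal sketch using made-up numbers. The eight patients, both sets of predictions, and the helper names `c_statistic` and `mean` are assumptions for illustration only. The c-statistic computed here is the same quantity as the AUC, which measures discrimination; comparing the mean predicted risk with the observed event rate is a rough check of calibration.

```python
# Toy data, made up for illustration: eight patients with a binary outcome.
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
reasonable = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.80, 0.90]
# Same ranking of patients, but every predicted risk is pushed upward.
inflated = [(p + 1) / 2 for p in reasonable]


def c_statistic(preds, ys):
    """Share of event/non-event pairs where the event has the higher prediction."""
    events = [p for p, y in zip(preds, ys) if y == 1]
    non_events = [p for p, y in zip(preds, ys) if y == 0]
    wins = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                wins += 1.0
            elif pe == pn:
                wins += 0.5  # ties count as half
    return wins / (len(events) * len(non_events))


def mean(values):
    return sum(values) / len(values)


observed_rate = mean(outcomes)
for name, preds in [("reasonable", reasonable), ("inflated", inflated)]:
    print(name,
          "| c-statistic:", c_statistic(preds, outcomes),
          "| mean predicted:", round(mean(preds), 3),
          "| observed rate:", observed_rate)
```

Both sets of predictions rank the patients identically, so their c-statistics are equal, yet the inflated set predicts a much higher average risk than was observed.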


1. Risk Score Calibration Plot

Definition

A risk score calibration plot is a calibration graph used when a prediction model has been simplified into a point-based clinical score.

Examples of this type of model include point systems such as:

  • CURB-65

  • CHA₂DS₂-VASc

  • Wells score

  • Other bedside scoring tools converted from regression models

Instead of plotting raw predicted probabilities for each patient, the plot uses the total score categories as the x-axis.


Typical structure

X-axis

Total score, such as 0, 1, 2, 3, 4, 5

Y-axis

Risk of the outcome (such as disease, death, or complication), shown as both the predicted risk and the observed proportion at each score

Display

Usually the figure contains:

  • a line for predicted risk at each score

  • points or circles for the observed risk in the actual data


What it means

If the observed points sit close to the predicted line, the score is well calibrated.

For example:

  • score 0 → predicted risk 5%, observed risk 4%

  • score 1 → predicted risk 20%, observed risk 22%

  • score 2 → predicted risk 60%, observed risk 58%

That means the score performs well because the estimated probability attached to each score is close to reality.
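
The comparison above can be sketched in a few lines of Python. The predicted risks are the ones from the example; the cohort counts are hypothetical, chosen so that the observed risks match the example, and `observed_risk_by_score` is an illustrative helper name rather than a function from any package.

```python
from collections import defaultdict

# Predicted risk attached to each score, taken from the example above.
predicted_risk = {0: 0.05, 1: 0.20, 2: 0.60}

# Hypothetical cohort: score -> (number of patients, number with the outcome).
cohort = {0: (100, 4), 1: (50, 11), 2: (50, 29)}

# Expand to one (score, outcome) row per patient, as a real dataset would be.
patients = []
for score, (n, n_events) in cohort.items():
    patients += [(score, 1)] * n_events + [(score, 0)] * (n - n_events)


def observed_risk_by_score(rows):
    """Observed proportion of the outcome within each score category."""
    totals = defaultdict(int)
    events = defaultdict(int)
    for score, outcome in rows:
        totals[score] += 1
        events[score] += outcome
    return {score: events[score] / totals[score] for score in sorted(totals)}


observed_risk = observed_risk_by_score(patients)
for score, obs in observed_risk.items():
    gap = obs - predicted_risk[score]
    print(f"score {score}: predicted {predicted_risk[score]:.2f}, "
          f"observed {obs:.2f}, difference {gap:+.2f}")
```

Plotting `predicted_risk` as a line and `observed_risk` as points over the score values would give the kind of figure described in this section.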

Figure 1. Example of a risk score calibration plot showing predicted and observed event rates across total score categories.


Why this type is used

This plot is especially useful when the model has already been translated into a clinical bedside tool. Doctors often use integer scores more easily than raw model equations.

Instead of saying:

predicted probability = 0.73

the bedside tool says:

total score = 4

and then maps that score to a clinical risk.

So in this setting, calibration is naturally assessed by score category.
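
How a score maps to a risk depends on the individual tool, and each published score provides its own mapping. As a rough sketch only, assuming the score was derived from a logistic regression model, the mapping might look like the code below. The baseline risk, the log-odds per point, and the function name `risk_from_score` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical values for illustration; a real score publishes its own mapping.
BASELINE_RISK = 0.05        # assumed risk at a total score of 0
LOG_ODDS_PER_POINT = 0.9    # assumed increase in log-odds for each point


def risk_from_score(score):
    """Convert a total score to a predicted probability on the logistic scale."""
    intercept = math.log(BASELINE_RISK / (1 - BASELINE_RISK))
    log_odds = intercept + LOG_ODDS_PER_POINT * score
    return 1 / (1 + math.exp(-log_odds))


risk_table = {score: risk_from_score(score) for score in range(6)}
for score, risk in risk_table.items():
    print(f"total score = {score} -> predicted risk = {risk:.2f}")
```

The resulting lookup table is what the bedside user sees: one predicted risk per integer score, which is why calibration is checked score by score.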


Strengths

  • very intuitive for clinicians

  • easy to read in bedside-score papers

  • directly matches how the tool is used in practice

  • each score is already a natural group


Limitations

  • only works well when the model is actually expressed as a point score

  • can hide variation within a score group

  • provides less detail than patient-level probability calibration

  • may be unstable if some score groups contain very few patients


2. Decile Calibration Plot

Definition

A decile calibration plot is a calibration graph that uses predicted probabilities from a model and groups patients into 10 equal-sized groups according to their predicted risk.

This is one of the most traditional forms of calibration display in logistic regression and prediction-model papers.


Why it is called “decile”

The word decile means one-tenth.

The dataset is sorted by predicted probability, then divided into 10 groups, each containing about 10% of the patients.

For each group, the researcher calculates:

  • the mean predicted probability

  • the observed event rate

These values are then plotted against each other.


Typical structure

X-axis

Predicted probability

Usually the average predicted risk within each decile

Y-axis

Observed probability

Usually the actual event rate within that decile

Display

Usually the figure contains:

  • a 45-degree reference line, representing perfect calibration

  • 10 points, one for each decile

  • sometimes a connecting line between the points


How it is built

The process is usually:

Step 1

Run a model and obtain the predicted probability for every patient.

Step 2

Sort all patients from lowest predicted risk to highest predicted risk.

Step 3

Divide them into 10 equally sized groups.

Step 4

For each group, calculate:

  • mean predicted probability

  • observed proportion of the outcome

Step 5

Plot observed versus predicted risk.

If the points follow the 45-degree line, calibration is good.
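
Steps 2 to 5 can be sketched in plain Python. This is a minimal illustration, not a reproduction of what pmcalplot or any other package does internally. The predicted probabilities and outcomes are simulated (an assumption, standing in for Step 1), and `decile_calibration` is an illustrative helper name.

```python
import random


def decile_calibration(preds, outcomes, n_groups=10):
    """Return (group, size, mean predicted, observed rate) for each risk group."""
    # Step 2: sort patients from lowest to highest predicted risk.
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        # Step 3: take the next slice of roughly n / n_groups patients.
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        # Step 4: summarise predicted and observed risk within the group.
        mean_pred = sum(preds[i] for i in idx) / len(idx)
        obs_rate = sum(outcomes[i] for i in idx) / len(idx)
        rows.append((g + 1, len(idx), mean_pred, obs_rate))
    return rows


# Simulated stand-in for Step 1: outcomes are drawn so that the
# predictions are well calibrated by construction.
rng = random.Random(42)
preds = [rng.random() for _ in range(2000)]
outcomes = [1 if rng.random() < p else 0 for p in preds]

table = decile_calibration(preds, outcomes)
for group, size, mean_pred, obs_rate in table:
    print(f"decile {group:2d}: n={size}, predicted={mean_pred:.3f}, observed={obs_rate:.3f}")
```

For Step 5, the `predicted` column goes on the x-axis and the `observed` column on the y-axis, together with the 45-degree reference line.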

Figure 2. Example of a decile calibration plot comparing observed and predicted event rates across deciles of predicted risk.


Figure 3. Example of a decile calibration plot produced with pmcalplot in Stata, comparing observed and predicted event rates across deciles of predicted risk.

Why use 10 groups?

Because each patient's outcome is binary (the event either happened or it did not), an observed risk can only be estimated within a group of patients. Grouping into deciles creates a clean summary.

Ten groups became common because it is a practical balance:

  • fewer groups may oversimplify

  • too many groups may become unstable and noisy

So deciles are a traditional compromise between readability and detail.


Strengths

  • widely recognized in prediction-model literature

  • easy to compare predicted versus observed risk

  • useful for models with continuous predicted probabilities

  • closely related to the logic of the Hosmer–Lemeshow goodness-of-fit test
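
To show the link with the Hosmer–Lemeshow test, here is a hedged sketch of its test statistic computed from grouped summaries. The three groups are made-up numbers (a real analysis would typically use ten), the function name is illustrative, and the step that converts the statistic to a p-value with a chi-square distribution is left out.

```python
def hosmer_lemeshow_statistic(groups):
    """Sum of (observed - expected)^2 / (n * mean_risk * (1 - mean_risk)).

    groups: list of (n_patients, observed_events, expected_events), where
    expected_events is the sum of predicted probabilities in the group.
    """
    total = 0.0
    for n, observed, expected in groups:
        mean_risk = expected / n
        total += (observed - expected) ** 2 / (n * mean_risk * (1 - mean_risk))
    return total


# Made-up grouped summaries (only three groups, to keep the arithmetic visible).
example_groups = [
    (100, 12, 10.0),   # (12 - 10)^2 / (100 * 0.1 * 0.9) = 4 / 9
    (100, 50, 50.0),   # observed equals expected, so this group contributes 0
    (100, 70, 80.0),   # (70 - 80)^2 / (100 * 0.8 * 0.2) = 100 / 16
]
hl = hosmer_lemeshow_statistic(example_groups)
print(round(hl, 3))
```

The statistic uses the same two ingredients as the decile plot: the observed and expected event counts in each risk group. Larger gaps between them give a larger value.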


Limitations

  • the appearance depends on how the grouping is done

  • two models with similar decile plots may behave differently at the patient level

  • grouping can hide miscalibration inside a decile

  • not as informative as a smooth calibration curve in modern modeling work


3. Key difference between the two

The main difference is the meaning of the x-axis.

Risk score calibration plot

The x-axis is clinical score categories.

Each value on the x-axis is a score group that already exists in the scoring system.

Decile calibration plot

The x-axis is predicted probability, grouped into 10 bins.

The groups do not come from a clinical score. They are created statistically after the model produces patient-level probabilities.


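Simple comparison table

Feature | Risk score calibration plot | Decile calibration plot
X-axis | Total score categories (such as 0 to 5) | Mean predicted probability in each decile
Y-axis | Predicted and observed risk at each score | Observed event rate in each decile
Source of groups | Already defined by the scoring system | Created statistically from sorted predicted probabilities
Reference for comparison | Predicted risk line across scores | 45-degree line of perfect calibration
Best suited to | Point-based bedside scores | Models that output continuous predicted probabilities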


4. Why the two plots look different

They look different because they assess calibration at two different levels.

Risk score plot

Focuses on the question:

For each score value, what was the actual observed risk?

This fits clinical tools where the final output is a score.

Decile plot

Focuses on the question:

Across increasing predicted risk groups, how close were predicted and observed probabilities?

This fits statistical models that produce probabilities directly.


5. How to interpret each one correctly

Interpreting a risk score calibration plot

Look at each score category and compare:

  • predicted risk

  • observed risk

If the observed points are close to the predicted curve, calibration is good.

Example

If score 3 has a predicted risk of 85% and the observed rate is 83%, that is good calibration.

If score 3 has a predicted risk of 85% but the observed rate is only 50%, that is poor calibration.


Interpreting a decile calibration plot

Look at the relationship between the decile points and the diagonal reference line.

  • points on the line = well calibrated

  • points above the line = model underestimates risk

  • points below the line = model overestimates risk

Example

If a decile has predicted risk 0.40 but observed risk 0.60, that means the model underestimated risk in that range.
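
The reading rule above can be written as a small helper. This is a sketch only: the function name and the tolerance of 0.05 for calling a point "well calibrated" are assumptions made for illustration, not an established threshold.

```python
def calibration_direction(predicted, observed, tolerance=0.05):
    """Describe where a point sits relative to the 45-degree line.

    tolerance is an arbitrary illustrative cut-off, not a standard value.
    """
    difference = observed - predicted
    if abs(difference) <= tolerance:
        return "well calibrated"
    if difference > 0:
        return "underestimates risk"   # point lies above the line
    return "overestimates risk"        # point lies below the line


print(calibration_direction(0.40, 0.60))
print(calibration_direction(0.85, 0.50))
print(calibration_direction(0.85, 0.83))
```

The same rule applies to a score-based plot if the predicted risk at each score takes the place of the diagonal line.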


6. Relationship to modern calibration curves

Today, many statisticians prefer smooth calibration curves rather than only decile plots.

A smooth calibration curve often uses flexible methods such as LOESS or spline-based fitting to show calibration across the full probability range.

Compared with decile plots:

  • decile plots are simpler and traditional

  • smooth curves are often more informative

  • smooth curves reduce the information loss caused by grouping

But decile plots are still commonly shown because they are easy to understand.
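
To give a feel for what a smooth curve does, here is a minimal sketch that uses a Gaussian-kernel weighted average. This is a simpler stand-in for LOESS or splines, not either of those methods. The data are simulated so that the model overestimates risk, and the function name and the bandwidth of 0.1 are assumptions.

```python
import math
import random


def kernel_calibration_curve(preds, outcomes, grid, bandwidth=0.1):
    """Estimate observed risk at each grid value as a kernel-weighted average."""
    curve = []
    for x in grid:
        # Patients with predictions close to x receive the largest weights.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in preds]
        total = sum(weights)
        curve.append(sum(w * y for w, y in zip(weights, outcomes)) / total)
    return curve


# Simulated data: the true risk is p squared, so predictions are too high.
rng = random.Random(0)
preds = [rng.random() for _ in range(2000)]
outcomes = [1 if rng.random() < p ** 2 else 0 for p in preds]

grid = [i / 10 for i in range(1, 10)]
curve = kernel_calibration_curve(preds, outcomes, grid)
for x, y in zip(grid, curve):
    print(f"predicted {x:.1f} -> smoothed observed {y:.3f}")
```

Unlike a decile plot, the curve is evaluated at any predicted probability, so it does not depend on where group boundaries happen to fall. It does depend on the bandwidth, which plays a role similar to the number of groups.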


7. Common mistakes in interpretation

Mistake 1: confusing calibration with discrimination

A high AUC does not guarantee good calibration.

Mistake 2: assuming grouped plots show everything

Both score-group plots and decile plots can hide problems inside groups.
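
A made-up example shows how this can happen. Suppose one group mixes two kinds of patients whose errors point in opposite directions. All numbers below are hypothetical.

```python
# Made-up example: one group of 200 patients that mixes two kinds of patient.
# Each entry is (predicted probability, number of patients, number of events).
subgroups = [
    (0.20, 100, 40),  # predicted 20%, observed 40%: risk underestimated
    (0.60, 100, 40),  # predicted 60%, observed 40%: risk overestimated
]

n_total = sum(n for _, n, _ in subgroups)
group_mean_predicted = sum(p * n for p, n, _ in subgroups) / n_total
group_observed = sum(e for _, _, e in subgroups) / n_total
print(f"whole group: predicted {group_mean_predicted:.2f}, "
      f"observed {group_observed:.2f}")

subgroup_gaps = []
for p, n, e in subgroups:
    gap = e / n - p  # observed minus predicted within the subgroup
    subgroup_gaps.append(gap)
    print(f"subgroup predicted {p:.2f}: observed {e / n:.2f}, gap {gap:+.2f}")
```

At the group level the point sits on the reference line, while each half of the group is off by 20 percentage points.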

Mistake 3: overinterpreting sparse groups

If a score category or decile has few patients, the observed risk may be unstable.
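
One way to see this instability is to put a confidence interval around the observed risk. The sketch below uses the Wilson score interval, which is one common choice and an assumption here, since the article does not prescribe a method. The two groups are made up and share the same observed risk of 30%.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Same observed risk (30%), very different group sizes.
small = wilson_interval(3, 10)
large = wilson_interval(300, 1000)
print(f"3 events in 10 patients:     {small[0]:.3f} to {small[1]:.3f}")
print(f"300 events in 1000 patients: {large[0]:.3f} to {large[1]:.3f}")
```

The interval for the small group is far wider, so a point that looks badly off the line in a sparse group may still be compatible with good calibration.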

Mistake 4: treating all calibration plots as the same

A risk score calibration plot and a decile calibration plot are related, but not identical. Their x-axes represent different things.


8. When should each plot be used?

Use a risk score calibration plot when:

  • the final tool is a bedside score

  • users make decisions based on integer score groups

  • the model has already been simplified into points

Use a decile calibration plot when:

  • the model gives continuous predicted probabilities

  • you want a traditional grouped assessment of calibration

  • you are presenting logistic regression or machine learning outputs


9. Practical takeaway

A risk score calibration plot is best thought of as:

Calibration assessed across score categories

A decile calibration plot is best thought of as:

Calibration assessed across ten equal-sized groups of predicted probability

They are both valid ways to examine whether predicted risk matches observed risk. The difference is not that one is right and the other is wrong. The difference is that they are designed for different model outputs.

  • If the model output is a score, use score-based calibration.

  • If the model output is a probability, use decile-based calibration or a smooth calibration curve.


Short conclusion

Calibration is about whether predicted risk matches reality. A risk score calibration plot uses score categories on the x-axis and is ideal for point-based clinical tools. A decile calibration plot uses predicted probabilities divided into ten groups and is common in regression and prediction-model studies. Both aim to compare predicted and observed risk, but they do so at different levels of model representation.


One-line memory aid

Score plot = calibration by clinical points

Decile plot = calibration by grouped predicted probabilities
