
Step-by-Step External Validation and Recalibration in Stata (Step 2 in Debray Framework)

  • Writer: Mayta

Step 0 – Load the external validation cohort

clear
cd "C:\WORK\Location\External validation"

use "validation.dta", clear
summarize

Why? We start with the validation dataset only. Step 2 is about how the original model behaves in a new population – no re-fitting from scratch, just evaluation and recalibration.

Step 1 – Reconstruct the original linear predictor and risk

* 1.1 Generate the linear predictor (log-odds) using original coefficients
gen logodds = -4.4415 ///
    + 0.0084*(age - 74.3651) ///
    + 0.6452*(bt  - 37.1226) ///
    + 0.8591*((map/100)^-2 - 1.0545) ///
    + 2.15*ettyn ///
    - 0.0473*(na  - 138.6417) ///
    + 0.0199*(bun - 18.7486) ///
    - 1.0573*(alb - 3.8174)

* 1.2 Convert log-odds to predicted probability
gen prob = invlogit(logodds)

* 1.3 Quick check of distribution
summarize logodds prob

Why?

  • We must use the original model’s coefficients (from the development cohort) to get the same linear predictor (LP) in the validation cohort.

  • logodds = the original model’s LP.

  • prob = predicted risk for each patient.

  • Step 2 in Debray is all about: given this original LP, how calibrated and discriminative is it in the new data?
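A small optional check, not part of the original workflow: patients with any missing predictor get a missing logodds and are silently dropped from everything below, so it is worth counting them up front.

* Optional sanity check: how many patients cannot be scored?
count if missing(logodds)
count if missing(prob)

* Predicted risks should all be valid probabilities
assert inrange(prob, 0, 1) if !missing(prob)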


Step 2 – Overall calibration (O:E and CITL)

2.1 Observed / Expected (O:E) ratio

* Expected number of events based on model
summ prob
local expected = r(sum)

* Observed number of events
summ death
local observed = r(sum)

display "Observed = "  `observed'
display "Expected = " `expected'
display "O:E = " `observed' / `expected'

Why?

  • Expected = sum of individual predicted risks = model’s expected number of deaths.

  • Observed = count of actual deaths.

  • O:E = 1 → perfect overall calibration (on average).

  • O:E > 1 → more events than predicted → model underpredicts.

  • O:E < 1 → fewer events than predicted → model overpredicts.

Debray calls this part of “calibration in the large” — is the overall incidence aligned?
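If you also want a confidence interval around O:E, one common approach (sketched here as an optional addition) puts the interval on the log scale using a binomial approximation for the standard error of ln(O:E):

* Approximate 95% CI for O:E on the log scale (binomial approximation)
quietly summ death
local n   = r(N)
local obs = r(sum)
quietly summ prob
local expct = r(sum)

local oe    = `obs' / `expct'
local se_ln = sqrt((1 - `obs'/`n') / `obs')    // approx. SE of ln(O:E)
display "O:E = " `oe' " (95% CI " ///
        `oe'*exp(-1.96*`se_ln') " to " `oe'*exp(1.96*`se_ln') ")"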

2.2 Calibration-in-the-large (CITL) via offset

This is the TRIPOD / Debray / Steyerberg recommended way.

* Fit calibration model with LP as offset
logit death, offset(logodds)

* CITL is the intercept of this model
display "CITL (system) = " _b[_cons]

What this model is:

  • logodds is fixed with coefficient 1 (offset).

  • _b[_cons] = α is the CITL:

    • α = 0 → no intercept shift (perfect average calibration)

    • α < 0 → model overestimates risk

    • α > 0 → model underestimates risk

Why not just use logit(mean(observed)) − logit(mean(predicted))? Because logit is nonlinear, the logistic regression intercept with an offset is the true maximum-likelihood CITL, accounting for the full distribution of predicted risks rather than just their mean.

You can still show the “manual” approximation for teaching:

* Manual approximation (for illustration; not for reporting)
summ death
local p_obs = r(mean)

summ prob
local p_exp = r(mean)

display "Manual CITL = " logit(`p_obs') - logit(`p_exp')

But for publications: use _b[_cons] from the offset model.
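If you need to report CITL with a confidence interval, the standard error is already in the offset model, so a minimal sketch is:

* CITL with a 95% CI, straight from the offset model
logit death, offset(logodds)
display "CITL = " _b[_cons] " (95% CI " ///
        _b[_cons] - 1.96*_se[_cons] " to " _b[_cons] + 1.96*_se[_cons] ")"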

Step 3 – Visual calibration: calibration plot

pmcalplot prob death, ci legend(off) ///
    title("Calibration of Original _____ Model") ///
    name(cal_orig, replace)

Why?

  • CITL and slope are global summaries; they can hide local miscalibration.

  • Debray strongly recommends calibration plots: you group patients by predicted risk and plot observed vs predicted.

  • The plot reveals:

    • Underprediction at high risk?

    • Overprediction at low risk?

    • Any threshold regions where the model is clinically misleading?

This matches Debray’s Step 2 “visual assessment”.
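pmcalplot is a user-written command (install it with ssc install pmcalplot). If you prefer to see exactly what it is doing, here is a minimal manual sketch of the same idea, grouping by deciles of predicted risk; the variable names are illustrative:

* Manual calibration check: observed vs predicted risk by decile of prob
xtile risk_grp = prob, nq(10)
preserve
collapse (mean) pred_risk = prob obs_risk = death, by(risk_grp)
list risk_grp pred_risk obs_risk, noobs clean
twoway (scatter obs_risk pred_risk) ///
       (function y = x, range(0 1) lpattern(dash)), ///
       xtitle("Predicted risk") ytitle("Observed risk") legend(off)
restore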

Step 4 – Recalibration: intercept only (keep original slopes)

To update the intercept for the validation cohort, we need the LP without the original intercept.

4.1 Remove original intercept from LP

* Original intercept was -4.4415, so remove it
gen logodds_noint = logodds + 4.4415

Now:

logodds_noint = Σ β_j X_ji, i.e., the original LP with no intercept term.

4.2 Fit intercept-only recalibration (offset with no intercept inside LP)

* Recalibrate intercept only: slopes fixed, new intercept
logit death, offset(logodds_noint)

display "Recalibrated intercept (validation) = " _b[_cons]

Model: logit P(death_i = 1) = α_val + logodds_noint_i, where α_val is the only free parameter.

Why remove the intercept?

  • If you used logodds (which already contains −4.4415) as the offset, Stata would estimate an additional intercept γ and the true intercept would be −4.4415 + γ.

  • By stripping it out first, _b[_cons] becomes the new intercept for the validation population. Clean and interpretable.

4.3 Get recalibrated probabilities (intercept-only)

* New linear predictor and probability after intercept-only update
predict lp_intonly, xb   // linear predictor (includes the offset by default)
gen prob_intonly = invlogit(lp_intonly)

summarize lp_intonly prob_intonly death

* Calibration plot for intercept-only recalibrated model
pmcalplot prob_intonly death, ci legend(off) ///
    title("Calibration: Intercept-only Recalibrated _____ Model) ///
    name(cal_intonly, replace)

This step generates new probabilities from the intercept-only recalibrated model you just fitted.

Interpretation:

  • Slopes = original model, so ranking (discrimination) is unchanged.

  • Intercept adapted to the new baseline risk in the validation cohort.
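As a quick check that predict, xb really returned the intercept-only recalibrated LP (it includes the offset by default), you can rebuild it by hand and confirm the two agree; this sketch re-runs the offset model so _b[_cons] is current:

* Verify: lp_intonly = new intercept + offset (LP without old intercept)
quietly logit death, offset(logodds_noint)
gen lp_intonly_chk = _b[_cons] + logodds_noint
assert reldif(lp_intonly_chk, lp_intonly) < 1e-6 if !missing(lp_intonly)
drop lp_intonly_chk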

Step 5 – Full recalibration: intercept + slope (calibration slope)

Now we check whether the strength of predictor effects is the same in the new data.

5.1 Fit calibration model with logodds as a predictor

* Full recalibration: estimate both CITL and calibration slope
logit death logodds

display "New intercept (CITL*) = " _b[_cons]
display "Calibration slope = " _b[logodds]

Model: logit P(death_i = 1) = α_cal + β_cal · logodds_i

  • _b[_cons] = α_cal, the new intercept

  • _b[logodds] = β_cal, the calibration slope. We are not re-estimating the individual predictor coefficients; we estimate one slope (and one new intercept) for the entire original logodds equation.

Interpretation of slope β:

  • β = 1 → perfect agreement with original predictor effects

  • β < 1 → predictions too extreme (overfitting in development)

  • β > 1 → predictions not extreme enough; weaker associations in validation

This is exactly Debray’s “calibration slope” metric.
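If you want a formal test that the slope differs from 1 (an optional addition; a likelihood-ratio test against the offset-only model is a common alternative), a Wald test on the calibration model is enough:

* Wald test of H0: calibration slope = 1
logit death logodds
test _b[logodds] = 1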

5.2 Build fully recalibrated predictions from slope + intercept

* Recalibrated linear predictor using new intercept and slope
gen lp_full = _b[_cons] + _b[logodds] * logodds

* Recalibrated probabilities
gen prob_full = invlogit(lp_full)

summarize lp_full prob_full death

* Calibration plot for fully recalibrated model
pmcalplot prob_full death, ci legend(off) ///
    title("Calibration: Intercept + Slope Recalibrated _____ Model") ///
    name(cal_full, replace)

Why?

  • This creates a fully recalibrated prediction model: logit P(death) = α_cal + β_cal · logodds.

  • When β_cal < 1, we shrink extreme predictions towards the mean; when β_cal > 1, we stretch them. This is about the spread between the high-risk and low-risk predictions.
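To see the shrink/stretch concretely, compare the spread of the original and fully recalibrated predictions (an illustrative check, not in the original workflow):

* Compare the spread of original vs fully recalibrated predictions
summarize prob prob_full
centile prob prob_full, centile(5 95)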

Step 6 – Discrimination: c-statistic / AUROC

Debray Step 2 also requires discrimination assessment (how well the model ranks risk).

6.1 ROC for original model

* Discrimination of original model
roctab death prob

6.2 ROC for recalibrated models (optional)

* Discrimination of fully recalibrated model
roctab death prob_full

* Or use lroc after the logit fit
logit death logodds   // last model used for discrimination
lroc, ///
    title("ROC Curve for _____ Model") ///
    xtitle("1 - Specificity") ///
    ytitle("Sensitivity") ///
    name(roc_____, replace)

Why?

  • The c-statistic/AUROC tells us if the model can still rank high-risk vs low-risk patients in the validation cohort.

  • For a given model, recalibrating the intercept or slope usually has minimal effect on the c-statistic (it’s about ranking, not absolute probabilities).

  • Debray stresses: discrimination must be interpreted together with LP SD (case-mix) from Step 1 — more heterogeneous populations naturally yield higher c-statistics.
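A minimal sketch of the case-mix check mentioned above: report the SD of the linear predictor in the validation cohort and compare it with the LP SD of the development cohort.

* Case-mix heterogeneity: SD of the original linear predictor
quietly summarize logodds
display "LP SD (validation cohort) = " r(sd)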


Step 7 – How all this maps to Debray Step 2

Putting it together:

  1. Construct original LP & risk (logodds, prob) → evaluate the original model as-is in the validation cohort.

  2. O:E ratio → quick sense of global calibration: does the model over- or under-predict total events?

  3. CITL via offset (logit death, offset(logodds)) → exact intercept shift needed to match average risk.

  4. Calibration plot (pmcalplot) → visualize local miscalibration across the risk spectrum.

  5. Intercept-only recalibration → adjust baseline risk for the new cohort, keep the original associations.

  6. Full recalibration with slope (logit death logodds) → check whether associations are weaker or stronger in validation; update both intercept and slope.

  7. C-statistic (roctab / lroc) → quantify discrimination; interpret alongside LP SD and case-mix from Step 1.

That’s exactly Debray Step 2: quantitative metrics (CITL, slope, c-statistic) plus visual calibration, with clear interpretation of what is off — baseline risk vs predictor effects vs case-mix.

  • Q1: After intercept-only recalibration, what is the new model?

  • Q2: After intercept + slope recalibration, what is the new model?

  • Q3: Is “β_new = β_calib_slope × β_old” what people call model calibration?

1 Intercept-only recalibration → new model

From the development model:

logit P(death = 1) = α_dev + Σ β_j X_j

e.g. α_dev = −4.4415, and the β_j's are your age, BT, MAP, etc. coefficients.

You then do:

gen logodds_noint = logodds + 4.4415   // remove old intercept
logit death, offset(logodds_noint)
display _b[_cons]

This fits:

logit P(death = 1) = α_val + logodds_noint

👉 New model (intercept-only recalibrated):

  • New intercept for this population: α_val = _b[_cons]

  • Slopes stay the same: the original β_j's

So the formula to use in this setting is:

logit P(death = 1) = α_val + Σ β_j X_j

In practice you already did the key step:

predict lp_intonly, xb
gen prob_intonly = invlogit(lp_intonly)

✅ For clinical use in this population → report the same β's with the new intercept (α_val) as the recalibrated model.

2 Intercept + slope recalibration (calibration slope)

Now you do:

logit death logodds
display _b[_cons]
display _b[logodds]

Stata fits the calibration model:

logit P(death_i = 1) = α_cal + β_cal · logodds_i

Where:

  • logodds_i = α_dev + Σ β_j X_ji (old LP, including old intercept)

  • _b[_cons] = α_cal

  • _b[logodds] = β_cal = calibration slope

You then create:

gen lp_full  = _b[_cons] + _b[logodds]*logodds
gen prob_full = invlogit(lp_full)

This is the recalibrated logit: lp_full = α_cal + β_cal · logodds, with prob_full = invlogit(lp_full).

You can use this as-is in practice:

  • For a new patient:

    1. Compute original logodds using old coefficients.

    2. Apply: lp_new = α_cal + β_cal * logodds

    3. Risk = invlogit(lp_new).

That’s already a valid updated model.
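A minimal sketch of that three-step recipe for a single hypothetical patient; the predictor values below are invented for illustration, and alpha_cal / slope_cal are assumed to come from the logit death logodds fit above:

* Score one hypothetical new patient with the recalibrated model
scalar alpha_cal = _b[_cons]        // assumes last fit was: logit death logodds
scalar slope_cal = _b[logodds]

* Step 1: original logodds with the old coefficients (values are illustrative)
scalar lp_orig = -4.4415 ///
    + 0.0084*(70   - 74.3651) ///
    + 0.6452*(38.0 - 37.1226) ///
    + 0.8591*((90/100)^-2 - 1.0545) ///
    + 2.15*1 ///
    - 0.0473*(135 - 138.6417) ///
    + 0.0199*(25  - 18.7486) ///
    - 1.0573*(3.0 - 3.8174)

* Steps 2-3: apply the calibration intercept and slope, then invlogit
scalar lp_new = alpha_cal + slope_cal*lp_orig
display "Recalibrated risk = " invlogit(lp_new)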

If you want explicit new α and new β for each predictor

Sometimes you want the expanded form in terms of the original X's:

α_cal + β_cal · logodds = α_cal + β_cal (α_dev + Σ β_j X_j) = (α_cal + β_cal · α_dev) + Σ (β_cal · β_j) X_j

So:

  • New intercept: α_new = α_cal + β_cal · α_dev

  • New slopes: β_new,j = β_cal · β_j

In Stata (conceptually):

* After: logit death logodds
scalar alpha_dev  = -4.4415               // from original model
scalar alpha_cal  = _b[_cons]             // from calibration model
scalar slope_cal  = _b[logodds]           // calibration slope β_cal

scalar alpha_new  = alpha_cal + slope_cal*alpha_dev
scalar beta_age_new = slope_cal*0.0084    // and so on for each β_j

✅ That gives you a new full set of coefficients (α_new and the β_new_j's) that you can publish or put in a calculator if you want everything expressed in terms of the original predictors.
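As a check that the expanded form really is the same model, the two versions of the recalibrated LP should match exactly (this only uses quantities defined above):

* Check: expanded-form LP equals alpha_cal + slope_cal*logodds (= lp_full)
gen lp_expanded = alpha_new + slope_cal*(logodds - alpha_dev)
assert reldif(lp_expanded, lp_full) < 1e-6 if !missing(lp_full)
drop lp_expanded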

3 Is this what we call “model calibration”?

Yes, in the Debray / Steyerberg terminology:

  • Intercept-only update → often called “recalibration-in-the-large” or “updating the intercept”.

  • Intercept + slope (what you called “β_new = β_calib × β_old”) → called “logistic recalibration” or “weak model updating”, using the calibration slope.

So your summary is essentially correct:

  • Intercept-only:

    use same β’s, replace old intercept with the new one from offset model → new model.

  • Intercept + slope:

    use α_new = α_cal + β_cal*α_dev and β_new_j = β_cal*β_j_old, or equivalently compute the new LP as α_cal + β_cal*logodds. This is the calibrated (updated) model.

Plugging the actual α_cal and β_cal from your validation fit into these formulas gives the exact new coefficients for age, BT, MAP, and the rest of the original predictors.
