Step-by-Step External Validation and Recalibration in Stata (Step 2 in Debray Framework)
Step 0 – Load the external validation cohort
clear
cd "C:\WORK\Location\External validation"
use "validation.dta", clear
summarize
Why? We start with the validation dataset only. Step 2 is about how the original model behaves in a new population – no re-fitting from scratch, just evaluation and recalibration.
Step 1 – Reconstruct the original linear predictor and risk
* 1.1 Generate the linear predictor (log-odds) using original coefficients
gen logodds = -4.4415 ///
+ 0.0084*(age - 74.3651) ///
+ 0.6452*(bt - 37.1226) ///
+ 0.8591*((map/100)^-2 - 1.0545) ///
+ 2.15*ettyn ///
- 0.0473*(na - 138.6417) ///
+ 0.0199*(bun - 18.7486) ///
- 1.0573*(alb - 3.8174)
* 1.2 Convert log-odds to predicted probability
gen prob = invlogit(logodds)
* 1.3 Quick check of distribution
summarize logodds prob
Why?
- We must use the original model’s coefficients (from the development cohort) to get the same linear predictor (LP) in the validation cohort.
- logodds = the original model’s LP.
- prob = predicted risk for each patient.
- Step 2 in Debray is all about: given this original LP, how calibrated and discriminative is it in the new data?
Step 2 – Overall calibration (O:E and CITL)
2.1 Observed / Expected (O:E) ratio
* Expected number of events based on model
summ prob
local expected = r(sum)
* Observed number of events
summ death
local observed = r(sum)
display "Observed = " `observed'
display "Expected = " `expected'
display "O:E = " `observed' / `expected'
Why?
- Expected = sum of individual predicted risks = model’s expected number of deaths.
- Observed = count of actual deaths.
- O:E = 1 → perfect overall calibration (on average).
- O:E > 1 → more events than predicted → model underpredicts.
- O:E < 1 → fewer events than predicted → model overpredicts.
Debray calls this part of “calibration in the large” — is the overall incidence aligned?
2.2 Calibration-in-the-large (CITL) via offset
This is the TRIPOD / Debray / Steyerberg recommended way.
* Fit calibration model with LP as offset
logit death, offset(logodds)
* CITL is the intercept of this model
display "CITL (system) = " _b[_cons]
What this model is:
- logodds is fixed with coefficient 1 (offset).
- b[cons] = α is the CITL:
- α = 0 → no intercept shift (perfect average calibration)
- α < 0 → model overestimates risk
- α > 0 → model underestimates risk
Why not just use logit(mean(observed)) − logit(mean(predicted))? Because logit is nonlinear, the logistic regression intercept with an offset is the true maximum-likelihood CITL, accounting for the full distribution of predicted risks rather than just their mean.
You can still show the “manual” approximation for teaching:
* Manual approximation (for illustration; not for reporting)
summ death
local p_obs = r(mean)
summ prob
local p_exp = r(mean)
display "Manual CITL = " logit(`p_obs') - logit(`p_exp')
But for publications: use b[cons] from the offset model.
Step 3 – Visual calibration: calibration plot
pmcalplot prob death, ci legend(off) ///
title("Calibration of Original _____ Model") ///
name(cal_orig, replace)
Why?
- CITL and slope are global summaries; they can hide local miscalibration.
- Debray strongly recommends calibration plots: you group patients by predicted risk and plot observed vs predicted.
- The plot reveals:
- Underprediction at high risk?
- Overprediction at low risk?
- Any threshold regions where the model is clinically misleading?
This matches Debray’s Step 2 “visual assessment”.
Step 4 – Recalibration: intercept only (keep original slopes)
To update the intercept for the validation cohort, we need the LP without the original intercept.
4.1 Remove original intercept from LP
* Original intercept was -4.4415, so remove it
gen logodds_noint = logodds + 4.4415
Now:
(no intercept term).
4.2 Fit intercept-only recalibration (offset with no intercept inside LP)
* Recalibrate intercept only: slopes fixed, new intercept
logit death, offset(logodds_noint)
display "Recalibrated intercept (validation) = " _b[_cons]
Model:
Why remove the intercept?
- If you used logodds (which already contains −4.4415) as the offset, Stata would estimate an additional intercept γ and the true intercept would be −4.4415 + γ.
- By stripping it out first, b[cons] becomes the new intercept for the validation population. Clean and interpretable.
4.3 Get recalibrated probabilities (intercept-only)
* New linear predictor and probability after intercept-only update
predict lp_intonly, xb ///linear predictor
gen prob_intonly = invlogit(lp_intonly)
summarize lp_intonly prob_intonly death
* Calibration plot for intercept-only recalibrated model
pmcalplot prob_intonly death, ci legend(off) ///
title("Calibration: Intercept-only Recalibrated _____ Model) ///
name(cal_intonly, replace)
This step generates new probabilities from the recalibrated model you just adjusted.
Interpretation:
- Slopes = original model, so ranking (discrimination) is unchanged.
- Intercept adapted to the new baseline risk in the validation cohort.
Step 5 – Full recalibration: intercept + slope (calibration slope)
Now we check whether the strength of predictor effects is the same in the new data.
5.1 Fit calibration model with logodds as a predictor
* Full recalibration: estimate both CITL and calibration slope
logit death logodds
display "New intercept (CITL*) = " _b[_cons]
display "Calibration slope = " _b[logodds]
Model:
- b[cons] = new intercept α
- _b[logodds] = calibration slope β เราไม่ได้หา Coefficient ของโมเดลมาใหม่เราหา Coefficient aka slop ของ logodds ทั้งสมการเดิม และ intercept ใหม่ของ logodds ทั้งสมการเดิม
Interpretation of slope β:
- β = 1 → perfect agreement with original predictor effects
- β < 1 → predictions too extreme (overfitting in development)
- β > 1 → predictions not extreme enough; weaker associations in validation
This is exactly Debray’s “calibration slope” metric.
5.2 Build fully recalibrated predictions from slope + intercept
* Recalibrated linear predictor using new intercept and slope
gen lp_full = _b[_cons] + _b[logodds] * logodds
* Recalibrated probabilities
gen prob_full = invlogit(lp_full)
summarize lp_full prob_full death
* Calibration plot for fully recalibrated model
pmcalplot prob_full death, ci legend(off) ///
title("Calibration: Intercept + Slope Recalibrated _____ Model") ///
name(cal_full, replace)
Why?
- This creates a fully recalibrated prediction model:
- When β < 1, we shrink extreme predictions towards the mean; when β > 1, we stretch them. อันนี้พูดถึงการกระจายการ High Risk and Low risk
Step 6 – Discrimination: c-statistic / AUROC
Debray Step 2 also requires discrimination assessment (how well the model ranks risk).
6.1 ROC for original model
* Discrimination of original model
roctab death prob
6.2 ROC for recalibrated models (optional)
* Discrimination of fully recalibrated model
roctab death prob_full
* Or use lroc after the logit fit
logit death logodds // last model used for discrimination
lroc, ///
title("ROC Curve for _____ Model") ///
xtitle("1 - Specificity") ///
ytitle("Sensitivity") ///
name(roc______ Model, replace)
Why?
- The c-statistic/AUROC tells us if the model can still rank high-risk vs low-risk patients in the validation cohort.
- For a given model, recalibrating the intercept or slope usually has minimal effect on the c-statistic (it’s about ranking, not absolute probabilities).
- Debray stresses: discrimination must be interpreted together with LP SD (case-mix) from Step 1 — more heterogeneous populations naturally yield higher c-statistics.
Step 7 – How all this maps to Debray Step 2
Putting it together:
- Construct original LP & risk (logodds, prob)→ evaluate the original model as is in the validation cohort.
- O:E ratio→ quick sense of global calibration: does the model over- or under-predict total events?
- CITL via offset (logit death, offset(logodds))→ exact intercept shift needed to match average risk.
- Calibration plot (pmcalplot)→ visualize local miscalibration across the risk spectrum.
- Intercept-only recalibration→ adjust baseline risk for the new cohort, keep original associations.
- Full recalibration with slope (logit death logodds)→ check if associations are weaker/stronger in validation; update both intercept and slope.
- C-statistic (roctab / lroc)→ quantify discrimination; interpret alongside LP SD and case-mix from Step 1.
That’s exactly Debray Step 2: quantitative metrics (CITL, slope, c-statistic) plus visual calibration, with clear interpretation of what is off — baseline risk vs predictor effects vs case-mix.
- Q1: After intercept-only recalibration, what is the new model?
- Q2: After intercept + slope recalibration, what is the new model?
- Q3: Is “β_new = β_calib_slope × β_old” what people call model calibration?
1 Intercept-only recalibration → new model
From the development model:
e.g. α'dev = −4.4415, and the β’s are your age, BT, MAP, etc.
You then do:
gen logodds_noint = logodds + 4.4415 // remove old intercept
logit death, offset(logodds_noint)
display _b[_cons]
This fits:
👉 New model (intercept-only recalibrated):
- New intercept for this population:
- Slopes stay the same:
So the formula to use in this setting is:
In practice you already did the key step:
predict lp_intonly, xb
gen prob_intonly = invlogit(lp_intonly)
✅ For clinical use in this population → you can report same β’s, new intercept (α_val) as the recalibrated model.
2 Intercept + slope recalibration (calibration slope)
Now you do:
logit death logodds
display _b[_cons]
display _b[logodds]
Stata fits the calibration model:
Where:
- logodds_i = α_dev + Σ β_j X_ji (old LP, including old intercept)
- b[cons] = α_cal
- b[logodds] = βcal = calibration slope
You then create:
gen lp_full = _b[_cons] + _b[logodds]*logodds
gen prob_full = invlogit(lp_full)
This is the recalibrated logit:
You can use this as-is in practice:
- For a new patient:
- Compute original logodds using old coefficients.
- Apply: lp_new = α_cal + β_cal * logodds
- Risk = invlogit(lp_new).
That’s already a valid updated model.
If you want explicit new α and new β for each predictor
Sometimes you want the expanded form in terms of the original X’s:
So:
- New intercept
- New slopes
In Stata (conceptually):
* After: logit death logodds
scalar alpha_dev = -4.4415 // from original model
scalar alpha_cal = _b[_cons] // from calibration model
scalar slope_cal = _b[logodds] // calibration slope β_cal
scalar alpha_new = alpha_cal + slope_cal*alpha_dev
scalar beta_age_new = slope_cal*0.0084 // and so on for each β_j
✅ That gives you a new full set of coefficients (α_new's, β_new’s) that you can publish/put in a calculator if you want everything expressed in terms of the original predictors.
3 Is this what we call “model calibration”?
Yes, in the Debray / Steyerberg terminology:
- Intercept-only update→ often called “recalibration-in-the-large” or “updating the intercept”.
- Intercept + slope (what you called “β_new = β_calib × β_old”)→ called “logistic recalibration” or “weak model updating”, using the calibration slope.
So your summary is essentially correct:
- Intercept-only:use same β’s, replace old intercept with the new one from offset model → new model.
- Intercept + slope:use α_new = α_cal + β_cal*α_dev and β_new_j = β_cal*β_j_oldor equivalently, compute new LP as α_cal + β_cal*logodds.This is the calibrated (updated) model.
If you tell me your actual α_cal and β_cal, I can write out the exact new coefficients for Age, BT, MAP, etc., using your MAGENTA numbers.