Step-by-Step External Validation and Recalibration in Stata (Step 2 in Debray Framework)
- Mayta

Step 0 – Load the external validation cohort
clear
cd "C:\WORK\Location\External validation"
use "validation.dta", clear
summarize
Why? We start with the validation dataset only. Step 2 is about how the original model behaves in a new population – no re-fitting from scratch, just evaluation and recalibration.
Step 1 – Reconstruct the original linear predictor and risk
* 1.1 Generate the linear predictor (log-odds) using original coefficients
gen logodds = -4.4415 ///
+ 0.0084*(age - 74.3651) ///
+ 0.6452*(bt - 37.1226) ///
+ 0.8591*((map/100)^-2 - 1.0545) ///
+ 2.15*ettyn ///
- 0.0473*(na - 138.6417) ///
+ 0.0199*(bun - 18.7486) ///
- 1.0573*(alb - 3.8174)
* 1.2 Convert log-odds to predicted probability
gen prob = invlogit(logodds)
* 1.3 Quick check of distribution
summarize logodds prob
Why?
We must use the original model’s coefficients (from the development cohort) to get the same linear predictor (LP) in the validation cohort.
logodds = the original model’s LP.
prob = predicted risk for each patient.
Step 2 in Debray is all about: given this original LP, how calibrated and discriminative is it in the new data?
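Before computing any metrics, it is worth confirming the LP could actually be computed for every patient. A minimal sanity check, assuming the variable names used in Step 1:
* Check for missing predictor values that would silently drop patients
misstable summarize age bt map ettyn na bun alb
* logodds should be non-missing whenever every predictor is non-missing
assert !missing(logodds) if !missing(age, bt, map, ettyn, na, bun, alb)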
Step 2 – Overall calibration (O:E and CITL)
2.1 Observed / Expected (O:E) ratio
* Expected number of events based on model
summ prob
local expected = r(sum)
* Observed number of events
summ death
local observed = r(sum)
display "Observed = " `observed'
display "Expected = " `expected'
display "O:E = " `observed' / `expected'
Why?
Expected = sum of individual predicted risks = model’s expected number of deaths.
Observed = count of actual deaths.
O:E = 1 → perfect overall calibration (on average).
O:E > 1 → more events than predicted → model underpredicts.
O:E < 1 → fewer events than predicted → model overpredicts.
Debray calls this part of “calibration in the large” — is the overall incidence aligned?
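If you also want uncertainty around the O:E ratio, here is a minimal sketch on the log scale, assuming the standard-error approximation SE(ln O:E) ≈ sqrt((1 − O/N)/O) used in the Debray framework papers, and reusing the `observed' and `expected' locals from above:
* Approximate 95% CI for the O:E ratio
quietly count
local se = sqrt((1 - `observed'/r(N)) / `observed')
display "O:E 95% CI: " (`observed'/`expected')*exp(-1.96*`se') ///
    " to " (`observed'/`expected')*exp(1.96*`se')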
2.2 Calibration-in-the-large (CITL) via offset
This is the TRIPOD / Debray / Steyerberg recommended way.
* Fit calibration model with LP as offset
logit death, offset(logodds)
* CITL is the intercept of this model
display "CITL (system) = " _b[_cons]
What this model is:
logit(P(death_i = 1)) = α + logodds_i, where logodds is fixed with coefficient 1 (the offset).
_b[_cons] = α is the CITL:
α = 0 → no intercept shift (perfect average calibration)
α < 0 → model overestimates risk
α > 0 → model underestimates risk
Why not just use logit(mean(observed)) − logit(mean(predicted))? Because logit is nonlinear, the logistic regression intercept with an offset is the true maximum-likelihood CITL, accounting for the full distribution of predicted risks rather than just their mean.
You can still show the “manual” approximation for teaching:
* Manual approximation (for illustration; not for reporting)
summ death
local p_obs = r(mean)
summ prob
local p_exp = r(mean)
display "Manual CITL = " logit(`p_obs') - logit(`p_exp')
But for publications: use _b[_cons] from the offset model.
Step 3 – Visual calibration: calibration plot
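pmcalplot is a user-written command, so install it once from SSC if you have not already:
* One-time installation of pmcalplot
ssc install pmcalplot, replace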
pmcalplot prob death, ci legend(off) ///
title("Calibration of Original _____ Model") ///
name(cal_orig, replace)
Why?
CITL and slope are global summaries; they can hide local miscalibration.
Debray strongly recommends calibration plots: you group patients by predicted risk and plot observed vs predicted.
The plot reveals:
Underprediction at high risk?
Overprediction at low risk?
Any threshold regions where the model is clinically misleading?
This matches Debray’s Step 2 “visual assessment”.
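If pmcalplot is ever unavailable, here is a minimal decile-based sketch of the same idea (the grouping into ten bins is illustrative, not mandatory):
* Manual grouped calibration: mean observed vs mean predicted by risk decile
preserve
xtile grp = prob, nq(10)
collapse (mean) obs = death exp = prob, by(grp)
twoway (scatter obs exp) (function y = x, range(0 1)), ///
    title("Decile-based calibration check") name(cal_deciles, replace)
restore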
Step 4 – Recalibration: intercept only (keep original slopes)
To update the intercept for the validation cohort, we need the LP without the original intercept.
4.1 Remove original intercept from LP
* Original intercept was -4.4415, so remove it
gen logodds_noint = logodds + 4.4415
Now:
logodds_noint = Σ_j β_j (X_j − x̄_j)
i.e. the original LP with no intercept term.
4.2 Fit intercept-only recalibration (offset with no intercept inside LP)
* Recalibrate intercept only: slopes fixed, new intercept
logit death, offset(logodds_noint)
display "Recalibrated intercept (validation) = " _b[_cons]
Model: logit(P(death_i = 1)) = α_val + logodds_noint_i, with the offset coefficient fixed at 1.
Why remove the intercept?
If you used logodds (which already contains −4.4415) as the offset, Stata would estimate an additional intercept γ and the true intercept would be −4.4415 + γ.
By stripping it out first, _b[_cons] becomes the new intercept for the validation population. Clean and interpretable.
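As a consistency check (assuming both offset fits above were run in this session), the two intercepts are linked algebraically: α_val = α_dev + CITL.
* Verify: intercept from offset(logodds_noint) = original intercept + CITL
quietly logit death, offset(logodds)
scalar citl = _b[_cons]
quietly logit death, offset(logodds_noint)
display "alpha_val = " _b[_cons] "  vs  alpha_dev + CITL = " -4.4415 + citl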
4.3 Get recalibrated probabilities (intercept-only)
* New linear predictor and probability after intercept-only update
predict lp_intonly, xb // linear predictor (new intercept + offset)
gen prob_intonly = invlogit(lp_intonly)
summarize lp_intonly prob_intonly death
* Calibration plot for intercept-only recalibrated model
pmcalplot prob_intonly death, ci legend(off) ///
title("Calibration: Intercept-only Recalibrated _____ Model) ///
name(cal_intonly, replace)
This step generates new probabilities from the intercept-only recalibrated model.
Interpretation:
Slopes = original model, so ranking (discrimination) is unchanged.
Intercept adapted to the new baseline risk in the validation cohort.
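A quick way to see that the ranking is untouched: the two c-statistics below should be identical.
* Intercept-only recalibration is a monotone shift of the LP
roctab death prob
roctab death prob_intonly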
Step 5 – Full recalibration: intercept + slope (calibration slope)
Now we check whether the strength of predictor effects is the same in the new data.
5.1 Fit calibration model with logodds as a predictor
* Full recalibration: estimate both CITL and calibration slope
logit death logodds
display "New intercept (CITL*) = " _b[_cons]
display "Calibration slope = " _b[logodds]
Model: logit(P(death_i = 1)) = α_cal + β_cal · logodds_i
_b[_cons] = α_cal, the new intercept
_b[logodds] = β_cal, the calibration slope. Note that we are not re-estimating the individual predictor coefficients; we estimate a single slope for the whole original LP (logodds) and a single new intercept for it.
Interpretation of slope β:
β = 1 → perfect agreement with original predictor effects
β < 1 → predictions too extreme (overfitting in development)
β > 1 → predictions not extreme enough; weaker associations in validation
This is exactly Debray’s “calibration slope” metric.
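A hypothetical worked example of what the slope does (α_cal = 0.3 and β_cal = 0.8 are made-up numbers, not results from this cohort):
* With beta_cal = 0.8, extreme predictions are pulled towards the mean
display "Low-risk patient:  original " %5.3f invlogit(-3) ", recalibrated " %5.3f invlogit(0.3 + 0.8*(-3))
display "High-risk patient: original " %5.3f invlogit(2) ", recalibrated " %5.3f invlogit(0.3 + 0.8*2)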
5.2 Build fully recalibrated predictions from slope + intercept
* Recalibrated linear predictor using new intercept and slope
gen lp_full = _b[_cons] + _b[logodds] * logodds
* Recalibrated probabilities
gen prob_full = invlogit(lp_full)
summarize lp_full prob_full death
* Calibration plot for fully recalibrated model
pmcalplot prob_full death, ci legend(off) ///
title("Calibration: Intercept + Slope Recalibrated _____ Model") ///
name(cal_full, replace)
Why?
This creates a fully recalibrated prediction model: prob_full = invlogit(α_cal + β_cal · logodds).
When β < 1, we shrink extreme predictions towards the mean; when β > 1, we stretch them. This is about the spread of predictions between high-risk and low-risk patients.
Step 6 – Discrimination: c-statistic / AUROC
Debray Step 2 also requires discrimination assessment (how well the model ranks risk).
6.1 ROC for original model
* Discrimination of original model
roctab death prob
6.2 ROC for recalibrated models (optional)
* Discrimination of fully recalibrated model
roctab death prob_full
* Or use lroc after the logit fit
logit death logodds // last model used for discrimination
lroc, ///
title("ROC Curve for _____ Model") ///
xtitle("1 - Specificity") ///
ytitle("Sensitivity") ///
name(roc_____, replace)
Why?
The c-statistic/AUROC tells us if the model can still rank high-risk vs low-risk patients in the validation cohort.
For a given model, recalibrating the intercept or slope does not change the c-statistic (provided the slope is positive): both updates are monotone transformations of the LP, and the c-statistic is about ranking, not absolute probabilities.
Debray stresses: discrimination must be interpreted together with LP SD (case-mix) from Step 1 — more heterogeneous populations naturally yield higher c-statistics.
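A minimal sketch of that case-mix check (the development cohort's LP SD is whatever the original publication reports; it is not in this dataset):
* Spread of the linear predictor in the validation cohort
summ logodds
display "Validation LP SD = " r(sd) " (compare against the development cohort's LP SD)"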
Step 7 – How all this maps to Debray Step 2
Putting it together:
Construct original LP & risk (logodds, prob) → evaluate the original model as-is in the validation cohort.
O:E ratio → quick sense of global calibration: does the model over- or under-predict total events?
CITL via offset (logit death, offset(logodds)) → exact intercept shift needed to match average risk.
Calibration plot (pmcalplot) → visualize local miscalibration across the risk spectrum.
Intercept-only recalibration → adjust baseline risk for the new cohort, keep original associations.
Full recalibration with slope (logit death logodds) → check if associations are weaker/stronger in validation; update both intercept and slope.
C-statistic (roctab / lroc) → quantify discrimination; interpret alongside LP SD and case-mix from Step 1.
That’s exactly Debray Step 2: quantitative metrics (CITL, slope, c-statistic) plus visual calibration, with clear interpretation of what is off — baseline risk vs predictor effects vs case-mix.
Q1: After intercept-only recalibration, what is the new model?
Q2: After intercept + slope recalibration, what is the new model?
Q3: Is “β_new = β_calib_slope × β_old” what people call model calibration?
1 Intercept-only recalibration → new model
From the development model:
logit(p_i) = α_dev + Σ_j β_j (X_ji − x̄_j)
e.g. α_dev = −4.4415, and the β's are your coefficients for age, BT, MAP, etc.
You then do:
gen logodds_noint = logodds + 4.4415 // remove old intercept
logit death, offset(logodds_noint)
display _b[_cons]
This fits:
logit(p_i) = α_val + logodds_noint_i (offset coefficient fixed at 1)
👉 New model (intercept-only recalibrated):
logit(p_i) = α_val + Σ_j β_j (X_ji − x̄_j)
New intercept for this population: α_val = _b[_cons]
Slopes stay the same: the β_j from the development model.
So the formula to use in this setting is:
risk = invlogit(α_val + logodds_noint)
In practice you already did the key step:
predict lp_intonly, xb
gen prob_intonly = invlogit(lp_intonly)
✅ For clinical use in this population → report the same β’s with the new intercept (α_val) as the recalibrated model.
2 Intercept + slope recalibration (calibration slope)
Now you do:
logit death logodds
display _b[_cons]
display _b[logodds]
Stata fits the calibration model:
logit(p_i) = α_cal + β_cal · logodds_i
Where:
logodds_i = α_dev + Σ_j β_j (X_ji − x̄_j) (the old LP, including the old intercept)
_b[_cons] = α_cal
_b[logodds] = β_cal = the calibration slope
You then create:
gen lp_full = _b[_cons] + _b[logodds]*logodds
gen prob_full = invlogit(lp_full)
This is the recalibrated logit: lp_full_i = α_cal + β_cal · logodds_i, with prob_full_i = invlogit(lp_full_i).
You can use this as-is in practice:
For a new patient:
Compute original logodds using old coefficients.
Apply: lp_new = α_cal + β_cal * logodds
Risk = invlogit(lp_new).
That’s already a valid updated model.
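A minimal scoring sketch with made-up covariate values (it assumes logit death logodds is still the active estimation, so _b[_cons] and _b[logodds] are available):
* Hypothetical new patient: age 80, BT 38, MAP 90, intubated, Na 135, BUN 25, Alb 3.2
local lo = -4.4415 + 0.0084*(80 - 74.3651) + 0.6452*(38 - 37.1226) ///
    + 0.8591*((90/100)^-2 - 1.0545) + 2.15*1 ///
    - 0.0473*(135 - 138.6417) + 0.0199*(25 - 18.7486) - 1.0573*(3.2 - 3.8174)
display "Recalibrated risk = " invlogit(_b[_cons] + _b[logodds]*`lo')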
If you want explicit new α and new β for each predictor
Sometimes you want the expanded form in terms of the original X's:
logit(p_i) = α_cal + β_cal · [α_dev + Σ_j β_j (X_ji − x̄_j)]
So:
New intercept: α_new = α_cal + β_cal · α_dev
New slopes: β_new_j = β_cal · β_j
In Stata (conceptually):
* After: logit death logodds
scalar alpha_dev = -4.4415 // from original model
scalar alpha_cal = _b[_cons] // from calibration model
scalar slope_cal = _b[logodds] // calibration slope β_cal
scalar alpha_new = alpha_cal + slope_cal*alpha_dev
scalar beta_age_new = slope_cal*0.0084 // and so on for each β_j
✅ That gives you a new full set of coefficients (α_new and the β_new_j's) that you can publish or put in a calculator if you want everything expressed in terms of the original predictors.
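Continuing that conceptual block, you can then list the full expanded set (the remaining original β's are hard-coded from Step 1):
* Expanded recalibrated coefficients
display "alpha_new    = " alpha_new
display "beta_age_new = " slope_cal*0.0084
display "beta_bt_new  = " slope_cal*0.6452
display "beta_map_new = " slope_cal*0.8591
display "beta_ett_new = " slope_cal*2.15
display "beta_na_new  = " slope_cal*(-0.0473)
display "beta_bun_new = " slope_cal*0.0199
display "beta_alb_new = " slope_cal*(-1.0573)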
3 Is this what we call “model calibration”?
Yes, in the Debray / Steyerberg terminology:
Intercept-only update → often called “recalibration-in-the-large” or “updating the intercept”.
Intercept + slope (what you called “β_new = β_cal × β_old”) → called “logistic recalibration” or “weak model updating”, using the calibration slope.
So your summary is essentially correct:
Intercept-only:
use the same β’s, replace the old intercept with the new one from the offset model → new model.
Intercept + slope:
use α_new = α_cal + β_cal·α_dev and β_new_j = β_cal·β_j_old, or equivalently compute the new LP as α_cal + β_cal·logodds. This is the calibrated (updated) model.