Linear Predictor (LP): Foundation of Debray’s Relatedness Assessment in External Validation
- Mayta

- 7 days ago
Updated: 6 days ago
Introduction
Where It Comes From and How Its Distribution Is Formed
The Linear Predictor (LP) is a central component in clinical prediction modeling and a key element of the Debray Step 1 approach to assessing relatedness between development and validation populations.
This article shows:
Where LP values come from
How LP is computed
How LP becomes a distribution
Why Debray uses LP to evaluate relatedness
Four step-by-step images demonstrating LP creation
1 What Is the LP (Linear Predictor)?
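The LP is the raw score produced by a regression model before it is converted to a probability.
For a logistic regression model with p predictors, the LP is on the log-odds scale:
[\text{LP} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p]
Where:
β0 = intercept
βi = coefficient of predictor i
Xi = value of predictor i for a specific patient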
LP is the foundation of the model’s structure: every prediction the model makes is a transformation of this single number.
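The formula LP = β0 + Σ βi·Xi can be sketched in a few lines of Python. This sketch assumes the predictors are passed as a plain list of numbers in the same order as the coefficients. The helper name `linear_predictor` is hypothetical and does not come from any library.

```python
from typing import Sequence


def linear_predictor(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Return LP = intercept + sum(beta_i * x_i) for one patient.

    Hypothetical helper: `coefs` and `x` must list the predictors in the same order.
    """
    if len(coefs) != len(x):
        raise ValueError("coefs and x must have the same length")
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


# Illustrative numbers only: intercept 1.0 and two predictors.
print(linear_predictor(1.0, [0.5, -2.0], [4.0, 1.0]))  # 1.0 + 2.0 - 2.0 = 1.0
```

There is no probability anywhere in this function. It only combines coefficients with one patient's predictor values.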
2 LP Comes Directly From the Model Equation
LP values do not come from a histogram. They do not come from probability. They do not come from bins.
They come directly from:
Model coefficients
Patient predictor values
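💡 LP exists BEFORE probability. The predicted probability is:
[p = \frac{1}{1 + e^{-\text{LP}}}]
But Debray’s method uses LP itself. LP captures the model’s structure on the linear (log-odds) scale, and it is independent of the outcome prevalence observed in the validation sample, because it uses only the model coefficients and the patients’ predictor values.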
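The step from LP to predicted probability is the inverse-logit (logistic) function, p = 1 / (1 + e^(−LP)). The standard-library sketch below uses the hypothetical name `lp_to_probability`. It branches on the sign of the LP so that very large or very small values do not overflow `math.exp`.

```python
import math


def lp_to_probability(lp: float) -> float:
    """Inverse logit: map a linear predictor (log-odds) to a probability in [0, 1]."""
    if lp >= 0:
        return 1.0 / (1.0 + math.exp(-lp))
    # For negative LP, rewrite as e^LP / (1 + e^LP) to avoid overflow in exp(-lp).
    z = math.exp(lp)
    return z / (1.0 + z)


for lp in (-4.8, -2.1, 0.0):
    # An LP of 0 corresponds to odds of 1, i.e. a probability of 0.5.
    print(f"LP = {lp:5.1f} -> p = {lp_to_probability(lp):.3f}")
```

The mapping is monotone, so ranking patients by LP or by probability gives the same order. Only the scale differs.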
3 Concrete Example (Simple Diabetes Model)
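Imagine this model:
[\text{LP} = -6 + 0.04 \cdot \text{Age} + 1.5 \cdot \text{Obese}]
Example patients:
Patient A: Age = 60, Obese = 1 → LP = -6 + 0.04 × 60 + 1.5 × 1 = -2.1
Patient B: Age = 30, Obese = 0 → LP = -6 + 0.04 × 30 + 1.5 × 0 = -4.8
These values (-2.1, -4.8, …) are LP values.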
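The same arithmetic for Patients A and B can be written as a small Python function. It uses the illustrative diabetes model with the coefficients from the Step 2 formula (intercept −6, Age 0.04, Obese 1.5). The name `diabetes_lp` is hypothetical.

```python
def diabetes_lp(age: float, obese: int) -> float:
    """Illustrative diabetes model from the text: LP = -6 + 0.04*Age + 1.5*Obese."""
    return -6.0 + 0.04 * age + 1.5 * obese


patients = {"A": (60, 1), "B": (30, 0)}
for name, (age, obese) in patients.items():
    print(f"Patient {name}: LP = {diabetes_lp(age, obese):.1f}")
# Patient A: LP = -2.1
# Patient B: LP = -4.8
```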
4 How LP Becomes a Distribution (Debray Approach 2)
Once you compute LP for all patients, you:
Collect the LP values
Visualize them in a histogram
Convert the histogram into a density curve (LP distribution)
Compare LP mean & LP SD between development and validation datasets
Differences in:
LP Mean → baseline risk difference
LP SD → case-mix heterogeneity difference
Together, these two summaries indicate how related the development and validation populations are.
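The comparison can be sketched as follows. The sketch assumes the development model's coefficients (here, the illustrative diabetes model) are applied unchanged to both datasets. The two small patient lists are hypothetical and exist only to exercise the code; they are not data from Debray's work.

```python
import statistics


def diabetes_lp(age: float, obese: int) -> float:
    # Illustrative development model: LP = -6 + 0.04*Age + 1.5*Obese
    return -6.0 + 0.04 * age + 1.5 * obese


def lp_summary(patients):
    """Mean and sample SD of the LP across a list of (age, obese) tuples."""
    lps = [diabetes_lp(age, obese) for age, obese in patients]
    return statistics.mean(lps), statistics.stdev(lps)


# Hypothetical (age, obese) records, for illustration only.
development = [(40, 0), (50, 1), (60, 0), (70, 1)]
validation = [(30, 0), (35, 0), (45, 1), (80, 1)]

dev_mean, dev_sd = lp_summary(development)
val_mean, val_sd = lp_summary(validation)
print(f"LP mean: dev {dev_mean:.2f}, val {val_mean:.2f}, diff {val_mean - dev_mean:+.2f}")
print(f"LP SD:   dev {dev_sd:.2f}, val {val_sd:.2f}, diff {val_sd - dev_sd:+.2f}")
```

With these toy records, the validation sample has a slightly lower mean LP and a wider LP spread. Read through the mapping above, the first points to a baseline risk difference and the second to a case-mix heterogeneity difference.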
Step-by-Step Visual Explanation
The four steps below show how LP values are created and then turned into a distribution.
🔵 Step 1 — LP Value for Each Patient
Model coefficients + patient predictor values → One LP per patient
The plot shows one dot per patient.
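To make Step 1 concrete, the sketch below draws a small hypothetical cohort with Python's `random` module. It then computes one LP per patient with the Step 2 formula. The age range (20 to 80), the 30% obesity rate, the cohort size and the seed are arbitrary assumptions, and the sketch does not reproduce the printed values in Step 2. The helper `simulate_cohort` is hypothetical.

```python
import random


def diabetes_lp(age: float, obese: int) -> float:
    # Illustrative model: LP = -6 + 0.04*Age + 1.5*Obese
    return -6.0 + 0.04 * age + 1.5 * obese


def simulate_cohort(n: int, seed: int = 1):
    """Hypothetical cohort: integer ages 20-80 and obesity with probability 0.3."""
    rng = random.Random(seed)
    return [(rng.randint(20, 80), int(rng.random() < 0.3)) for _ in range(n)]


cohort = simulate_cohort(200)
lps = [diabetes_lp(age, obese) for age, obese in cohort]  # one LP per patient
print(len(lps), [round(v, 2) for v in lps[:5]])
```

Under these assumptions, every LP falls between -5.2 (a 20-year-old without obesity) and -1.3 (an 80-year-old with obesity).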

🟠 Step 2 — First 20 LP Values (Example Raw Values)
These are computed directly from:
[\text{LP} = -6 + 0.04 \cdot \text{Age} + 1.5 \cdot \text{Obese}]
Example printed values (first 20 patients):
[-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5 , -3.1 , -5.2 ,
-3.06, -3.66, -4.72, -4.92, -3.4 , -4.96, -4.2 , -3.2 , -2.9 ,
-3.72, -4.48]
These are exact LP values: each is computed directly from one patient’s data, before any binning or smoothing.
🔶 Step 3 — Histogram of LP Values
The histogram groups the LP values into bins (ranges).
This shows how many patients fall into each LP range.
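The binning step can be sketched with the standard library and applied to the 20 LP values printed in Step 2. The bin width of 0.5 and the range from −5.5 to −1.5 are assumptions chosen to cover those values. In practice, a plotting library would usually draw the bars. The `histogram` helper here is hypothetical.

```python
LP_VALUES = [-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2,
             -3.06, -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9,
             -3.72, -4.48]


def histogram(values, start, width, n_bins):
    """Count values in half-open bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:  # ignore values outside the chosen range
            counts[k] += 1
    return counts


counts = histogram(LP_VALUES, start=-5.5, width=0.5, n_bins=8)
for k, c in enumerate(counts):
    low = -5.5 + k * 0.5
    print(f"[{low:5.2f}, {low + 0.5:5.2f}): {'#' * c}")
```

Changing the bin width changes the picture but not the underlying LP values. This is one reason to move on to a smooth curve and to summary statistics.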

🟨 Step 4 — Smooth LP Distribution Curve
This is the LP distribution used in Debray’s Approach 2.
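The original does not say how the histogram was smoothed. A common choice is a Gaussian kernel density estimate (KDE), and the sketch below assumes that. It uses a normal-reference bandwidth h = 1.06 · SD · n^(−1/5) and applies it to the 20 LP values printed in Step 2. SciPy provides a ready-made `scipy.stats.gaussian_kde`; this version sticks to the standard library, and its `gaussian_kde` function is a hypothetical helper.

```python
import math
import statistics

LP_VALUES = [-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2,
             -3.06, -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9,
             -3.72, -4.48]


def gaussian_kde(values, x, h):
    """Gaussian kernel density estimate evaluated at point x with bandwidth h."""
    n = len(values)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)


# Normal-reference ("rule of thumb") bandwidth: an assumption, not a documented choice.
h = 1.06 * statistics.stdev(LP_VALUES) * len(LP_VALUES) ** (-1 / 5)

grid = [-7.0 + 0.05 * i for i in range(141)]  # LP grid from -7.0 to 0.0
density = [gaussian_kde(LP_VALUES, x, h) for x in grid]

peak_x = grid[density.index(max(density))]
print(f"bandwidth h = {h:.3f}; density peaks near LP = {peak_x:.2f}")
```

The area under `density` is close to 1, as a probability density should be. A smaller `h` gives a bumpier curve, and a larger one gives a flatter curve.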

This representation makes it easy to compare, between the development and validation datasets:
LP Mean (center)
LP SD (spread)
These are the two core measures of relatedness in Debray’s framework.
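For a single dataset, the two summaries are the mean and standard deviation of the LP values. The sketch below applies them to the 20 values printed in Step 2, which form a small illustrative sample rather than a full development or validation dataset. The use of the sample SD (`statistics.stdev`, n − 1 denominator) is an assumption, since the text does not specify sample or population SD.

```python
import statistics

LP_VALUES = [-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2,
             -3.06, -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9,
             -3.72, -4.48]

lp_mean = statistics.mean(LP_VALUES)  # center of the LP distribution
lp_sd = statistics.stdev(LP_VALUES)   # spread (sample SD, n - 1 denominator)
print(f"LP mean = {lp_mean:.3f}, LP SD = {lp_sd:.3f}")
# LP mean = -3.846, LP SD = 0.868
```

Running the same two lines on the LP values of a second dataset, computed with the same coefficients, gives the pair of numbers that Debray’s Approach 2 compares.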
Summary
LP values come from:
The model formula (coefficients)
The patient data (predictor values)
LP distribution comes from:
Collecting all LP values
Plotting their histogram
Converting into a density curve
Debray Step 1 uses LP because:
LP reflects the model’s linear structure
LP Mean shows baseline risk differences
LP SD shows case-mix heterogeneity
Comparing LP distributions reveals whether the two populations are similar (the validation assesses reproducibility) or different (the validation assesses transportability)





