Linear Predictor (LP): Foundation of Debray’s Relatedness Assessment in External Validation

Introduction
Where It Comes From and How Its Distribution Is Formed
The Linear Predictor (LP) is a central component in clinical prediction modeling and a key element of the Debray Step 1 approach to assessing relatedness between development and validation populations.
This article shows:
- Where LP values come from
- How LP is computed
- How LP becomes a distribution
- Why Debray uses LP to evaluate relatedness
- Four step-by-step images demonstrating LP creation
1 What Is the LP (Linear Predictor)?
The LP is the raw score produced by a regression model before it is converted to a probability.
For a logistic regression model:
\[ \text{LP} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \]
Where:
- β0 = intercept
- βi = coefficient of predictor i
- Xi = value of predictor i for a specific patient
LP is the foundation of the model’s structure.
2 LP Comes Directly From the Model Equation
LP values do not come from a histogram. They do not come from probability. They do not come from bins.
They come directly from:
- Model coefficients
- Patient predictor values
💡 LP exists BEFORE probability. The predicted probability is:
\[ p = \frac{1}{1 + e^{-\text{LP}}} \]
But Debray’s method uses LP itself, because LP is computed from the coefficients and predictor values alone and so does not depend on the outcome prevalence observed in the validation sample.
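The relationship between LP and predicted probability can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `lp_to_probability` and `probability_to_lp` are hypothetical helpers chosen for this example, not part of any package.

```python
import math


def lp_to_probability(lp: float) -> float:
    """Inverse logit: map a linear predictor to a probability in (0, 1).

    Fine for moderate LP values; very large negative LPs would overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-lp))


def probability_to_lp(p: float) -> float:
    """Logit: recover the linear predictor from a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# An LP of 0 sits exactly at a predicted probability of 0.5.
p_zero = lp_to_probability(0.0)

# Converting to a probability and back returns the original LP,
# which is why LP and probability carry the same information per patient.
round_trip = probability_to_lp(lp_to_probability(-2.1))
```

Because the mapping is one-to-one, nothing is lost by working on the LP scale; the LP is simply the same information before the nonlinear transformation is applied.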
3 Concrete Example (Simple Diabetes Model)
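Imagine this model:
\[ \text{LP} = -6 + 0.04 \cdot \text{Age} + 1.5 \cdot \text{Obese} \]
Example Patients:
Patient A: Age = 60, Obese = 1 → LP = -6 + 0.04(60) + 1.5(1) = -2.1
Patient B: Age = 30, Obese = 0 → LP = -6 + 0.04(30) + 1.5(0) = -4.8
These values (-2.1, -4.8, …) are LP values.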
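The two LP values can be checked with a short Python sketch. It assumes the diabetes model uses the coefficients given in Step 2 of this article (LP = -6 + 0.04 · Age + 1.5 · Obese); the dictionary layout and the function name `linear_predictor` are illustrative choices, not a prescribed implementation.

```python
# Coefficients taken from the article's example model:
# LP = -6 + 0.04 * Age + 1.5 * Obese
INTERCEPT = -6.0
COEFFICIENTS = {"age": 0.04, "obese": 1.5}


def linear_predictor(patient: dict) -> float:
    """Intercept plus the sum of coefficient * predictor value for one patient."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )


patient_a = {"age": 60, "obese": 1}
patient_b = {"age": 30, "obese": 0}

lp_a = linear_predictor(patient_a)
lp_b = linear_predictor(patient_b)

print(round(lp_a, 2), round(lp_b, 2))  # -2.1 -4.8
```

Applying the same function to every patient in a dataset yields one LP per patient, which is the raw material for the distribution described next.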
4 How LP Becomes a Distribution (Debray Approach 2)
Once you compute LP for all patients, you:
- Collect the LP values
- Visualize them in a histogram
- Convert the histogram into a density curve (LP distribution)
- Compare LP mean & LP SD between development and validation datasets
Differences in:
- LP Mean → baseline risk difference
- LP SD → case-mix heterogeneity difference
Together, these two differences indicate how closely related the development and validation populations are.
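The comparison of LP mean and LP SD can be sketched as follows. Everything about the two cohorts here is simulated and hypothetical (the age distributions, obesity rate, sample sizes, and random seed are assumptions made for illustration), and summarizing the comparison as a difference in means and a ratio of SDs is one simple option rather than a prescribed formula.

```python
import random
import statistics


def linear_predictor(age: float, obese: int) -> float:
    """Example model from this article: LP = -6 + 0.04*Age + 1.5*Obese."""
    return -6.0 + 0.04 * age + 1.5 * obese


def simulate_lps(rng, n, age_mean, age_sd, obese_rate):
    """Draw n hypothetical patients and return their LP values."""
    lps = []
    for _ in range(n):
        age = rng.gauss(age_mean, age_sd)
        obese = 1 if rng.random() < obese_rate else 0
        lps.append(linear_predictor(age, obese))
    return lps


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical cohorts: the validation sample is older and more varied in age.
dev_lps = simulate_lps(rng, 5000, age_mean=50, age_sd=10, obese_rate=0.30)
val_lps = simulate_lps(rng, 5000, age_mean=60, age_sd=20, obese_rate=0.30)

dev_mean, dev_sd = statistics.mean(dev_lps), statistics.stdev(dev_lps)
val_mean, val_sd = statistics.mean(val_lps), statistics.stdev(val_lps)

mean_diff = val_mean - dev_mean  # shift in the center of the LP distribution
sd_ratio = val_sd / dev_sd       # > 1 means a more heterogeneous case mix

print(f"development: mean={dev_mean:.2f}, sd={dev_sd:.2f}")
print(f"validation:  mean={val_mean:.2f}, sd={val_sd:.2f}")
print(f"mean difference={mean_diff:.2f}, sd ratio={sd_ratio:.2f}")
```

In this simulated setup the validation cohort has both a higher LP mean and a larger LP SD, the pattern that would suggest the two populations differ in case mix.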
Step-by-Step Visual Explanation
The four steps below show how the LP distribution is built.
🔵 Step 1 — LP Value for Each Patient
Model coefficients + patient predictor values → One LP per patient
You see one dot per patient.

🟠 Step 2 — First 20 LP Values (Example Raw Values)
These are computed directly from:
\[ \text{LP} = -6 + 0.04 \cdot \text{Age} + 1.5 \cdot \text{Obese} \]
Example printed values:
[-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5 , -3.1 , -5.2 ,
-3.06, -3.66, -4.72, -4.92, -3.4 , -4.96, -4.2 , -3.2 , -2.9 ,
-3.72, -4.48]
These are the EXACT LP values, before any binning or smoothing.
🔶 Step 3 — Histogram of LP Values
Groups the LP values into bins (ranges).
This shows how many patients fall into each LP range.
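The binning step can be sketched in plain Python using the 20 LP values listed in Step 2. The bin range (-5.5 to -1.5) and the bin width (0.5) are assumptions chosen for this illustration; the article does not state which bins the original plot used.

```python
# The 20 example LP values printed in Step 2.
lp_values = [
    -3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2,
    -3.06, -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9,
    -3.72, -4.48,
]


def histogram(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not low <= v <= high:
            continue  # ignore values outside the plotted range
        # A value equal to `high` is placed in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


counts = histogram(lp_values, low=-5.5, high=-1.5, n_bins=8)

# Text rendering of the histogram: one '#' per patient in each LP range.
for i, count in enumerate(counts):
    left = -5.5 + 0.5 * i
    print(f"[{left:5.1f}, {left + 0.5:5.1f})  {'#' * count} {count}")
```

With these bins, the fullest range is -3.5 to -3.0, which holds 7 of the 20 patients.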

🟨 Step 4 — Smooth LP Distribution Curve
This is the LP distribution used in Debray’s Approach 2.

This representation makes it easy to compare two quantities between the development and validation datasets:
- LP Mean (center)
- LP SD (spread)
These are the two measures compared in Debray’s Approach 2.
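A smooth curve like the one in Step 4 can be approximated with a Gaussian kernel density estimate. The article does not say which smoothing method produced its curve, so the kernel, the rule-of-thumb bandwidth (1.06 · SD · n^(-1/5)), and the evaluation grid below are all assumptions made for this sketch.

```python
import math
import statistics

# The 20 example LP values printed in Step 2.
lp_values = [
    -3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2,
    -3.06, -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9,
    -3.72, -4.48,
]


def gaussian_kde(values, bandwidth):
    """Return a function giving the Gaussian kernel density estimate at x."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


lp_mean = statistics.mean(lp_values)  # center of the LP distribution
lp_sd = statistics.stdev(lp_values)   # spread of the LP distribution

# Normal-reference rule of thumb for the bandwidth (an assumption of this sketch).
bandwidth = 1.06 * lp_sd * len(lp_values) ** (-1 / 5)
density = gaussian_kde(lp_values, bandwidth)

# Evaluate the curve on a grid wide enough to cover both tails.
step = 0.01
grid = [-10.0 + step * i for i in range(1301)]
curve = [density(x) for x in grid]

# Trapezoidal area under the curve; a valid density integrates to about 1.
area = sum(step * (curve[i] + curve[i + 1]) / 2 for i in range(len(curve) - 1))

print(f"LP mean={lp_mean:.3f}, LP sd={lp_sd:.3f}, area under curve={area:.3f}")
```

Running the same steps on the development and validation LP values separately gives two curves whose centers and spreads can be compared directly.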
Summary
LP values come from:
- The model formula (coefficients)
- The patient data (predictor values)
LP distribution comes from:
- Collecting all LP values
- Plotting their histogram
- Converting into a density curve
Debray Step 1 uses LP because:
- LP reflects the model’s linear structure
- LP Mean shows baseline risk differences
- LP SD shows case-mix heterogeneity
- Comparing LP distributions reveals whether the two populations are similar (so the validation tests reproducibility) or different (so the validation tests transportability)