Linear Predictor (LP): Foundation of Debray’s Relatedness Assessment in External Validation

Mayta
7 days ago
Updated: 6 days ago

Introduction

Where It Comes From and How Its Distribution Is Formed

The Linear Predictor (LP) is a central component in clinical prediction modeling and a key element of the Debray Step 1 approach to assessing relatedness between development and validation populations.

This article shows:

Where LP values come from
How LP is computed
How LP becomes a distribution
Why Debray uses LP to evaluate relatedness
Four step-by-step images demonstrating LP creation

1 What Is the LP (Linear Predictor)?

The LP is the raw score produced by a regression model before it is converted to a probability.

For a logistic regression model with p predictors:

\[\text{LP} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p\]

Where:

β0 = intercept
βi = coefficient of predictor i
Xi = value of predictor i for a specific patient

The LP is the linear part of the model: every prediction the model makes is a transformation of it.
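The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original analysis: the function name `linear_predictor` and the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def linear_predictor(intercept, coefficients, values):
    """Return LP = intercept + sum(coefficient_i * value_i) for one patient."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical model with two predictors (assumed numbers, for illustration only).
lp_example = linear_predictor(-1.0, [0.5, 2.0], [2.0, 1.0])
print(lp_example)  # -1.0 + 0.5*2.0 + 2.0*1.0
```

The function takes one patient's predictor values at a time, which mirrors the idea that every patient gets exactly one LP.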

2 LP Comes Directly From the Model Equation

LP values do not come from a histogram. They do not come from probability. They do not come from bins.

They come directly from:

Model coefficients
Patient predictor values

💡 LP exists BEFORE probability. The predicted probability is obtained by applying the inverse logit to the LP:

\[P(Y = 1) = \frac{1}{1 + e^{-\text{LP}}}\]

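But Debray’s method uses the LP itself, because the LP summarizes each patient’s predictor values on the model’s own scale and can be computed without knowing the outcome prevalence in the validation data.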
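For a logistic model, the step from LP to probability is the standard inverse-logit transformation. The sketch below assumes that setting; the helper name `lp_to_probability` is made up for illustration.

```python
import math


def lp_to_probability(lp):
    """Inverse logit: map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lp))


p_zero = lp_to_probability(0.0)   # an LP of 0 corresponds to a probability of 0.5
p_neg = lp_to_probability(-2.1)   # a negative LP gives a probability below 0.5
print(p_zero, round(p_neg, 3))
```

Because the transformation is monotonic, ranking patients by LP and ranking them by predicted probability give the same order.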

3 Concrete Example (Simple Diabetes Model)

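Imagine this model:

\[\text{LP} = -6 + 0.04 \cdot \text{Age} + 1.5 \cdot \text{Obese}\]

Example Patients:

Patient A: Age = 60, Obese = 1

\[\text{LP}_A = -6 + 0.04 \cdot 60 + 1.5 \cdot 1 = -2.1\]

Patient B: Age = 30, Obese = 0

\[\text{LP}_B = -6 + 0.04 \cdot 30 + 1.5 \cdot 0 = -4.8\]

These values (-2.1, -4.8, …) are LP values.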
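The same two patients can be run through the example model in code. The coefficients are the ones used in this article (LP = -6 + 0.04 · Age + 1.5 · Obese); the function name `diabetes_lp` and the dictionary layout are illustrative choices.

```python
def diabetes_lp(age, obese):
    """LP for the article's example model: -6 + 0.04*Age + 1.5*Obese."""
    return -6 + 0.04 * age + 1.5 * obese


# The two example patients from the text.
patients = {
    "A": {"age": 60, "obese": 1},
    "B": {"age": 30, "obese": 0},
}

lps = {name: diabetes_lp(p["age"], p["obese"]) for name, p in patients.items()}
for name, lp in lps.items():
    print(f"Patient {name}: LP = {lp:.1f}")
```

Patient A has the higher LP, so the model places A at higher predicted risk than B.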

4 How LP Becomes a Distribution (Debray Approach 2)

Once you compute LP for all patients, you:

Collect the LP values
Visualize them in a histogram
Smooth the values into a density curve (the LP distribution), for example with a kernel density estimate
Compare LP mean & LP SD between development and validation datasets

Differences in:

LP Mean → difference in average predicted risk (baseline risk / case-mix severity)
LP SD → difference in case-mix heterogeneity

Together, these indicate how closely related the two populations are.
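The comparison of LP mean and LP SD can be sketched as follows. The two LP lists are hypothetical toy numbers, not results from any study. In practice each list would hold the LPs computed by applying the same model coefficients to the development and validation patients.

```python
import statistics

# Hypothetical LP values (assumed for illustration only).
dev_lp = [-4.8, -4.2, -3.6, -3.0, -2.1]   # development cohort
val_lp = [-3.9, -3.5, -3.1, -2.9, -2.6]   # validation cohort


def summarize(lp_values):
    """Return (mean, sample SD) of a list of linear predictor values."""
    return statistics.mean(lp_values), statistics.stdev(lp_values)


dev_mean, dev_sd = summarize(dev_lp)
val_mean, val_sd = summarize(val_lp)

mean_difference = val_mean - dev_mean   # shift in the center of the LP distribution
sd_ratio = val_sd / dev_sd              # below 1 means a narrower LP spread in validation

print(f"Difference in LP mean: {mean_difference:.2f}")
print(f"Ratio of LP SDs: {sd_ratio:.2f}")
```

In this toy data the validation cohort has a higher LP mean and a smaller LP SD than the development cohort, which is the pattern expected from a somewhat higher-risk but more homogeneous population.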

Step-by-Step Visual Explanation

The four steps below show how the LP distribution is built, from individual patient values to a smooth curve.

🔵 Step 1 — LP Value for Each Patient

Model coefficients + patient predictor values → One LP per patient

You see one dot per patient.

🟠 Step 2 — First 20 LP Values (Example Raw Values)

These are computed directly from:

\[\text{LP} = -6 + 0.04 \cdot \text{Age} + 1.5 \cdot \text{Obese}\]

Example printed values (first 20 patients):

[-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5 , -3.1 , -5.2 ,
 -3.06, -3.66, -4.72, -4.92, -3.4 , -4.96, -4.2 , -3.2 , -2.9 ,
 -3.72, -4.48]

These are the exact LP values, before any binning or smoothing.

🔶 Step 3 — Histogram of LP Values

The histogram groups the LP values into bins (ranges).

It shows how many patients fall into each LP range.
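The binning step can be illustrated with the 20 LP values printed in Step 2. The bin width of 0.5 and the helper name `histogram_counts` are assumptions made for this sketch. A plotting library would normally choose the bins and draw the bars.

```python
import math
from collections import Counter

# The 20 LP values listed in Step 2.
lp_values = [-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2,
             -3.06, -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9,
             -3.72, -4.48]


def histogram_counts(values, bin_width):
    """Count values per bin; each bin covers [left, left + bin_width)."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 10): bins[k] for k in sorted(bins)}


counts = histogram_counts(lp_values, 0.5)
for left, n in counts.items():
    # One '#' per patient gives a simple text histogram.
    print(f"[{left:5.1f}, {left + 0.5:5.1f})  {'#' * n}")
```

Each patient lands in exactly one bin, so the bin counts always add up to the number of patients.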

🟨 Step 4 — Smooth LP Distribution Curve

This is the LP distribution used in Debray’s Approach 2.

This representation makes it easy to compare two quantities between the development and validation datasets:

LP Mean (center)
LP SD (spread)

These are the two core measures of relatedness in Debray’s Approach 2.
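One common way to obtain a smooth curve from raw LP values is a Gaussian kernel density estimate. The sketch below is an assumption about how such a curve could be produced, not the method used to draw the original figure. It uses the 20 LP values from Step 2 and a normal-reference rule-of-thumb bandwidth (1.06 · SD · n^(-1/5)). The function name `gaussian_kde` here is a hand-written helper, not a library call.

```python
import math
import statistics

# The 20 LP values listed in Step 2.
lp_values = [-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2,
             -3.06, -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9,
             -3.72, -4.48]


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


lp_mean = statistics.mean(lp_values)   # center of the distribution
lp_sd = statistics.stdev(lp_values)    # spread of the distribution

# Rule-of-thumb bandwidth (an assumed choice; other bandwidths are possible).
bandwidth = 1.06 * lp_sd * len(lp_values) ** (-1 / 5)
density = gaussian_kde(lp_values, bandwidth)

# Evaluate the curve on a grid from -7 to 1 in steps of 0.05.
step = 0.05
grid = [-7 + step * i for i in range(161)]
curve = [density(x) for x in grid]
area = sum(curve) * step               # should be close to 1 for a density

print(f"LP mean = {lp_mean:.3f}, LP SD = {lp_sd:.3f}, area under curve = {area:.3f}")
```

Running the same steps on the validation cohort's LP values and overlaying the two curves gives the visual comparison of center and spread described above.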

Summary

LP values come from:

The model formula (coefficients)
The patient data (predictor values)

LP distribution comes from:

Collecting all LP values
Plotting their histogram
Smoothing them into a density curve

Debray Step 1 uses LP because:

LP reflects the model’s linear structure
LP Mean shows differences in average predicted risk (baseline risk)
LP SD shows differences in case-mix heterogeneity
Comparing LP distributions reveals whether the two populations are similar (so validation assesses reproducibility) or different (so validation assesses transportability)