
Linear Predictor (LP): Foundation of Debray’s Relatedness Assessment in External Validation

Categories: Clinical Epidemiology Research | Uniqcret doctor knowledges | Methodology and Research Design | Diagnosis [Methodology] | Prognosis [Methodology]

Introduction

Where It Comes From and How Its Distribution Is Formed

The Linear Predictor (LP) is a central component in clinical prediction modeling and a key element of the Debray Step 1 approach to assessing relatedness between development and validation populations.

This article shows:

  1. Where LP values come from
  2. How LP is computed
  3. How LP becomes a distribution
  4. Why Debray uses LP to evaluate relatedness
  5. Four step-by-step images demonstrating LP creation

1 What Is the LP (Linear Predictor)?

The LP is the raw score a regression model produces before it is converted to a probability. In logistic regression, the LP is the predicted log-odds of the outcome.

For a logistic regression model:

\[ \text{LP} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \]

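Here, β0 is the intercept, β1, …, βp are the regression coefficients estimated in the development data, and X1, …, Xp are one patient's predictor values.

Every prediction the model makes, including the predicted probability, is derived from the LP.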
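As a minimal sketch of the formula above, the helper below computes one LP from an intercept, a list of coefficients and one patient's predictor values. The name `linear_predictor` and the two-predictor model are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def linear_predictor(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Return LP = intercept + sum of (coefficient * predictor value) for one patient."""
    if len(coefs) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


# Hypothetical model (intercept -1.0, two coefficients) and two hypothetical patients.
intercept = -1.0
coefs = [0.5, 2.0]
patients = [[1.0, 0.0], [3.0, 1.0]]

lps = [linear_predictor(intercept, coefs, x) for x in patients]
print(lps)  # [-0.5, 2.5]
```

Each patient yields exactly one number; nothing about probabilities, bins or other patients enters the calculation.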


2 LP Comes Directly From the Model Equation

LP values do not come from a histogram, from bins, or from predicted probabilities.

They come directly from the model equation: the estimated coefficients applied to each patient's predictor values.

💡 The LP exists before the probability. The predicted probability is obtained by passing the LP through the logistic (inverse-logit) function:

\[ p = \frac{1}{1 + e^{-\text{LP}}} \]

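Debray’s method, however, works with the LP itself. The LP depends only on the model coefficients and each patient's predictor values, not on the observed outcomes, so its distribution summarizes a population's case-mix on the model's own linear scale.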
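The conversion between the LP and the probability can be sketched as a pair of functions. The names `lp_to_prob` and `prob_to_lp` are illustrative, not from any library. Because the logistic function is strictly increasing, ranking patients by LP gives the same order as ranking them by probability, and the logit recovers the LP from p.

```python
import math


def lp_to_prob(lp: float) -> float:
    """Inverse logit: map an LP (log-odds) to a predicted probability."""
    return 1.0 / (1.0 + math.exp(-lp))


def prob_to_lp(p: float) -> float:
    """Logit: map a probability strictly between 0 and 1 back to the LP scale."""
    return math.log(p / (1.0 - p))


for lp in (-4.0, -2.0, 0.0, 2.0, 4.0):
    p = lp_to_prob(lp)
    print(f"LP = {lp:+.1f}  ->  p = {p:.3f}  ->  logit(p) = {prob_to_lp(p):+.1f}")
```

An LP of 0 corresponds to p = 0.5; negative LPs give probabilities below 0.5 and positive LPs give probabilities above it.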


3 Concrete Example (Simple Diabetes Model)

Imagine this model:

LP = -6.0 + 0.04(Age) + 1.5(Obese)

Example Patients:

Patient A: Age = 60, Obese = 1

LP_A = -6 + (0.04 × 60) + (1.5 × 1) = -6 + 2.4 + 1.5 = -2.1

Patient B: Age = 30, Obese = 0

LP_B = -6 + (0.04 × 30) + (1.5 × 0) = -6 + 1.2 + 0 = -4.8

These numbers (-2.1 for Patient A, -4.8 for Patient B, and so on for every other patient) are LP values.
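The worked example can be checked in a few lines. The coefficients are those of the illustrative model; treating Age as whole years and Obese as a 0/1 indicator is an assumption consistent with the two patients above.

```python
import math

# Coefficients of the illustrative diabetes model in the text.
INTERCEPT = -6.0
B_AGE = 0.04   # per year of age (assumed unit)
B_OBESE = 1.5  # applied when Obese = 1 (assumed 0/1 coding)


def diabetes_lp(age: float, obese: int) -> float:
    """LP = -6.0 + 0.04 * Age + 1.5 * Obese."""
    return INTERCEPT + B_AGE * age + B_OBESE * obese


patients = {"A": (60, 1), "B": (30, 0)}
for name, (age, obese) in patients.items():
    lp = diabetes_lp(age, obese)
    p = 1.0 / (1.0 + math.exp(-lp))  # inverse logit
    print(f"Patient {name}: LP = {lp:.1f}, predicted probability = {p:.3f}")
```

Patient A's LP of -2.1 corresponds to a predicted probability of about 0.11, and Patient B's LP of -4.8 to about 0.008.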


4 How LP Becomes a Distribution (Debray Approach 2)

Once you compute LP for all patients, you:

  1. Collect the LP values
  2. Visualize them in a histogram
  3. Smooth the values into a density curve (the LP distribution)
  4. Compare LP mean & LP SD between development and validation datasets

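Differences in the LP mean (whether one population has systematically higher or lower predicted risk) and in the LP SD (whether its case-mix is more or less heterogeneous) determine how related the two populations are.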
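A minimal sketch of the comparison in step 4, assuming two simulated cohorts: both datasets below are hypothetical, with the validation cohort deliberately older (ages 50 to 79) and more often obese than the development cohort (ages 20 to 79). The LP in both datasets is computed with the development model's coefficients; the helper names are illustrative.

```python
import random
import statistics


def diabetes_lp(age: float, obese: int) -> float:
    """Development model from the example: LP = -6.0 + 0.04 * Age + 1.5 * Obese."""
    return -6.0 + 0.04 * age + 1.5 * obese


def simulate_lps(
    rng: random.Random, n: int, age_lo: int, age_hi: int, p_obese: float
) -> list[float]:
    """Simulate n hypothetical patients and return one LP each (development coefficients)."""
    lps = []
    for _ in range(n):
        age = rng.randint(age_lo, age_hi)  # whole-year age, inclusive bounds
        obese = 1 if rng.random() < p_obese else 0
        lps.append(diabetes_lp(age, obese))
    return lps


rng = random.Random(2024)
dev = simulate_lps(rng, 1000, 20, 79, 0.25)  # hypothetical development cohort
val = simulate_lps(rng, 1000, 50, 79, 0.40)  # hypothetical validation cohort

for label, lps in (("development", dev), ("validation", val)):
    print(f"{label:>11}: LP mean = {statistics.mean(lps):.2f}, "
          f"LP SD = {statistics.stdev(lps):.2f}")
print(f"difference in LP mean (validation - development) = "
      f"{statistics.mean(val) - statistics.mean(dev):+.2f}")
```

With these settings the validation cohort should show a higher LP mean and a smaller LP SD (its ages span a narrower range), which is the kind of difference the comparison is meant to detect.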


Step-by-Step Visual Explanation

The four steps below show how LP values are created and then turned into a distribution.

🔵 Step 1 — LP Value for Each Patient

Model coefficients + patient predictor values → One LP per patient

In the Step 1 plot, each dot is one patient's LP.
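Step 1 can be sketched as follows, assuming a simulated cohort of 200 hypothetical patients (ages drawn uniformly from 20 to 79, about 30% obese); only the coefficients come from the example model.

```python
import random

# Hypothetical cohort: 200 simulated patients (ages 20-79, about 30% obese).
rng = random.Random(42)
cohort = [{"age": rng.randint(20, 79), "obese": int(rng.random() < 0.3)} for _ in range(200)]

# Model coefficients + patient predictor values -> one LP per patient.
for patient in cohort:
    patient["lp"] = -6.0 + 0.04 * patient["age"] + 1.5 * patient["obese"]

lps = [patient["lp"] for patient in cohort]
print(f"{len(cohort)} patients -> {len(lps)} LP values")
for i, patient in enumerate(cohort[:5], start=1):
    print(f"patient {i}: age={patient['age']}, obese={patient['obese']}, LP={patient['lp']:.2f}")
```

Plotting `lps` against patient index would give the one-dot-per-patient picture.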

🟠 Step 2 — First 20 LP Values (Example Raw Values)

These are computed directly from:

\[ \text{LP} = -6 + 0.04 \cdot \text{Age} + 1.5 \cdot \text{Obese} \]

Example printed values (first 20 patients):

[-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5 , -3.1 , -5.2 ,
 -3.06, -3.66, -4.72, -4.92, -3.4 , -4.96, -4.2 , -3.2 , -2.9 ,
 -3.72, -4.48]

These are exact LP values: each follows directly from the equation, with no binning or smoothing.
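The post does not show the patient data behind these 20 values. As a check that each one follows exactly from the equation, the sketch below uses hypothetical inputs: assuming whole-year ages and a 0/1 Obese indicator, each printed value pins down a single (Age, Obese) pair, and those pairs are listed here.

```python
import math

# Hypothetical inputs (not from the post): the whole-year (Age, Obese) pairs that
# reproduce the 20 printed values under LP = -6 + 0.04 * Age + 1.5 * Obese.
ages = [57, 63, 32, 28, 29, 31, 25, 35, 20, 36, 21, 32, 27, 65, 26, 45, 70, 40, 57, 38]
obese = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

printed = [-3.72, -1.98, -4.72, -4.88, -3.34, -3.26, -3.5, -3.1, -5.2, -3.06,
           -3.66, -4.72, -4.92, -3.4, -4.96, -4.2, -3.2, -2.9, -3.72, -4.48]

lps = [-6.0 + 0.04 * age + 1.5 * ob for age, ob in zip(ages, obese)]
print([round(lp, 2) for lp in lps])

# Each computed LP equals the printed value up to floating-point rounding.
matches = all(math.isclose(lp, target, abs_tol=1e-9) for lp, target in zip(lps, printed))
print("all values reproduced:", matches)
```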

🔶 Step 3 — Histogram of LP Values

The histogram groups the LP values into bins (ranges of LP).

This shows how many patients fall into each LP range.
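Step 3 can be sketched with the standard library by assigning each LP to a fixed-width bin and counting patients per bin. The simulated LP values and the 0.5 bin width are assumptions for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical LP values from a simulated cohort (ages 20-79, about 30% obese).
rng = random.Random(7)
lps = [-6.0 + 0.04 * rng.randint(20, 79) + 1.5 * int(rng.random() < 0.3) for _ in range(500)]

BIN_WIDTH = 0.5  # arbitrary choice; narrower bins show more detail but more noise

# Bin index k covers the half-open range [k * BIN_WIDTH, (k + 1) * BIN_WIDTH).
counts = Counter(math.floor(lp / BIN_WIDTH) for lp in lps)

for k in sorted(counts):
    lo, hi = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
    print(f"[{lo:5.1f}, {hi:5.1f}): {counts[k]:3d} {'#' * (counts[k] // 5)}")
```

Each row is one bin; the bar length is proportional to the number of patients whose LP falls in that range.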

🟨 Step 4 — Smooth LP Distribution Curve

This is the LP distribution used in Debray’s Approach 2.

This representation makes it easy to compare the LP mean (where the curve is centered) and the LP SD (how wide it is) between the development and validation datasets.

These are the two core measures of relatedness in Debray’s framework.
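Step 4 can be sketched as a Gaussian kernel density estimate (KDE), one common way to smooth LP values into a curve; the post does not say which smoother it used. Both cohorts are simulated (hypothetical), and the bandwidth follows the normal-reference rule of thumb h = 1.06 × SD × n^(-1/5). The helper `gaussian_kde_density` is hand-written here and is not SciPy's `gaussian_kde`.

```python
import math
import random
import statistics


def simulate_lps(rng, n, age_lo, age_hi, p_obese):
    """Hypothetical cohort: one LP per simulated patient, example model coefficients."""
    return [-6.0 + 0.04 * rng.randint(age_lo, age_hi) + 1.5 * int(rng.random() < p_obese)
            for _ in range(n)]


def gaussian_kde_density(values, grid):
    """Gaussian KDE of `values` evaluated at each point of `grid` (hand-written sketch)."""
    n = len(values)
    h = 1.06 * statistics.stdev(values) * n ** (-1 / 5)  # normal-reference bandwidth
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values) for x in grid]


rng = random.Random(11)
dev = simulate_lps(rng, 800, 20, 79, 0.25)  # hypothetical development cohort
val = simulate_lps(rng, 800, 50, 79, 0.40)  # hypothetical validation cohort

STEP = 0.05
grid = [-7.0 + STEP * i for i in range(141)]  # LP from -7.0 to 0.0
dens_dev = gaussian_kde_density(dev, grid)
dens_val = gaussian_kde_density(val, grid)

for label, lps, dens in (("development", dev, dens_dev), ("validation", val, dens_val)):
    peak = grid[dens.index(max(dens))]
    area = sum(dens) * STEP  # Riemann-sum check that the curve integrates to about 1
    print(f"{label:>11}: mean={statistics.mean(lps):.2f}, SD={statistics.stdev(lps):.2f}, "
          f"peak near LP={peak:.2f}, area={area:.3f}")
```

Each curve is a proper density (its area is close to 1), so a shift in location reflects a difference in LP mean and a difference in width reflects a difference in LP SD.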


Summary

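LP values come from: the model coefficients applied to each patient's predictor values, LP = β0 + β1X1 + … + βpXp.

LP distribution comes from: collecting the LP values of all patients in a dataset and summarizing them as a histogram or a smoothed density curve.

Debray Step 1 uses LP because: its mean and SD summarize each population's case-mix on the model's own scale, so comparing them between the development and validation datasets shows how related the two populations are.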