Understanding Regression: From Correlation to Clinical Modeling [Multiple imputation, MI]

Clinical Epidemiology Research | Uniqcret doctor knowledges | Data Analytics or Statistics

Introduction

Regression analysis is foundational in clinical research, serving as a tool to examine and quantify relationships between variables. Unlike mere correlation, regression provides not only an estimate of strength but also the nature and direction of associations—empowering researchers to predict outcomes, identify risk factors, and model complex clinical scenarios.

This article introduces the essential types of regression—Gaussian and logistic—explains their assumptions, and shows how to navigate non-linearities and interaction effects. It provides a clear, structured foundation for health professionals and clinical researchers seeking to move from exploratory to explanatory modeling.


Correlation vs Regression: Two Distinct Purposes

Correlation: Measuring Strength of Association

Correlation lacks directionality and causal interpretation: it tells us nothing about which variable is driving the other or how to predict one from the other.
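
The symmetry of correlation can be illustrated with a small sketch. The ages and blood pressures below are hypothetical values made up for this example, and the helper names (`pearson_r`, `ols_slope`) are not from any library. The point is only that swapping the two variables leaves the correlation unchanged, while a regression slope depends on which variable is treated as the outcome.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(x, y):
    """Least-squares slope from regressing y (outcome) on x (predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical illustrative values, not real patient data.
age = [30, 40, 50, 60, 70]
sbp = [118, 121, 132, 135, 146]

# Correlation is symmetric: the order of the variables does not matter.
r_age_sbp = pearson_r(age, sbp)
r_sbp_age = pearson_r(sbp, age)

# Regression is directional: the two slopes answer different questions.
slope_sbp_on_age = ols_slope(age, sbp)  # mmHg per year of age
slope_age_on_sbp = ols_slope(sbp, age)  # years of age per mmHg

print(r_age_sbp, r_sbp_age)
print(slope_sbp_on_age, slope_age_on_sbp)
```

The two correlation values printed are identical, whereas the two slopes differ in both size and units.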


Regression: Modeling the Relationship


Gaussian and Logistic Regression: Choosing the Right Tool

Gaussian (Linear) Regression

Linear Equation
Y = a + bX

a is the intercept, and b is the beta coefficient (or slope), representing the mean change in Y for a one-unit increase in X.

SBP Equation
SBP = 95 + 0.7 × Age

Each additional year of age is associated with a 0.7 mmHg higher SBP, on average.
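
As a quick check of how the coefficients are read, the SBP equation can be written as a small function. This is only a sketch: the function name `predicted_sbp` and the ages used are chosen for illustration, while the intercept and slope are the ones in the equation above.

```python
INTERCEPT = 95.0  # a: predicted SBP (mmHg) at age 0, an extrapolation
SLOPE = 0.7       # b: mean difference in SBP (mmHg) per additional year

def predicted_sbp(age_years):
    """Mean SBP predicted by the linear equation SBP = 95 + 0.7 * Age."""
    return INTERCEPT + SLOPE * age_years

sbp_at_50 = predicted_sbp(50)
sbp_at_60 = predicted_sbp(60)

# A 10-year age difference corresponds to 10 * b in predicted SBP.
ten_year_difference = sbp_at_60 - sbp_at_50

print(sbp_at_50, sbp_at_60, ten_year_difference)
```

Because the model is linear, the predicted difference for any 10-year gap is the same regardless of the starting age.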


Logistic Regression

Logistic Regression
logit(P) = a + bX

Where P is the probability of the event occurring. The output is in log-odds; the exponentiated beta coefficient (e^b) gives the odds ratio (OR).

Logistic Regression with BMI
log(P / (1 − P)) = 2.3 + 0.1 × BMI

An odds ratio (OR) of about 1.1 per unit of BMI (e^0.1 ≈ 1.105) means that each additional BMI point is associated with roughly 10% higher odds of developing diabetes.
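
The step from coefficient to odds ratio can be sketched in a few lines. The code uses only the BMI coefficient (0.1) from the equation above; the 5-unit comparison and the helper `log_odds_to_probability` are additions for illustration. Note that odds ratios multiply rather than add, so a 5-point BMI difference does not simply mean 50% higher odds.

```python
from math import exp

B_BMI = 0.1  # log-odds change per BMI unit, from the equation above

# Exponentiating the coefficient gives the odds ratio per one BMI unit.
odds_ratio_per_unit = exp(B_BMI)

# For a k-unit difference the odds ratio is exp(k * b), i.e. OR ** k.
odds_ratio_per_5_units = exp(5 * B_BMI)

def log_odds_to_probability(log_odds):
    """Inverse of the logit: convert log-odds back to a probability."""
    return 1.0 / (1.0 + exp(-log_odds))

print(round(odds_ratio_per_unit, 3))
print(round(odds_ratio_per_5_units, 3))
print(log_odds_to_probability(0.0))  # log-odds of 0 means even odds
```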


Modeling Non-Linear Relationships

The Linearity Assumption

Standard regression assumes a straight-line relationship:

Y = a + bX

Yet in practice, many relationships curve or plateau. For instance, age may increase risk for stroke up to a point, after which the relationship flattens.


Solutions for Non-Linearity

Quadratic Regression Model
Y = a + b₁X + b₂X²
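
One way to see what the squared term does is to look at the slope of the curve, which is b₁ + 2b₂X and therefore changes with X. The sketch below uses hypothetical coefficients chosen only for illustration (they are not estimates from any dataset), with generic X and Y.

```python
# Hypothetical coefficients for Y = a + b1*X + b2*X**2, illustration only.
A, B1, B2 = 120.0, 3.0, -0.025

def predicted_y(x):
    """Predicted outcome from the quadratic model."""
    return A + B1 * x + B2 * x ** 2

def slope_at(x):
    """Change in Y per unit of X at a given X: dY/dX = b1 + 2*b2*X."""
    return B1 + 2 * B2 * x

# With a negative b2 the curve rises, flattens, then falls.
# The slope is zero where X = -b1 / (2 * b2).
turning_point = -B1 / (2 * B2)

print(slope_at(30), slope_at(turning_point))
print(turning_point)
```

In a straight-line model the slope would be the same at every X; here it shrinks toward zero as X approaches the turning point, which is how a quadratic term captures a flattening relationship.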

Application:

In a model predicting cholesterol based on age:


Modeling Interactions: When Effects Depend on Context

Understanding Interactions

An interaction exists when the effect of one predictor depends on the level of another.

This indicates a statistically significant interaction: the drug benefits men but not women.


Model Specification

To model interactions, include both main effects and the interaction term in the equation:

Y = a + b₁X + b₂Z + b₃XZ

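In R, the formula term X*Z expands to both main effects plus the interaction (X + Z + X:Z), while X:Z adds the interaction term alone. In Stata, factor-variable notation plays the same role: ## includes the main effects and the interaction (for example, c.X##c.Z), and # includes the interaction only.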
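
To make the role of b₃ concrete, here is a minimal sketch of the interaction equation with hypothetical coefficients. The coding is an assumption made for this example: X is treatment (0 = placebo, 1 = drug), Z is a binary subgroup indicator (0 or 1), and a lower Y is the better outcome.

```python
# Hypothetical coefficients for Y = a + b1*X + b2*Z + b3*X*Z.
A, B1, B2, B3 = 10.0, -5.0, 2.0, 5.0

def predicted_y(x, z):
    """Predicted outcome for treatment x and subgroup z."""
    return A + B1 * x + B2 * z + B3 * x * z

def effect_of_x(z):
    """Change in Y per unit of X at a given Z: b1 + b3*Z."""
    return B1 + B3 * z

# The treatment effect differs by subgroup because of the b3 term.
effect_when_z_is_0 = effect_of_x(0)  # b1 alone
effect_when_z_is_1 = effect_of_x(1)  # b1 + b3

print(effect_when_z_is_0, effect_when_z_is_1)
```

With these made-up numbers the treatment lowers Y by 5 units when Z = 0 and has no effect when Z = 1. Note that b₁ on its own is the effect of X only in the Z = 0 group, not an overall effect.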


Clinical Relevance

Interactions matter for personalized medicine. They help identify for whom treatments work, not just whether they work.


Conclusion

Regression is more than just a statistical tool—it's a language for expressing and testing clinical hypotheses. Whether exploring relationships, adjusting for confounding, or predicting patient outcomes, understanding how to use Gaussian and logistic regression appropriately—and how to account for non-linearities and interactions—enables more precise, meaningful clinical research.


Key Takeaways