
Choosing the Right Model for Correlated Data: A Step-by-Step Guide [Repeated Measures, Multilevel Modeling, Mixed Effects, GEE Model, Random Effects, Robust SE, Longitudinal Data, Clustered Data]

Updated: Jul 7

STEP 1: 🔍 Describe Your Data Structure

Ask:

  1. Are measurements repeated over time on the same subject?

  2. Are observations nested in groups (e.g., patients in clinics, students in classes)?

  3. Are there multiple levels of clustering (e.g., patients → doctors → hospitals)?

  4. Is time spacing regular (every 3 months) or irregular?

STEP 2: 🧩 Identify the Correlation Source

| Data Type | What Makes It Correlated? | Example |
|---|---|---|
| Repeated measures | Same subject over time | BP at 1, 3, 6 months |
| Clustered data | Subjects in same unit | Patients in ward |
| Hierarchical data | Nested clusters | Patient → doctor → hospital |
| Paired/matched data | Shared factors | Left eye vs right eye |


STEP 3: 🎯 Choose the Analytical Goal

| Goal | Use Model That Estimates... | Best For |
|---|---|---|
| Population-level average | Marginal model (GEE) | Guidelines, public health |
| Subject-specific trends | Conditional model (Mixed Effects) | Individual prediction, growth curves |
| Group-specific effects | Multilevel model | School/ward/hospital policy |
| Only need corrected SEs | Empirical variance correction | Regression w/ cluster SEs |


STEP 4: 📊 Map Your Scenario to the Model

| Scenario | Best Model | Correlation Correction |
|---|---|---|
| Single-level repeated measures (time) | GEE or Mixed Effects | AR(1) or Exchangeable |
| Nested groups (patients in clinics) | Multilevel Mixed Model | Random Intercepts |
| Grouped + repeated (patients in hospitals over time) | Mixed Model with Random Intercepts + Slopes | Hierarchical + Time correlation |
| No strong structure but correlated outcomes | GEE with Robust SE | Empirical correction |
| Unknown correlation pattern (sample size permitting) | Unstructured correlation | Data-defined estimation |


STEP 5: 🧪 Select the Variance & Correlation Strategy

| Component | Option | When to Use |
|---|---|---|
| Variance Estimation | Empirical (Robust SE) | When model fit isn’t your focus, but SE correction is critical |
| | Model-based | When you trust your model structure & correlation |
| Correlation Structure | Exchangeable | Equal correlation (common in clusters) |
| | AR(1) | Time-spaced data where closer = more correlated |
| | Unstructured | Large sample, unknown pattern |
| Random Effect Specification | Random Intercepts | Different starting levels |
| | Random Slopes | Different trends over time or treatment |
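To make the three correlation structures concrete, here is a minimal base-R sketch that builds the exchangeable and AR(1) working correlation matrices for four visits and counts the free parameters an unstructured matrix would need. The number of visits and the value of rho are illustrative assumptions, not estimates from any data set.

```r
n_visits <- 4    # assumed number of visits per subject
rho <- 0.6       # assumed correlation parameter, for illustration only

# Exchangeable: every pair of visits shares the same correlation rho
exch <- matrix(rho, n_visits, n_visits)
diag(exch) <- 1

# AR(1): correlation decays as rho^|j - k| with the lag between visits
lag <- abs(outer(seq_len(n_visits), seq_len(n_visits), "-"))
ar1 <- rho^lag

# Unstructured: one free correlation for every pair of visits
n_unstructured <- n_visits * (n_visits - 1) / 2

print(exch)
print(ar1)
print(n_unstructured)
```

Exchangeable and AR(1) each cost one parameter, while the unstructured count grows quadratically with the number of visits, which is why it needs a larger sample.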


🔑 SECRET INSIGHT BOX

  • GEE = population average, no prediction per patient

  • Mixed-effects = subject-level prediction

  • Random slope ≠ time interaction: it reflects natural heterogeneity in trends

  • Always visualize trajectories before choosing slope models

🧭 Decision Map Summary

IF repeated measures → Is it over time?
     ├─ Yes: AR(1) or Random Slopes
     └─ No: Exchangeable or Random Intercepts

IF nested in groups:
     ├─ Just one level? → Random Intercepts
     └─ Multiple levels? → Multilevel model with nested random effects

IF unsure of pattern → Try Unstructured (if sample size supports it)


1  Describe Your Data Structure  🔍

Deep‑dive

Ask four questions before touching software:

  1. Repeated over time? Same person measured more than once ⇒ temporal correlation.

  2. Nested groups? Patients in wards, eyes within patients, etc. ⇒ clustering.

  3. Multiple levels? Patient → Doctor → Hospital ⇒ hierarchical nesting.

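  4. Timing pattern? Equally spaced visits favour AR(1); irregular timing usually calls for a mixed‑effects model with time as a continuous covariate and random slopes.

Why it matters: each “Yes” adds parameters to the covariance structure. Ignoring or misspecifying that structure typically makes standard errors too small for between‑cluster effects (inflated Type I error) and too large for within‑subject effects (lost power).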

Plain‑speak cheat

Write down “TIME?” and “NESTING?” first; the answers drive everything that follows.
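The four questions can be answered straight from a long-format data set. Below is a minimal base-R sketch; the data frame `visits` and its columns `patient_id`, `clinic_id`, and `month` are hypothetical names invented for illustration.

```r
# Hypothetical long-format data: one row per patient visit
visits <- data.frame(
  patient_id = c(1, 1, 1, 2, 2, 2, 3, 3),
  clinic_id  = c("A", "A", "A", "A", "A", "A", "B", "B"),
  month      = c(0, 3, 6, 0, 3, 6, 0, 5)
)

# TIME? More than one row per patient means repeated measures
rows_per_patient <- table(visits$patient_id)
repeated <- any(rows_per_patient > 1)

# NESTING? Every patient belonging to exactly one clinic means patients are nested in clinics
clinics_per_patient <- tapply(visits$clinic_id, visits$patient_id,
                              function(x) length(unique(x)))
nested <- all(clinics_per_patient == 1)

# Timing pattern: collect the gaps between consecutive visits within each patient
gaps <- unlist(tapply(visits$month, visits$patient_id,
                      function(m) diff(sort(m))))
equal_spacing <- length(unique(gaps)) == 1

print(c(repeated = repeated, nested = nested, equal_spacing = equal_spacing))
```

In this toy data, patient 3 returns after 5 months instead of 3, so the spacing check flags the visits as irregular.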


2  Identify the Correlation Source  🧩

Deep‑dive

| Data type | Mechanism | Typical example | Quick diagnostic plot |
|---|---|---|---|
| Repeated measures | Within‑subject memory | BP at 1, 3, 6 months | Spaghetti plot of each patient |
| Clustered | Shared care environment | Patients in same ICU | Box‑and‑whisker by cluster |
| Hierarchical | Stacked clustering | Patient → Doctor | Variance‑components plot |
| Paired | Biological pairing | Left vs Right eye | Bland‑Altman / scatter of pair |

Plain‑speak cheat

If two rows share a patient‑ID, clinic‑ID, or visit‑date, they’re probably correlated.


3  Clarify Your Analytical Goal  🎯

Deep‑dive

| Goal | Key question | Proper model family | Interpretation focus |
|---|---|---|---|
| Population average | “What is the mean effect across everyone?” | GEE / Marginal | Guidelines & policy |
| Subject‑specific | “How does this patient change?” | Mixed‑effects (conditional) | Precision medicine |
| Group effects | “Do hospitals differ?” | Multilevel mixed | Quality benchmarking |
| Just robust SEs | “I only worry that clustering makes my SEs too small.” | OLS/GLM + Sandwich | Simple regressions |

Plain‑speak cheat

Pick GEE for public‑health answers; mixed‑effects for patient‑level answers.
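The population-average versus subject-specific distinction is not only about interpretation: for a logistic model the two coefficients differ in size. The base-R sketch below averages a subject-specific logistic curve over a normally distributed random intercept and recovers the implied population-averaged log-odds ratio. The values of `beta_c` and `sigma_b` are illustrative assumptions.

```r
# Assumed subject-specific (conditional) log-odds ratio and random-intercept SD
beta_c  <- 1.0
sigma_b <- 2.0

# Marginal probability at x: average the subject-specific probability
# over random intercepts b ~ N(0, sigma_b^2)
marginal_prob <- function(x) {
  integrate(function(b) plogis(beta_c * x + b) * dnorm(b, sd = sigma_b),
            lower = -Inf, upper = Inf)$value
}

# Population-averaged log-odds ratio for x = 1 versus x = 0
beta_m <- qlogis(marginal_prob(1)) - qlogis(marginal_prob(0))

print(c(conditional = beta_c, marginal = beta_m))
```

The marginal coefficient comes out smaller than the conditional one, so a GEE and a mixed logistic model fitted to the same data can both be right while reporting different numbers. For a linear model with an identity link the two coincide.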


4  Map Scenario → Model  📊

Deep‑dive

| Scenario | Best model | Working correlation / random structure | R / Stata hint |
|---|---|---|---|
| 6 time‑points per patient, no sites | GEE | AR(1) or exchangeable | geeglm(..., corstr="ar1") |
| Patients clustered in 12 clinics, one outcome | Mixed (random intercept) | Random intercept only | lmer(Y ~ X + (1 |
| Patients in 12 clinics, 5 visits each | Mixed (int + slope) | Random int + slope by patient; random int by clinic | lmer(Y ~ time + (time |
| 4000 patients, 2 exams each, need quick answer | GLM + cluster‑robust SE | Sandwich (empirical) | glm(...); sandwich::vcovCL() |
| 30 patients, 10 uneven visits | Unstructured mixed (only if it converges; 10 visits mean 45 correlations) | UN covariance | lme(..., correlation=corSymm()) |

Plain‑speak cheat

One level = random intercept; time + one level = add random slope; many levels = stack random intercepts.
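For the "GLM + cluster‑robust SE" row, it can help to see what the sandwich correction actually computes. This is a minimal base-R sketch on simulated data, using the basic cluster sandwich formula with no small-sample adjustment; the simulation settings are assumptions for illustration, and in practice a packaged routine such as `vcovCL()` from the sandwich package would normally be used instead.

```r
set.seed(1)
n_clusters <- 40
m <- 5                                      # observations per cluster
cluster <- rep(seq_len(n_clusters), each = m)

# Simulated data: cluster-level covariate plus a shared cluster effect
x <- rep(rnorm(n_clusters), each = m)
y <- 1 + 0.5 * x + rep(rnorm(n_clusters), each = m) + rnorm(n_clusters * m)

fit <- lm(y ~ x)
X <- model.matrix(fit)
u <- residuals(fit)

# "Bread": inverse of X'X
bread <- solve(crossprod(X))

# "Meat": sum the score contributions x_i * u_i within each cluster,
# then take the cross-product of the cluster totals
scores <- rowsum(X * u, group = cluster)
meat <- crossprod(scores)

vcov_cluster <- bread %*% meat %*% bread

se_naive   <- sqrt(diag(vcov(fit)))
se_cluster <- sqrt(diag(vcov_cluster))
print(rbind(naive = se_naive, cluster_robust = se_cluster))
```

The point estimates are untouched; only the standard errors change, which is exactly what the "just robust SEs" goal asks for.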


5  Pick Variance & Correlation Strategy  🧪

Deep‑dive

| Choice point | Options | Use when | Caveat |
|---|---|---|---|
| Variance estimator | Empirical (robust) | Many clusters; tolerates a misspecified working correlation | SEs biased downward with fewer than about 30 clusters |
| | Model‑based | Correlation well‑specified | Sensitive to wrong pattern |
| Correlation pattern | Exchangeable | All pairs equally related (wards) | Over‑simplifies time data |
| | AR(1) | Equal spacing & decay | Fails if visits irregular |
| | Unstructured | Many subjects, few shared time points | Parameter‑hungry |
| Random effect | Intercept | Baseline shifts only | Assumes parallel trajectories |
| | Intercept + Slope | Heterogeneous change rates | Needs ≥3 time points/subject |

Plain‑speak cheat

Equally spaced visits with decaying correlation → AR(1). Many subjects and an unknown pattern → UN. Parallel lines? random‑intercept; diverging lines? random‑slope.
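The parallel-versus-diverging question can also be checked formally by fitting both random-effect structures and comparing them. The sketch below uses the `Orthodont` data that ships with the nlme package (27 children measured at ages 8, 10, 12, and 14), chosen only because it is readily available; it is not the data discussed in this post.

```r
library(nlme)  # recommended package bundled with standard R installations

# Random intercept only: children differ in level, trajectories are parallel
fit_int <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Random intercept + slope: children may also differ in growth rate
fit_slope <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)

# Likelihood-ratio comparison; both fits share the same fixed effects,
# so the default REML fits are comparable
comparison <- anova(fit_int, fit_slope)
print(comparison)
```

Because a variance of zero sits on the boundary of the parameter space, the p-value from this comparison tends to be conservative, so treat it alongside the spaghetti plot rather than as the only evidence.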


6  Reality Checks & Visuals  🔑

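  1. Spaghetti plot before modelling: are patient lines parallel?

  2. Intraclass correlation (ICC): if ≈0, clustering may be harmless, but check the design effect 1 + (m − 1) × ICC, where m is the cluster size, because even a small ICC matters when clusters are large.

  3. Residual autocorrelation (ACF) plot: checks whether an AR(1) decay pattern is plausible.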

Plain‑speak cheat

Plot first; if lines aren’t parallel your model shouldn’t be either.
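As a quick illustration of the ICC check, here is a base-R sketch that computes the one-way ANOVA estimate of the ICC and the corresponding design effect for a balanced design. The outcome values and clinic labels are made up for illustration.

```r
# Hypothetical outcome for 3 clinics with 4 patients each
y <- c(10, 12, 11, 13,   20, 22, 21, 23,   15, 17, 16, 18)
clinic <- factor(rep(c("A", "B", "C"), each = 4))

# One-way ANOVA mean squares: between clinics, then within clinics
ms <- anova(lm(y ~ clinic))[["Mean Sq"]]
ms_between <- ms[1]
ms_within  <- ms[2]

m <- 4  # patients per clinic (balanced design)

# ANOVA estimator of the intraclass correlation
icc <- (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)

# Design effect: variance inflation relative to independent observations
deff <- 1 + (m - 1) * icc

# A small ICC can still matter with large clusters
deff_large_clusters <- 1 + (100 - 1) * 0.02

print(c(icc = icc, deff = deff, deff_large_clusters = deff_large_clusters))
```

With an ICC of only 0.02 and 100 patients per cluster, the design effect is close to 3, so the effective sample size is about a third of the nominal one.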


Decision Flow (ASCII)  🧭

Start
 ├─ Repeated over time?
 │     ├─ Yes → Equal spacing?
 │     │      ├─ Yes → AR(1) or Random Slopes
 │     │      └─ No  → Random Slopes + Time-as-variable
 │     └─ No → Clustered?
 │            ├─ One level → Random Intercepts
 │            └─ ≥Two levels → Hierarchical Mixed
 └─ Unsure pattern → Try Unstructured (data permitting)
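The flow above can be written as a small helper, which is handy for documenting the choice in an analysis script. This is only a sketch: `suggest_model()` and its arguments are hypothetical names, treating an unknown spacing (`NA`) as "unsure pattern" is my reading of the flow, and the final fallback for unclustered, non-repeated data is an added assumption that the flow itself does not cover.

```r
# Hypothetical helper mirroring the decision flow
suggest_model <- function(repeated_over_time, equal_spacing = NA, cluster_levels = 0) {
  if (isTRUE(repeated_over_time)) {
    if (isTRUE(equal_spacing))  return("AR(1) or random slopes")
    if (isFALSE(equal_spacing)) return("Random slopes with time as a variable")
    # Spacing unknown: treated here as the "unsure pattern" branch
    return("Try unstructured (data permitting)")
  }
  if (cluster_levels == 1) return("Random intercepts")
  if (cluster_levels >= 2) return("Hierarchical mixed model")
  # Not in the flow: no repetition and no clustering (added assumption)
  "Ordinary regression (no correlation to model)"
}

print(suggest_model(TRUE, equal_spacing = TRUE))
print(suggest_model(TRUE, equal_spacing = FALSE))
print(suggest_model(FALSE, cluster_levels = 2))
```

A lookup like this is a starting point only; the plots and checks in the previous section should still have the final say.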

Worked Micro‑Example

Design: 150 COPD patients across 7 clinics, FEV1 at baseline, 3 m, 6 m.

Goal: Predict individual recovery curves.

  1. Structure → repeated + nested (patients in clinics).

  2. Goal → subject‑specific.

  3. Model → Mixed‑effects with random intercept (clinic) & intercept + slope (patient).

  4. Spec (R):

lmer(FEV1 ~ time + (time|patient_id) + (1|clinic_id), data = copd)
  5. Working correlation → implied by the random effects; usually no extra AR(1) term is needed, because the random intercept and slope already induce within‑patient correlation.

Quick Reference Card  📋 (TL;DR)

| Ask | If “Yes” → | Model |
|---|---|---|
| Time repeated? | Equal gaps | GEE/Mixed + AR(1) |
| | Unequal gaps | Mixed + Random Slopes |
| Single clustering level? | | Mixed + Random Intercept |
| Multiple levels? | | Multilevel Mixed |
| Need only robust SE? | | Sandwich |

