
Choosing a GEE Working Correlation Structure for Repeated Measures: ind = Independence, exc = Exchangeable, ar1 = Autoregressive Order 1, sta1 = Stationary m-dependent (m = 1), uns = Unstructured

Updated: Jul 8

Marginal Model-Based Correlation Structures in GEE (Generalized Estimating Equations)

Marginal models (via GEE) assume correlated outcomes within subjects/clusters and require specification of a "working correlation structure".

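While coefficient estimates (β) remain consistent regardless of the structure chosen (provided the mean model is correctly specified), the efficiency of those estimates, and therefore the size of their standard errors, depends on how closely the working correlation matches the true within-subject correlation.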
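For reference, the standard GEE estimating equation shows where the working correlation enters. The symbols here are the usual textbook notation rather than anything defined elsewhere in this post: D_i is the derivative of the mean μ_i with respect to β, A_i is the diagonal matrix of variance functions, φ is a scale parameter, and R(α) is the working correlation matrix.

```latex
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2},
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}}
```

R(α) only weights the residuals, which is why a poor choice costs efficiency rather than consistency.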

🚫 ind — Independent

  • Assumes: No within-subject correlation (ρ = 0)

  • 🔥 Poor fit for repeated measures: it ignores within-subject dependence, so β is usually estimated less efficiently (with vce(robust) the estimates and SEs stay valid, but precision is lost)

  • ❌ Use only if you're absolutely sure responses are uncorrelated (which defeats the purpose of GEE)

  • ✅ Use in clustered cross-section, NOT in longitudinal/repeated measures

Conclusion: Do not use ind if you're modeling intra-subject dependency — it contradicts the purpose of GEE in this context.


✅ exchangeable — Compound Symmetry (CS)

  • Assumes: All pairwise correlations are equal

  • Use when:

    • Time spacing is not relevant

    • Repeated measures are irregular or few

    • Clinical example: BP at admission, discharge, follow-up (no decay expected)

xtgee y x1 x2, family(gaussian) link(identity) corr(exchangeable) i(id) vce(robust)
Advantage: Efficient if the assumption holds; if it does not, vce(robust) still gives valid standard errors
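As a small illustration of what "all pairwise correlations are equal" means, the sketch below builds a compound-symmetry matrix in plain Python. The function name and the value ρ = 0.6 are made up for the example; the range check uses the standard condition for such a matrix to be positive definite, −1/(n−1) < ρ < 1.

```python
def exchangeable(n_times, rho):
    """Build an n_times x n_times compound-symmetry correlation matrix.

    Every off-diagonal entry equals rho. The matrix is a valid
    (positive-definite) correlation matrix only when
    -1/(n_times - 1) < rho < 1, so values outside that range are rejected.
    """
    if n_times < 2:
        raise ValueError("need at least two time points")
    lower_bound = -1.0 / (n_times - 1)
    if not lower_bound < rho < 1.0:
        raise ValueError("rho outside the positive-definite range")
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]


# Three visits (e.g. admission, discharge, follow-up); rho = 0.6 is illustrative.
for row in exchangeable(3, 0.6):
    print(row)
```

Whatever the spacing between visits, the matrix has a single parameter, which is why the structure tolerates few or irregular measurements.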


✅ ar1 — Autoregressive Order 1

  • Assumes: Correlation decays exponentially with time lag

    $\text{Corr}(y_t, y_{t+k}) = \rho^{k}$

  • Use when:

    • Equally spaced time points (e.g., every week/month)

    • There's temporal ordering

    • Clinical example: Daily symptom ratings, serial biomarkers

xtgee y x1 x2, family(gaussian) link(identity) corr(ar1) i(id) t(time) vce(robust)
Warning: Requires a time variable (time is a placeholder name here; set it with t() or xtset) and equal spacing. Not suited to irregular intervals
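The decay rule ρ^k can be checked by hand or with a few lines of Python. This is only a sketch: the function name is invented for the example, and ρ = 0.9 with five time points is an illustrative choice.

```python
def ar1(n_times, rho):
    """Build an AR(1) correlation matrix: entry (i, j) is rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


R = ar1(5, 0.9)
# First row: correlation of the first time point with lags 0..4.
print([round(value, 2) for value in R[0]])  # [1.0, 0.9, 0.81, 0.73, 0.66]
```

Because the exponent is the lag in index units, the rule only makes sense when one index step always means the same amount of elapsed time.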


✅ sta1 — Stationary m-dependent (m=1)

  • Assumes:

    • Observations ≤ m steps apart have non-zero correlation

    • All others = 0

  • sta1: Correlation only among immediate neighbors

  • Use when:

    • You expect correlation only between adjacent time points

    • Clinical example: Hourly measurements where only consecutive readings are related

  • Syntax:

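xtgee y x1 x2, family(gaussian) link(identity) corr(sta1) i(id) t(time) vce(robust)
Note: Underused but powerful in short, dense, equally spaced panels. Like ar1, it needs a time variable (time is a placeholder name here); the long form documented for xtgee is corr(stationary 1)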
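The m-dependent rule (non-zero correlation up to lag m, zero beyond) can be sketched as follows. The function name and the correlation values 0.4 and 0.2 are illustrative, not taken from any fitted model.

```python
def stationary(n_times, lag_corrs):
    """Build a stationary m-dependent correlation matrix.

    lag_corrs[k - 1] is the correlation at lag k, so m = len(lag_corrs).
    Pairs more than m steps apart get correlation 0.
    """
    m = len(lag_corrs)

    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        return lag_corrs[lag - 1] if lag <= m else 0.0

    return [[entry(i, j) for j in range(n_times)] for i in range(n_times)]


# m = 1: only immediate neighbours are correlated.
for row in stationary(4, [0.4]):
    print(row)

# m = 2: neighbours and next-but-one neighbours are correlated.
for row in stationary(4, [0.4, 0.2]):
    print(row)
```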


✅ uns — Unstructured

  • Allows: Each pair of observations to have its own unique correlation

  • Most flexible — no assumptions

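  • BUT: With T time points it estimates T(T−1)/2 separate correlations (10 for T = 5), so it needs a large number of subjects and well-populated measurement occasions shared across subjects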

  • Use when:

    • N is large (hundreds of subjects)

    • Each subject has many (≥5–6) repeated measures

    • You want to let the data dictate the correlation

xtgee y x1 x2, family(gaussian) link(identity) corr(uns) i(id) vce(robust)
Caution: Can fail to converge or overfit with sparse/unbalanced data, and estimation gets costly as the number of time points grows.
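To make the data requirement concrete, one can count how many correlation parameters each structure estimates for a given number of time points. The function name and labels below are illustrative; the counts follow from the definitions in this post (one shared ρ for exchangeable and ar1, one correlation per lag up to m for the m-dependent form, one per pair for unstructured).

```python
def n_corr_params(structure, n_times, m=1):
    """Number of working-correlation parameters for n_times time points."""
    if structure == "ind":
        return 0
    if structure in ("exc", "ar1"):
        return 1
    if structure == "sta":
        # One correlation per lag, and there are at most n_times - 1 lags.
        return min(m, n_times - 1)
    if structure == "uns":
        # One correlation for every distinct pair of time points.
        return n_times * (n_times - 1) // 2
    raise ValueError("unknown structure: " + structure)


for n_times in (3, 5, 10):
    print(n_times, "time points ->", n_corr_params("uns", n_times), "correlations")
```

Going from 5 to 10 time points raises the unstructured count from 10 to 45, while the other structures stay at one parameter.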


🧭 Summary Table: GEE Correlation Structures for Clinical Repeated Measures

| Structure | Meaning | When to Use | Key Assumptions | Stata corr() | Caution |
| --- | --- | --- | --- | --- | --- |
| ind | Independence | Never (for repeated measures) | No correlation | independent | Violates longitudinal logic |
| exchangeable | Equal correlation | Irregular, short, balanced reps | Constant ρ across all timepoints | exchangeable | Assumption fails with decaying correlation |
| ar1 | Decaying (lag-based) | Equally spaced, ordered timepoints | Corr decays exponentially by lag | ar1 | Invalid for irregular timepoints |
| sta1 | Neighbor correlation only | Very short panel, e.g., hourly or closely spaced | Corr only among adjacent timepoints | sta1 | Rarely used, underdocumented |
| uns | Fully unstructured | Very large N + many repeated measures | No assumption | uns | Overfits unless data is massive |

💡 Clinical Decision Logic

  • Sparse, irregular follow-ups → Use exchangeable

  • Time-ordered, equally spaced → Use ar1

  • Immediate-neighbor dependence only → Use sta1

  • Dense + high-frequency + large N → uns

  • Never use ind unless modeling cross-sectional clusters (not repeated measures)
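The decision rules above can be written as a small helper. This is a sketch only: the function name, the boolean flags, and the order in which the rules are checked are assumptions made for the example, not part of any library.

```python
def suggest_structure(repeated_measures=True, equally_spaced=False,
                      neighbor_only=False, dense_large_n=False):
    """Return a working-correlation label following the rules listed above.

    The flags and their priority order are illustrative assumptions.
    """
    if not repeated_measures:
        return "ind"    # cross-sectional clusters only
    if dense_large_n:
        return "uns"    # dense, high-frequency data with large N
    if neighbor_only:
        return "sta1"   # dependence between immediate neighbours only
    if equally_spaced:
        return "ar1"    # time-ordered, equally spaced measurements
    return "exc"        # sparse or irregular follow-ups


print(suggest_structure())                     # exc
print(suggest_structure(equally_spaced=True))  # ar1
```

A rule of thumb like this is a starting point; comparing the fitted working correlation matrix against the assumed pattern is still worthwhile.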

Correlation Matrix Simulation — GEE Working Correlation Structures

Data: Repeated measures on 5 time points (T1–T5)

🔴 corr(independent)

Assumes no within-subject correlation (ρ = 0) — each time point is independent.

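|    | T1   | T2   | T3   | T4   | T5   |
| --- | --- | --- | --- | --- | --- |
| T1 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| T2 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 |
| T3 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 |
| T4 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 |
| T5 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |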

🟢 corr(exchangeable)

Assumes constant correlation between all pairs (ρ = 0.80)

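|    | T1   | T2   | T3   | T4   | T5   |
| --- | --- | --- | --- | --- | --- |
| T1 | 1.00 | 0.80 | 0.80 | 0.80 | 0.80 |
| T2 | 0.80 | 1.00 | 0.80 | 0.80 | 0.80 |
| T3 | 0.80 | 0.80 | 1.00 | 0.80 | 0.80 |
| T4 | 0.80 | 0.80 | 0.80 | 1.00 | 0.80 |
| T5 | 0.80 | 0.80 | 0.80 | 0.80 | 1.00 |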

🟡 corr(ar1)

Assumes correlation decays by lag (ρ = 0.90, then ρ² = 0.81, ρ³ = 0.73…)

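|    | T1   | T2   | T3   | T4   | T5   |
| --- | --- | --- | --- | --- | --- |
| T1 | 1.00 | 0.90 | 0.81 | 0.73 | 0.66 |
| T2 | 0.90 | 1.00 | 0.90 | 0.81 | 0.73 |
| T3 | 0.81 | 0.90 | 1.00 | 0.90 | 0.81 |
| T4 | 0.73 | 0.81 | 0.90 | 1.00 | 0.90 |
| T5 | 0.66 | 0.73 | 0.81 | 0.90 | 1.00 |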

🔵 corr(sta1)

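Only adjacent timepoints are correlated (lag-1), ρ = 0.80. The value is purely illustrative: with 5 time points this pattern is a valid (positive-definite) correlation matrix only when |ρ| is below about 0.577, so a fitted lag-1 correlation would be smaller.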

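|    | T1   | T2   | T3   | T4   | T5   |
| --- | --- | --- | --- | --- | --- |
| T1 | 1.00 | 0.80 | 0.00 | 0.00 | 0.00 |
| T2 | 0.80 | 1.00 | 0.80 | 0.00 | 0.00 |
| T3 | 0.00 | 0.80 | 1.00 | 0.80 | 0.00 |
| T4 | 0.00 | 0.00 | 0.80 | 1.00 | 0.80 |
| T5 | 0.00 | 0.00 | 0.00 | 0.80 | 1.00 |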
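A working correlation matrix has to be positive definite, and a band pattern with a large lag-1 correlation can fail that requirement. The sketch below checks it with a hand-written Cholesky factorization (which succeeds only for positive-definite matrices); the function names are invented for the example, and ρ = 0.50 is an arbitrary comparison value.

```python
import math


def is_positive_definite(matrix):
    """Try a Cholesky factorization; return False as soon as a pivot is <= 0."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                chol[i][j] = math.sqrt(pivot)
            else:
                chol[i][j] = (matrix[i][j] - partial) / chol[j][j]
    return True


def one_dependent(n_times, rho):
    """Correlation rho between adjacent time points, 0 for all other pairs."""
    return [[1.0 if i == j else (rho if abs(i - j) == 1 else 0.0)
             for j in range(n_times)] for i in range(n_times)]


print(is_positive_definite(one_dependent(5, 0.80)))  # False
print(is_positive_definite(one_dependent(5, 0.50)))  # True
```

For five time points the cut-off is 1/(2·cos(π/6)) ≈ 0.577, so the 0.80 matrix shown here illustrates the pattern rather than an attainable estimate.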

🟣 corr(uns)

No assumptions — each pair has its own unique correlation

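|    | T1   | T2   | T3   | T4   | T5   |
| --- | --- | --- | --- | --- | --- |
| T1 | 1.00 | 0.86 | 0.74 | 0.55 | 0.33 |
| T2 | 0.86 | 1.00 | 0.71 | 0.49 | 0.40 |
| T3 | 0.74 | 0.71 | 1.00 | 0.69 | 0.50 |
| T4 | 0.55 | 0.49 | 0.69 | 1.00 | 0.66 |
| T5 | 0.33 | 0.40 | 0.50 | 0.66 | 1.00 |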
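One way to read an unstructured matrix is to average its entries by lag and see whether a simpler pattern emerges. The sketch below does this for the illustrative matrix above; the function name is made up for the example.

```python
# Illustrative unstructured matrix from this section (T1..T5).
R_UNS = [
    [1.00, 0.86, 0.74, 0.55, 0.33],
    [0.86, 1.00, 0.71, 0.49, 0.40],
    [0.74, 0.71, 1.00, 0.69, 0.50],
    [0.55, 0.49, 0.69, 1.00, 0.66],
    [0.33, 0.40, 0.50, 0.66, 1.00],
]


def mean_corr_by_lag(matrix):
    """Average the correlations on each off-diagonal band (lag 1, 2, ...)."""
    n = len(matrix)
    averages = {}
    for lag in range(1, n):
        band = [matrix[i][i + lag] for i in range(n - lag)]
        averages[lag] = sum(band) / len(band)
    return averages


for lag, average in mean_corr_by_lag(R_UNS).items():
    print("lag", lag, "mean correlation", round(average, 3))
```

Here the averages fall steadily as the lag grows, which suggests a decaying structure such as ar1 could be a more parsimonious candidate for data that look like this.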

