Choosing a GEE Correlation Structure for Repeated Measures: ind = Independence, exc = Exchangeable, ar1 = Autoregressive Order 1, sta1 = Stationary m-dependent (m=1), uns = Unstructured
- Mayta
- Jul 8
Updated: Jul 8
Marginal Model-Based Correlation Structures in GEE (Generalized Estimating Equations)
Marginal models (via GEE) assume correlated outcomes within subjects/clusters and require specification of a "working correlation structure".
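While coefficient estimates (β) remain consistent even when the working structure is misspecified (provided the mean model is correct), the efficiency of those estimates and the validity of model-based standard errors depend on choosing an appropriate correlation form. Robust (sandwich) standard errors, requested with vce(robust), remain valid either way.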
🚫 ind — Independent
- Assumes: No within-subject correlation (ρ = 0) 
- 🔥 Rarely appropriate in repeated measures: it ignores the within-subject dependence that motivates GEE
- ❌ Use only if you are confident responses are uncorrelated (in which case GEE adds little); with vce(robust) the standard errors stay valid, but the estimates are usually less efficient
- ✅ Better suited to clustered cross-sectional data than to longitudinal/repeated measures
Conclusion: Avoid ind when you are modeling intra-subject dependency; it works against the purpose of GEE in this context.
✅ exchangeable — Compound Symmetry (CS)
- Assumes: All pairwise correlations are equal 
- Use when:
  - Time spacing is not relevant
  - Repeated measures are irregular or few
- Clinical example: BP at admission, discharge, follow-up (no decay expected)
 
xtgee y x1 x2, family(gaussian) link(identity) corr(exchangeable) i(id) vce(robust)
Advantage: Efficient if assumption holds; still robust if not (with vce(robust))
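To check whether a single common correlation is plausible, the estimated working correlation can be displayed after fitting. This is a minimal sketch that assumes long-format data and the same placeholder variables (id, y, x1, x2) used in the command above.

```stata
* Declare the panel (subject) identifier; id is a placeholder name
xtset id

* Fit the GEE with an exchangeable working correlation and robust SEs
xtgee y x1 x2, family(gaussian) link(identity) corr(exchangeable) vce(robust)

* Show the estimated working correlation matrix (one common rho off the diagonal)
estat wcorrelation
```

A very small estimated ρ suggests the structure is adding little over independence. A large one is not, by itself, evidence that the equal-correlation assumption holds.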
✅ ar1 — Autoregressive Order 1
- Assumes: Correlation decays exponentially with time lag: $\text{Corr}(y_t, y_{t+k}) = \rho^{k}$
- Use when:
  - Equally spaced time points (e.g., every week/month)
  - There's temporal ordering
- Clinical example: Daily symptom ratings, serial biomarkers
 
xtgee y x1 x2, family(gaussian) link(identity) corr(ar1) i(id) vce(robust)
Warning: Requires a declared time variable (e.g., xtset id time, where time is your visit index) and equally spaced observations; not suitable for irregular intervals
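A sketch of the set-up that ar1 depends on is shown here. It assumes a hypothetical integer visit variable named visit (1, 2, 3, ...) alongside the placeholder variables id, y, x1, x2.

```stata
* Declare both the subject identifier and the time index
xtset id visit

* Inspect which time points each subject contributes (gaps show up here)
xtdescribe

* Fit the GEE with an AR(1) working correlation and robust SEs
xtgee y x1 x2, family(gaussian) link(identity) corr(ar1) vce(robust)

* Display the estimated working correlation (entries decay with lag)
estat wcorrelation
```

If xtdescribe shows many subjects with gaps or unequal spacing, exchangeable is usually the safer working structure.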
✅ sta1 — Stationary m-dependent (m=1)
- Assumes:
  - Observations ≤ m steps apart have non-zero correlation
  - All others = 0
- sta1: Correlation only among immediate neighbors
- Use when:
  - You expect correlation only between adjacent time points
- Clinical example: Hourly measurements where only consecutive readings are related
- Syntax:
xtgee y x1 x2, family(gaussian) link(identity) corr(sta1) i(id) vce(robust)
Note: Underused but powerful in short, dense, equally spaced panels
✅ uns — Unstructured
- Allows: Each pair of time points to have its own correlation
- Most flexible: imposes no pattern on the correlations
- BUT: Estimates T(T−1)/2 correlations for T time points (10 for T = 5), so it requires a large number of subjects and reasonably complete, balanced data
- Use when:
  - N is large (hundreds of subjects)
  - Each subject has several (≥5–6) repeated measures at common time points
  - You want to let the data dictate the correlation
 
xtgee y x1 x2, family(gaussian) link(identity) corr(uns) i(id) vce(robust)
Caution: May fail to converge or overfit with sparse or unbalanced data, and the computational cost grows quickly with the number of time points.
🧭 Summary Table: GEE Correlation Structures for Clinical Repeated Measures
| Structure | Meaning | When to Use | Key Assumptions | Stata corr() | Caution | 
| ind | Independence | Never (for repeated measures) | No correlation | independent | Violates longitudinal logic | 
| exchangeable | Equal correlation | Irregular, short, balanced reps | Constant ρ across all timepoints | exchangeable | Assumption fails with decaying correlation | 
| ar1 | Decaying (lag-based) | Equally spaced, ordered timepoints | Corr decays exponentially by lag | ar1 | Invalid for irregular timepoints | 
| sta1 | Neighbor correlation only | Very short panel, e.g., hourly or closely spaced | Corr only among adjacent timepoints | sta1 | Rarely used, underdocumented | 
| uns | Fully unstructured | Very large N + many repeated measures | No assumption | uns | Overfits unless data is massive | 
💡 Clinical Decision Logic
- Sparse, irregular follow-ups → Use exchangeable 
- Time-ordered, equally spaced → Use ar1 
- Immediate-neighbor dependence only → Use sta1 
- Dense + high-frequency + large N → uns 
- Never use ind unless modeling cross-sectional clusters (not repeated measures) 
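Because β is consistent under any working structure, one practical check on the choice is to fit the candidate structures and compare coefficients and robust standard errors side by side. This is a minimal sketch, not a formal selection procedure. The variables id, visit, y, x1, x2 and the stored-estimate names are placeholders.

```stata
* Declare subject identifier and time index (needed for ar1 and unstructured)
xtset id visit

xtgee y x1 x2, family(gaussian) link(identity) corr(exchangeable) vce(robust)
estimates store exc

xtgee y x1 x2, family(gaussian) link(identity) corr(ar1) vce(robust)
estimates store ar1

xtgee y x1 x2, family(gaussian) link(identity) corr(unstructured) vce(robust)
estimates store uns

* Coefficients and robust standard errors for the three fits, side by side
estimates table exc ar1 uns, b(%9.3f) se(%9.3f)
```

If the coefficients agree closely and only the standard errors differ, the choice mainly affects efficiency. Large shifts in the coefficients are a reason to re-examine the mean model or the missing-data pattern.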
Correlation Matrix Simulation — GEE Working Correlation Structures
Data: Repeated measures on 5 time points (T1–T5)
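For reference, the five patterns tabulated below can be written as entries of a working correlation matrix R for time points j, k = 1, …, 5. The symbols R_{jk}, ρ and ρ_{jk} are notation introduced here for illustration.

```latex
R_{jj} = 1, \qquad
R_{jk} =
\begin{cases}
0 & \text{independent} \\
\rho & \text{exchangeable} \\
\rho^{|j-k|} & \text{ar1} \\
\rho \cdot \mathbf{1}\{|j-k| = 1\} & \text{sta1} \\
\rho_{jk} & \text{unstructured}
\end{cases}
\qquad (j \neq k)
```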
🔴 corr(independent)
Assumes no within-subject correlation (ρ = 0) — each time point is independent.
| | T1 | T2 | T3 | T4 | T5 |
| T1 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 
| T2 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 
| T3 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 
| T4 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 
| T5 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 
🟢 corr(exchangeable)
Assumes constant correlation between all pairs (ρ = 0.80)
| | T1 | T2 | T3 | T4 | T5 |
| T1 | 1.00 | 0.80 | 0.80 | 0.80 | 0.80 | 
| T2 | 0.80 | 1.00 | 0.80 | 0.80 | 0.80 | 
| T3 | 0.80 | 0.80 | 1.00 | 0.80 | 0.80 | 
| T4 | 0.80 | 0.80 | 0.80 | 1.00 | 0.80 | 
| T5 | 0.80 | 0.80 | 0.80 | 0.80 | 1.00 | 
🟡 corr(ar1)
Assumes correlation decays by lag (ρ = 0.90, then ρ² = 0.81, ρ³ = 0.73…)
| | T1 | T2 | T3 | T4 | T5 |
| T1 | 1.00 | 0.90 | 0.81 | 0.73 | 0.66 | 
| T2 | 0.90 | 1.00 | 0.90 | 0.81 | 0.73 | 
| T3 | 0.81 | 0.90 | 1.00 | 0.90 | 0.81 | 
| T4 | 0.73 | 0.81 | 0.90 | 1.00 | 0.90 | 
| T5 | 0.66 | 0.73 | 0.81 | 0.90 | 1.00 | 
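The AR(1) table above can be rebuilt with Stata's matrix commands, which makes it easy to try other values of ρ. This is a minimal sketch; the scalar name rho and the matrix name R are illustrative.

```stata
* Build the 5 x 5 AR(1) correlation matrix: R[i,j] = rho^|i-j|
scalar rho = 0.90
matrix R = J(5, 5, 0)
forvalues i = 1/5 {
    forvalues j = 1/5 {
        matrix R[`i', `j'] = rho^abs(`i' - `j')
    }
}

* Label rows and columns with the time points, then display to two decimals
matrix rownames R = T1 T2 T3 T4 T5
matrix colnames R = T1 T2 T3 T4 T5
matrix list R, format(%4.2f)
```

A matrix built this way could also be supplied to xtgee as a user-specified structure through corr(fixed R), provided the panel is declared with a time variable.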
🔵 corr(sta1)
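Only adjacent time points are correlated (lag 1). ρ = 0.80 is used here purely to show the pattern: for five time points, a banded matrix like this is a valid (positive definite) correlation matrix only when |ρ| is below about 0.58.
| | T1 | T2 | T3 | T4 | T5 |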
| T1 | 1.00 | 0.80 | 0.00 | 0.00 | 0.00 | 
| T2 | 0.80 | 1.00 | 0.80 | 0.00 | 0.00 | 
| T3 | 0.00 | 0.80 | 1.00 | 0.80 | 0.00 | 
| T4 | 0.00 | 0.00 | 0.80 | 1.00 | 0.80 | 
| T5 | 0.00 | 0.00 | 0.00 | 0.80 | 1.00 | 
🟣 corr(uns)
No assumptions — each pair has its own unique correlation
| | T1 | T2 | T3 | T4 | T5 |
| T1 | 1.00 | 0.86 | 0.74 | 0.55 | 0.33 | 
| T2 | 0.86 | 1.00 | 0.71 | 0.49 | 0.40 | 
| T3 | 0.74 | 0.71 | 1.00 | 0.69 | 0.50 | 
| T4 | 0.55 | 0.49 | 0.69 | 1.00 | 0.66 | 
| T5 | 0.33 | 0.40 | 0.50 | 0.66 | 1.00 | 





