Choosing RR or OR in Epidemiologic Studies: cs vs cc Commands in Stata
- Mayta

- Feb 8
If you remember one rule, remember this:
cs → cohort/RCT/cross-sectional when you can interpret risk (or prevalence) → gives RR/RD
cc → case-control (sampled by outcome) → gives OR (only)
(Stata even labels this as “Risk & Odds Analysis” with cs for RR and logistic for OR in the quick reference.)
The decision flowchart
How were subjects selected?
A) Selected by EXPOSURE status (exposed/unexposed) and followed (or measured) outcome?
-> Cohort / RCT / cross-sectional without outcome-based sampling
-> Use cs (RR/RD)
B) Selected by OUTCOME status (cases/controls), then looked back at exposure?
-> Case-control / nested case-control (outcome-based sampling)
-> Use cc (OR)
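Fast check: If the numbers of cases and controls were fixed by design (e.g., one control per case, or four controls per case), you’re in B → cc territory. If the controls were individually matched to cases, the matching must also be respected in the analysis (Stata’s mcc handles 1:1 matched pairs; plain cc ignores the matching).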
What each command computes (2×2 table logic)
Let:
case = 1 means diseased / event
exposed = 1 means exposed
cs: Risk Ratio (RR) and Risk Difference (RD)
             case=1   case=0   total
exposed=1      a        b       a+b     risk1 = a/(a+b)
exposed=0      c        d       c+d     risk0 = c/(c+d)
RR = risk1 / risk0
RD = risk1 - risk0
cc: Odds Ratio (OR)
             exposed=1   exposed=0
case=1           a           c
case=0           b           d
OR = (a/c) / (b/d) = ad / bc
Key interpretation difference
RR compares probabilities
OR compares odds (when the outcome is common, the OR lies farther from 1 than the RR, so the association looks stronger than it is on the risk scale)
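To see how far the two measures can drift apart, here is a small hand calculation in Stata. It is only a sketch: the counts are borrowed from the csi example further down (7, 12, 9, 2), rearranged into the a, b, c, d layout above, and the scalar names are my own. Run it with no dataset in memory so the scalar names cannot clash with variable names.

```stata
* Cell counts in this post's notation (exposed row: a, b; unexposed row: c, d)
clear
scalar a = 7     // exposed, case=1
scalar b = 9     // exposed, case=0
scalar c = 12    // unexposed, case=1
scalar d = 2     // unexposed, case=0

* Risks in each exposure group
scalar risk1 = a/(a+b)
scalar risk0 = c/(c+d)

* RR and RD compare risks; OR compares odds
display "RR = " %6.4f risk1/risk0
display "RD = " %6.4f risk1 - risk0
display "OR = " %6.4f (a*d)/(b*c)
```

With an outcome this common (19 of 30 subjects are cases), the RR comes out near 0.51 while the OR comes out near 0.13, which is much farther from 1.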
Scenario 1 — “I have a cohort / RCT / cross-sectional sample” → use cs
A. Individual-level data
* case: 0/1 outcome
* exp: 0/1 exposure
cs case exp
return list
Notes: cs reports RD + RR by default; return list shows r(rr), r(rd), CIs, p-value.
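If you want to reuse those stored results rather than just list them, something like the following should work. It is a sketch, not the only way to do it: it turns Stata's csxmpl frequency dataset into one row per subject with expand, and it relies on the stored-result names r(rr), r(lb_rr), r(ub_rr), r(rd), r(lb_rd) and r(ub_rd) as I understand them from the cs documentation.

```stata
* Turn the frequency dataset into individual-level data (one row per subject)
webuse csxmpl, clear
expand pop
drop pop

* Run cs silently; quietly hides the table but still fills r()
quietly cs case exp

* Pull the point estimates and confidence limits out of r()
display "RR = " %5.3f r(rr) "  (95% CI " %5.3f r(lb_rr) " to " %5.3f r(ub_rr) ")"
display "RD = " %6.3f r(rd) "  (95% CI " %6.3f r(lb_rd) " to " %6.3f r(ub_rd) ")"
```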
B. Frequency (aggregated) data (2×2 counts stored as weights)
cs case exp [fw=pop]
Notes: Use this when you have a dataset with 4 rows representing the 4 cells and a count variable pop.
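A minimal sketch of typing in such a 4-row dataset by hand. The counts are the ones used in the csi example later in this post; the variable names case, exp and pop simply follow the commands above.

```stata
* One row per cell of the 2x2 table; pop holds the cell count
clear
input case exp pop
1 1  7
1 0 12
0 1  9
0 0  2
end

* Frequency weights tell cs that each row stands for pop subjects
cs case exp [fw=pop]
```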
C. Stratified (Mantel–Haenszel RR)
cs case exp [fw=pop], by(agegrp)
Notes: Gives stratum-specific + Mantel–Haenszel combined RR (and heterogeneity test).
D. Small samples → exact p-value
cs case exp [fw=pop], exact
Notes: Fisher’s exact p-value is safer when cells are small.
E. “But I want OR too” (not typical, but possible)
cs case exp [fw=pop], or
Notes: cs, or adds an OR, but for OR-focused work you usually jump to cc or logistic.
Scenario 2 — “I have a case-control study (cases + controls)” → use cc
A. Individual-level data
* case: 0/1 disease indicator (defines cases vs controls)
* exp: 0/1 exposure
cc case exp
return list
Notes: cc reports OR (+ AFE/AFP) with CI.
B. Frequency (aggregated) data
cc case exp [fw=pop]
Notes: Same idea as cs—counts in pop.
C. Stratified case-control OR (Mantel–Haenszel)
cc case exp [fw=pop], by(agegrp)
Notes: Gives MH pooled OR + heterogeneity tests.
D. Effect modification / homogeneity tests (stratified)
cc case exp [fw=pop], by(agegrp) bd
cc case exp [fw=pop], by(agegrp) tarone
Notes: bd = Breslow–Day; tarone = Tarone correction.
E. Small samples
cc case exp [fw=pop], exact
Notes: Fisher exact p-value; strong default when sparse.
F. CI approximation methods (mostly for large samples)
cc case exp, woolf
cc case exp, cornfield
Notes: Woolf and Cornfield are large-sample approximations to the CI of the OR; cc’s default (exact) CI is usually preferable unless you have a specific reason to match another method.
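To get a feel for how much the CI method matters, you can run the three variants side by side on Stata's ccxmpl data and compare the limits. This is a rough sketch; it assumes the stored-result names r(or), r(lb_or) and r(ub_or) from the cc documentation.

```stata
webuse ccxmpl, clear

* Default CI
quietly cc case exposed [fw=pop]
display "default:   OR = " %6.3f r(or) "  CI " %6.3f r(lb_or) " to " %7.3f r(ub_or)

* Woolf approximation
quietly cc case exposed [fw=pop], woolf
display "woolf:     OR = " %6.3f r(or) "  CI " %6.3f r(lb_or) " to " %7.3f r(ub_or)

* Cornfield approximation
quietly cc case exposed [fw=pop], cornfield
display "cornfield: OR = " %6.3f r(or) "  CI " %6.3f r(lb_or) " to " %7.3f r(ub_or)
```

The point estimate is the same in all three runs; only the interval changes, and the differences tend to be largest when some cells are small (here, only 4 exposed cases and 4 exposed controls).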
Worked examples with Stata’s own example datasets
cs example (cohort-style)
webuse csxmpl, clear
list
cs case exp [fw=pop]
csi 7 12 9 2
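Notes: csi is the immediate form (you type the four counts directly). Watch the order: csi expects exposed cases, unexposed cases, exposed noncases, unexposed noncases, which is a c b d in the table notation used earlier in this post, not a b c d.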
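Because the argument order is easy to get wrong, here is a small sketch that feeds csi from named cells and checks the result by hand. The local macro names a, b, c, d are my own and follow this post's table layout (a, b = exposed row; c, d = unexposed row); csi itself reads its arguments as exposed cases, unexposed cases, exposed noncases, unexposed noncases.

```stata
* Cells in this post's notation
local a 7     // exposed, case=1
local b 9     // exposed, case=0
local c 12    // unexposed, case=1
local d 2     // unexposed, case=0

* csi wants: exposed cases, unexposed cases, exposed noncases, unexposed noncases
csi `a' `c' `b' `d'

* Hand check: the two lines below should agree
display "RR by hand  = " %6.4f (`a'/(`a'+`b')) / (`c'/(`c'+`d'))
display "RR from csi = " %6.4f r(rr)
```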
cc example (case-control-style)
webuse ccxmpl, clear
list
cc case exposed [fw=pop]
cci 4 386 4 1250
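Notes: cci is the immediate form for OR. Its order is exposed cases, unexposed cases, exposed controls, unexposed controls (a c b d in the table notation used earlier in this post).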
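A quick hand check of that cci call, reading its four numbers as exposed cases, unexposed cases, exposed controls, unexposed controls. It is a sketch that assumes cci stores the odds ratio in r(or), as cc does.

```stata
* Immediate form with the ccxmpl counts
cci 4 386 4 1250

* Cross-product ratio by hand: (exposed cases x unexposed controls) /
*                              (unexposed cases x exposed controls)
display "OR by hand  = " %6.4f (4*1250)/(386*4)
display "OR from cci = " %6.4f r(or)
```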
Cross-sectional data: the common confusion (RR/PR vs OR)
Cross-sectional is tricky because people say “risk ratio” but you’re really estimating a prevalence ratio (PR).
Rule:
If you want PR, use cs (unadjusted)
If you run cc or logistic, you get OR, which can exaggerate association when prevalence is high.
Adjusted PR (recommended approach) = modified Poisson (robust SE)
glm case i.exp age sex, family(poisson) link(log) vce(robust) eform
Notes: This gives an adjusted PR/RR-like estimate (common in CECS work) when outcome is common.
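One way to convince yourself that the modified Poisson model is estimating the same quantity as cs is to fit it without covariates on Stata's csxmpl data. This is a sanity-check sketch, not an adjusted analysis: with a single binary exposure, the exponentiated coefficient on 1.exp should match the crude ratio from cs, while the confidence limits can differ somewhat because the standard errors are computed differently.

```stata
webuse csxmpl, clear

* Crude ratio from the 2x2 table
cs case exp [fw=pop]

* Same ratio from an unadjusted modified Poisson model (robust SE)
glm case i.exp [fw=pop], family(poisson) link(log) vce(robust) eform
```

Once the two agree, adding covariates to the glm line (as in the command above) gives the adjusted version.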
The “Ah okay” cheat sheet
Use cs when:
Cohort / RCT / cross-sectional sample not selected by outcome
You want RR and/or RD
Follow-up time is effectively equal (risk is meaningful)
Use cc when:
Case-control / nested case-control / sampling based on outcome
You want OR
RR/RD are not valid from the sampling design
If you need adjustment, switch from the table commands to a regression model:
Cohort/RCT RR-like: glm ..., family(poisson) link(log) vce(robust) eform
Case-control adjusted OR: logistic case i.exp covariates, or
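The same kind of sanity check works on the case-control side. As a sketch on Stata's ccxmpl data: an unadjusted logistic model should return the same odds ratio as cc, with a slightly different confidence interval because the two commands compute it differently. Note that logistic reports odds ratios by default, so the or option is optional there.

```stata
webuse ccxmpl, clear

* Crude OR from the 2x2 table
cc case exposed [fw=pop]

* Same OR from an unadjusted logistic model
logistic case i.exposed [fw=pop]
```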





