Choosing RR or OR in Epidemiologic Studies: cs vs cc Commands in Stata

Mayta
Feb 8

If you remember one rule, remember this:

cs → cohort/RCT/cross-sectional when you can interpret risk (or prevalence) → gives RR/RD
cc → case-control (sampled by outcome) → gives OR (only)

(Stata’s quick reference groups these under “Risk & Odds Analysis”, listing cs for RR and logistic for OR.)

The decision flowchart

How were subjects selected?

A) Selected by EXPOSURE status (exposed/unexposed) and followed (or measured) outcome?
   -> Cohort / RCT / cross-sectional without outcome-based sampling
   -> Use cs  (RR/RD)

B) Selected by OUTCOME status (cases/controls), then looked back at exposure?
   -> Case-control / nested case-control (outcome-based sampling)
   -> Use cc  (OR)

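Fast check: If your dataset has a fixed number of cases and controls by design (e.g., 1:4 controls per case), you’re in B → cc territory. If cases and controls are individually matched (e.g., 1:1 pairs), the analysis also has to respect the matching, which plain cc does not do (Stata provides mcc for matched pairs).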

What each command computes (2×2 table logic)

Let:

case = 1 means diseased / event
exposed = 1 means exposed

cs: Risk Ratio (RR) and Risk Difference (RD)

              case=1     case=0     total
exposed=1        a         b        a+b     risk1 = a/(a+b)
exposed=0        c         d        c+d     risk0 = c/(c+d)

RR = risk1 / risk0
RD = risk1 - risk0
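
To check the arithmetic, the sketch below computes RR and RD by hand and then asks csi for the same quantities. The counts are the ones from this post's csi example; the local macro and scalar names are only illustrative.

```stata
* 2x2 counts, labeled as in the table above (same numbers as: csi 7 12 9 2)
local a = 7
local b = 9
local c = 12
local d = 2

* Risks in the exposed and unexposed groups
scalar risk1 = `a'/(`a' + `b')
scalar risk0 = `c'/(`c' + `d')
scalar rr_hand = risk1/risk0
scalar rd_hand = risk1 - risk0

display "RR by hand = " %9.6f rr_hand
display "RD by hand = " %9.6f rd_hand

* csi takes the cells in the order:
* exposed cases, unexposed cases, exposed noncases, unexposed noncases
csi `a' `c' `b' `d'
display "RR from csi = " %9.6f r(rr)
display "RD from csi = " %9.6f r(rd)
```

The hand-computed values and the csi results should agree; note that the cell order csi expects is a c b d in this table's notation.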

cc: Odds Ratio (OR)

              exposed=1  exposed=0
case=1            a         c
case=0            b         d

OR = (a/c) / (b/d) = (a×d) / (b×c)
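
The same kind of check works for the OR. This sketch uses the counts from this post's cci example and compares the cross-product ratio with what cci reports; the macro and scalar names are illustrative.

```stata
* 2x2 counts, labeled as in the table above (same numbers as: cci 4 386 4 1250)
local a = 4
local b = 4
local c = 386
local d = 1250

* Cross-product ratio
scalar or_hand = (`a'*`d')/(`b'*`c')
display "OR by hand = " %9.6f or_hand

* cci takes the cells in the order:
* exposed cases, unexposed cases, exposed controls, unexposed controls
cci `a' `c' `b' `d'
display "OR from cci = " %9.6f r(or)
```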

Key interpretation difference

RR compares probabilities
OR compares odds (and lies further from 1 than RR when the outcome is common; the two are close only when the outcome is rare)
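
A small illustration of that gap, using made-up counts: two hypothetical cohorts with the same RR of 2, one with a common outcome and one with a rare outcome. The scalar names are illustrative.

```stata
* Common outcome: risks of 60% vs 30% (hypothetical counts)
csi 60 30 40 70, or
scalar rr_common = r(rr)
scalar or_common = r(or)

* Rare outcome: risks of 0.6% vs 0.3% (hypothetical counts)
csi 6 3 994 997, or
scalar rr_rare = r(rr)
scalar or_rare = r(or)

display "Common outcome: RR = " %5.3f rr_common "  OR = " %5.3f or_common
display "Rare outcome:   RR = " %5.3f rr_rare "  OR = " %5.3f or_rare
```

With these counts the RR is 2 in both cohorts, while the OR is 3.5 for the common outcome and about 2.006 for the rare one.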

Scenario 1 — “I have a cohort / RCT / cross-sectional sample” → use cs

A. Individual-level data

* case: 0/1 outcome
* exp:  0/1 exposure
cs case exp
return list

Notes: cs reports RD + RR by default; return list shows r(rr), r(rd), CIs, p-value.
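
The stored results can be reused rather than copied from the output. A minimal sketch, using the immediate form csi only to stay self-contained (cs stores the same r() names); the local macro names are illustrative.

```stata
* Immediate form, used here only so the example needs no dataset
csi 7 12 9 2

* Format the stored estimate and CI limits, then build a one-line summary
local rr : display %5.2f r(rr)
local lb : display %5.2f r(lb_rr)
local ub : display %5.2f r(ub_rr)
display "RR = `rr' (95% CI `lb' to `ub')"
```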

B. Frequency (aggregated) data (2×2 counts stored as weights)

cs case exp [fw=pop]

Notes: Use this when you have a dataset with 4 rows representing the 4 cells and a count variable pop.
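
For concreteness, here is a sketch of that 4-row layout, typed in with input (the counts match this post's csi example). It also expands the data to one row per subject to show that both layouts give the same RR.

```stata
* One row per cell of the 2x2 table; pop holds the cell count
clear
input case exp pop
1 1 7
1 0 12
0 1 9
0 0 2
end
list

* Frequency-weighted analysis on the 4-row dataset
cs case exp [fw=pop]
scalar rr_fw = r(rr)

* Same analysis after expanding to one row per subject (30 rows)
expand pop
cs case exp
display "RR (fweights) = " %9.6f rr_fw "   RR (expanded) = " %9.6f r(rr)
```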

C. Stratified (Mantel–Haenszel RR)

cs case exp [fw=pop], by(agegrp)

Notes: Gives stratum-specific + Mantel–Haenszel combined RR (and heterogeneity test).
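
To see why stratifying matters, the sketch below uses hypothetical counts in which the RR is 2 within each age group but the crude RR is inflated by confounding. The data, the agegrp coding, and the scalar name are all made up for illustration.

```stata
* Hypothetical cohort: agegrp 1 = younger, 2 = older
clear
input agegrp case exp pop
1 1 1 10
1 0 1 90
1 1 0 20
1 0 0 380
2 1 1 160
2 0 1 240
2 1 0 20
2 0 0 80
end

* Crude analysis ignores age: risks are 170/500 vs 40/500
cs case exp [fw=pop]
scalar rr_crude = r(rr)
display "Crude RR = " %5.2f rr_crude

* Stratified analysis: stratum-specific RRs plus the M-H combined RR
cs case exp [fw=pop], by(agegrp)
```

Here the crude RR is 4.25, while each stratum has an RR of 2 (0.10 vs 0.05 in the younger group, 0.40 vs 0.20 in the older group), so the Mantel–Haenszel combined estimate is also 2.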

D. Small samples → exact p-value

cs case exp [fw=pop], exact

Notes: Fisher’s exact p-value is safer when cell counts are small.

E. “But I want OR too” (not typical, but possible)

cs case exp [fw=pop], or

Notes: cs, or adds an OR, but for OR-focused work you usually jump to cc or logistic.

Scenario 2 — “I have a case-control study (cases + controls)” → use cc

A. Individual-level data

* case: 0/1 disease indicator (defines cases vs controls)
* exp:  0/1 exposure
cc case exp
return list

Notes: cc reports the OR with its CI, plus the attributable fraction among the exposed (AFE) and in the population (AFP).

B. Frequency (aggregated) data

cc case exp [fw=pop]

Notes: Same idea as cs—counts in pop.

C. Stratified case-control OR (Mantel–Haenszel)

cc case exp [fw=pop], by(agegrp)

Notes: Gives MH pooled OR + heterogeneity tests.

D. Effect modification / homogeneity tests (stratified)

cc case exp [fw=pop], by(agegrp) bd
cc case exp [fw=pop], by(agegrp) tarone

Notes: bd = Breslow–Day; tarone = Tarone correction.

E. Small samples

cc case exp [fw=pop], exact

Notes: Fisher’s exact p-value; a sensible default when the table is sparse.

F. CI approximation methods (mostly for large samples)

cc case exp, woolf
cc case exp, cornfield

Notes: Woolf and Cornfield are approximate CIs; cc’s default CI is exact, which is usually preferable unless you have a reason to switch.
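
A quick way to see what these options change is to run the same table three ways. This sketch uses the immediate form cci with the counts from this post's cci example; the scalar names are illustrative.

```stata
* Default CI
cci 4 386 4 1250
scalar or_default = r(or)
scalar lb_default = r(lb_or)
scalar ub_default = r(ub_or)

* Woolf approximation
cci 4 386 4 1250, woolf
scalar or_woolf = r(or)
scalar lb_woolf = r(lb_or)
scalar ub_woolf = r(ub_or)

* Cornfield approximation
cci 4 386 4 1250, cornfield
scalar or_cornfield = r(or)
scalar lb_cornfield = r(lb_or)
scalar ub_cornfield = r(ub_or)

display "Default:   " %6.3f lb_default " to " %7.3f ub_default
display "Woolf:     " %6.3f lb_woolf " to " %7.3f ub_woolf
display "Cornfield: " %6.3f lb_cornfield " to " %7.3f ub_cornfield
```

The point estimate is the same in all three runs; only the interval limits move, and with sparse cells like these the methods can disagree noticeably.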

Worked examples with Stata’s own example datasets

cs example (cohort-style)

webuse csxmpl, clear
list
cs case exp [fw=pop]
csi 7 12 9 2

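Notes: csi is the immediate form: you type the four counts directly, in the order exposed cases, unexposed cases, exposed noncases, unexposed noncases (a c b d in the table notation used earlier, not a b c d).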

cc example (case-control-style)

webuse ccxmpl, clear
list
cc case exposed [fw=pop]
cci 4 386 4 1250

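Notes: cci is the immediate form for OR, with counts typed in the order exposed cases, unexposed cases, exposed controls, unexposed controls.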

Cross-sectional data: the common confusion (RR/PR vs OR)

Cross-sectional is tricky because people say “risk ratio” but you’re really estimating a prevalence ratio (PR).

Rule:

If you want PR, use cs (unadjusted)
If you run cc or logistic, you get OR, which can exaggerate association when prevalence is high.

Adjusted PR (recommended approach) = modified Poisson (robust SE)

glm case i.exp age sex, family(poisson) link(log) vce(robust) eform

Notes: This gives an adjusted PR/RR-like estimate (common in CECS work) when outcome is common.
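
As a sanity check on the modified Poisson approach, the sketch below fits it with no covariates on hypothetical cross-sectional counts (prevalence 60% vs 30%) and compares the result with cs. The data and scalar names are made up for illustration.

```stata
* Hypothetical cross-sectional 2x2 table stored as frequencies
clear
input case exp pop
1 1 60
0 1 40
1 0 30
0 0 70
end

* Unadjusted prevalence ratio from cs
cs case exp [fw=pop]
scalar pr_cs = r(rr)

* Modified Poisson with exposure only: exp(b) is the same ratio
glm case i.exp [fw=pop], family(poisson) link(log) vce(robust) eform
scalar pr_glm = exp(_b[1.exp])

display "PR from cs  = " %6.4f pr_cs
display "PR from glm = " %6.4f pr_glm
```

With exposure as the only predictor the two estimates should coincide (a PR of 2 here); adding covariates such as age and sex to the glm call is what turns it into an adjusted PR.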

The “Ah okay” cheat sheet

Use cs when:

Cohort / RCT / cross-sectional sample not selected by outcome
You want RR and/or RD
Follow-up time is effectively equal (risk is meaningful)

Use cc when:

Case-control / nested case-control / sampling based on outcome
You want OR
RR/RD are not valid from the sampling design

But if you need adjustment:

Cohort/RCT RR-like: glm ..., family(poisson) link(log) vce(robust) eform
Case-control adjusted OR: logistic case i.exp covariates (logistic reports ORs by default; logit needs the or option)
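
To connect the regression route back to cc, this sketch fits an unadjusted logistic model on the counts from this post's cci example and compares the OR with cc. The dataset is typed in by hand and the scalar names are illustrative.

```stata
* Case-control 2x2 table stored as frequencies
clear
input case exp pop
1 1 4
1 0 386
0 1 4
0 0 1250
end

* OR from the 2x2 table
cc case exp [fw=pop]
scalar or_cc = r(or)

* Unadjusted logistic regression: odds ratios are reported by default
logistic case i.exp [fw=pop]
scalar or_logistic = exp(_b[1.exp])

display "OR from cc       = " %8.6f or_cc
display "OR from logistic = " %8.6f or_logistic
```

The two ORs should match when exposure is the only predictor; covariates added to the logistic call give the adjusted OR.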
