Epitab in Stata: Classical Epidemiologic Analysis with 2×2 (confusion matrix) and Stratified Tables
- Mayta
- 10 hours ago
Overview
Epitab is a suite of Stata commands designed for classical epidemiologic analyses based on 2×2 and stratified tables. It provides design-consistent estimation of effect measures, confidence intervals, and attributable fractions across cohort, case–control, cross-sectional, and matched study designs.
Unlike regression models (e.g., logistic, poisson, stcox), Epitab commands are table-based, transparent, and ideal for:
Crude and stratified analyses
Teaching epidemiologic estimands
Reproducing textbook and guideline analyses
Sanity-checking regression outputs
The Epitab family includes commands for rates, risks, odds, trend tests, Mantel–Haenszel adjustment, and matched data.
Command Families by Study Design
1. Incidence-Rate Data (Person-Time)
ir — Incidence-Rate Ratio and Difference
Used when outcomes are expressed as events per person-time.
Key outputs
Incidence rates (exposed vs unexposed)
Incidence-rate ratio (IRR)
Incidence-rate difference (IRD)
Attributable / prevented fractions
Exact or asymptotic CIs
Stratified options
Mantel–Haenszel pooled IRR
Homogeneity tests across strata
Internal, external, or user-weighted standardization (istandard, estandard, standard())
Immediate form
iri — calculator-style input (counts + person-time)
When to prefer
Unequal follow-up
Dynamic populations
Precursor to Poisson regression
Related models
poisson or glm, family(poisson), each with exposure() for person-time
stcox (when censoring matters)
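The quantities ir and iri report can be checked by hand. The sketch below is Python rather than Stata, uses made-up counts, and computes large-sample (Wald) intervals, so it will not match Stata's exact intervals digit for digit. The function name `rate_summary` and its result keys are illustrative, not part of any Stata or Python library.

```python
import math

def rate_summary(cases_exp, cases_unexp, pt_exp, pt_unexp, z=1.959964):
    """Crude incidence-rate comparison from summary counts and person-time."""
    rate_exp = cases_exp / pt_exp
    rate_unexp = cases_unexp / pt_unexp
    irr = rate_exp / rate_unexp
    ird = rate_exp - rate_unexp
    # Large-sample (Wald) standard errors; an assumption, not exact intervals.
    se_log_irr = math.sqrt(1 / cases_exp + 1 / cases_unexp)
    se_ird = math.sqrt(cases_exp / pt_exp**2 + cases_unexp / pt_unexp**2)
    return {
        "rate_exp": rate_exp,
        "rate_unexp": rate_unexp,
        "irr": irr,
        "irr_ci": (irr * math.exp(-z * se_log_irr), irr * math.exp(z * se_log_irr)),
        "ird": ird,
        "ird_ci": (ird - z * se_ird, ird + z * se_ird),
        # Attributable fraction among the exposed; interpretable when irr > 1.
        "af_exposed": (irr - 1) / irr,
    }

# Hypothetical data: 30 events in 1,000 person-years (exposed),
# 15 events in 1,500 person-years (unexposed).
result = rate_summary(30, 15, 1000, 1500)
for name, value in result.items():
    print(name, value)
```

With these counts the rate ratio is 3 and the rate difference is 0.02 events per person-year, which is the kind of figure worth comparing against a later Poisson model.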
2. Cohort and Cross-Sectional Risk Data
cs — Risk Difference, Risk Ratio (± Odds Ratio)
Used when follow-up time is equal for all subjects or when analyzing cumulative incidence.
Key outputs
Risk in exposed vs unexposed
Risk difference (RD)
Risk ratio (RR)
Optional odds ratio (OR)
Attributable / prevented fractions
Enhancements
Fisher’s exact test
Mantel–Haenszel pooling, or internal/external standardization
Stratified analyses
Immediate form
csi — calculator-style input from summary counts
When to prefer
Cohort studies with fixed follow-up
Teaching RR vs OR
Baseline risk tables in trials
Regression analogues
glm, family(binomial) link(log)
Modified Poisson (glm, family(poisson) link(log) vce(robust))
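For teaching RR versus OR, it helps to compute both from the same 2×2 table. The following is a minimal Python sketch with hypothetical counts; `risk_summary` is an illustrative helper, and the risk-ratio interval uses the usual log-scale Wald approximation, which is an assumption about method rather than a copy of Stata's output.

```python
import math

def risk_summary(a, b, c, d, z=1.959964):
    """2x2 cohort table: a, b = exposed/unexposed cases; c, d = exposed/unexposed non-cases."""
    n_exp, n_unexp = a + c, b + d
    risk_exp, risk_unexp = a / n_exp, b / n_unexp
    rd = risk_exp - risk_unexp
    rr = risk_exp / risk_unexp
    odds_ratio = (a * d) / (b * c)
    # Wald interval for the risk ratio on the log scale.
    se_log_rr = math.sqrt(1 / a - 1 / n_exp + 1 / b - 1 / n_unexp)
    risk_all = (a + b) / (n_exp + n_unexp)
    return {
        "risk_exp": risk_exp,
        "risk_unexp": risk_unexp,
        "rd": rd,
        "rr": rr,
        "rr_ci": (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
        "or": odds_ratio,
        # Attributable fractions; interpretable when rr > 1.
        "af_exposed": (rr - 1) / rr,
        "af_population": (risk_all - risk_unexp) / risk_all,
    }

# Hypothetical cohort: 20/100 exposed and 10/100 unexposed develop the outcome.
summary = risk_summary(20, 10, 80, 90)
for name, value in summary.items():
    print(name, value)
```

Here the risk ratio is 2 while the odds ratio is 2.25: the OR overstates the RR because the outcome is not rare.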
3. Case–Control and Cross-Sectional Odds
cc — Odds Ratio from 2×2 Tables
Used when sampling is conditioned on outcome.
Key outputs
Odds ratio with CI
Attributable fractions (exposed / population)
Stratified Mantel–Haenszel ORs
Homogeneity testing (Breslow–Day test, Tarone’s test)
Immediate form
cci — calculator-style entry
Design principle
Estimates OR correctly under case–control sampling
Does not estimate risks or rates
Regression analogue
logistic
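The stratified Mantel–Haenszel OR is a weighted combination of stratum tables, and a small example shows how it can differ from the crude OR under confounding. This Python sketch uses invented counts; `odds_ratio` and `mantel_haenszel_or` are illustrative names, and no confidence interval or homogeneity test is attempted.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical data: two strata of a confounder, each with a stratum OR of 2.
strata = [(10, 20, 10, 40), (40, 10, 20, 10)]

stratum_ors = [odds_ratio(*table) for table in strata]
# Collapse over strata by summing each cell to get the crude table.
crude_or = odds_ratio(*(sum(cell) for cell in zip(*strata)))
mh_or = mantel_haenszel_or(strata)

print("stratum ORs:", stratum_ors)
print("crude OR:", crude_or)
print("Mantel-Haenszel OR:", mh_or)
```

Both strata have an OR of 2, the pooled estimate recovers 2, and the crude table gives about 2.78, which is the gap stratification is meant to expose.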
4. Odds Across Multiple Exposure Categories
tabodds — Odds, Odds Ratios, and Trend Tests
Used when exposure is categorical or ordinal.
Capabilities
Tabulates odds by exposure category
Computes ORs using user-defined reference
Mantel–Haenszel adjusted ORs
Score-based χ² tests for homogeneity and for linear trend in log-odds
Visualization
Optional CI plots across categories
Use cases
Dose–response screening
Ordinal exposures
Teaching trend concepts before regression
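To make the trend idea concrete, the sketch below tabulates odds by exposure level and computes a 1-df score test for linear trend in the Cochran–Armitage form. This is a Python illustration with hypothetical counts; the variance formula here is the simple binomial version, so tabodds may report a slightly different χ², and the function names are made up for this example.

```python
import math

def odds_table(cases, controls, base=0):
    """Odds per exposure category and odds ratios against a reference category."""
    odds = [d / h for d, h in zip(cases, controls)]
    return odds, [o / odds[base] for o in odds]

def trend_chi2(scores, cases, controls):
    """Score test for linear trend (Cochran-Armitage form), 1 degree of freedom."""
    totals = [d + h for d, h in zip(cases, controls)]
    n_all = sum(totals)
    p = sum(cases) / n_all  # overall proportion of cases
    u = sum(x * (d - n * p) for x, d, n in zip(scores, cases, totals))
    sum_x = sum(n * x for x, n in zip(scores, totals))
    sum_xx = sum(n * x * x for x, n in zip(scores, totals))
    v = p * (1 - p) * (sum_xx - sum_x**2 / n_all)
    chi2 = u**2 / v
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical ordinal exposure with three levels scored 0, 1, 2.
scores = [0, 1, 2]
cases = [10, 20, 30]
controls = [90, 80, 70]

odds, ors = odds_table(cases, controls)
chi2, p_value = trend_chi2(scores, cases, controls)
print("odds:", odds)
print("ORs vs lowest level:", ors)
print("trend chi2:", chi2, "p:", p_value)
```

The ORs rise across levels (1, 2.25, about 3.86) and the trend statistic is 12.5 on 1 df, a pattern to look for again in a logistic model with the exposure entered as a score.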
5. Adjusted Odds Ratios (Stratified Control)
mhodds — Mantel–Haenszel Odds Ratios
Estimates adjusted odds ratios controlling for categorical confounders.
Features
1-df trend tests for ordered exposures
Approximate log-OR using score statistics
One-step Newton–Raphson estimation
Interpretation
Summary OR assumed common across strata (a stratum-adjusted estimate, not a population-averaged one)
Transparent alternative to logistic regression
Best for
Teaching confounding control
Small datasets
Reproducing classical epidemiology examples
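The score-statistic idea behind these features can be sketched for a binary exposure. The Python below accumulates observed-minus-expected exposed cases (U) and the hypergeometric variance (V) over strata, then forms the Mantel–Haenszel χ² as U²/V and a one-step estimate of the log OR as U/V. Treating U/V as the approximation mhodds uses is an assumption here; check the Methods and formulas section of the manual before relying on it. The counts and the name `stratified_score` are hypothetical.

```python
import math

def stratified_score(strata):
    """Each stratum is (a, b, c, d): exposed cases, unexposed cases,
    exposed controls, unexposed controls."""
    u = 0.0
    v = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        n_cases, n_controls = a + b, c + d
        n_exp, n_unexp = a + c, b + d
        # Observed minus expected exposed cases under no association.
        u += a - n_cases * n_exp / n
        # Hypergeometric variance of the exposed-case count.
        v += n_cases * n_controls * n_exp * n_unexp / (n**2 * (n - 1))
    chi2 = u**2 / v
    or_one_step = math.exp(u / v)  # one-step approximation to the adjusted OR
    return chi2, or_one_step

# Hypothetical data: two confounder strata, each with a stratum OR of 2.
strata = [(10, 20, 10, 40), (40, 10, 20, 10)]
chi2, or_one_step = stratified_score(strata)
print("score chi2 (1 df):", chi2)
print("one-step OR:", or_one_step)
```

With these tables the one-step estimate is about 2.02, close to the stratum-specific OR of 2; the approximation tends to drift when the true OR is far from 1.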
6. Matched Case–Control Studies
mcc — McNemar-Based Analysis
Used for 1:1 pair-matched designs; matched sets with several controls per case are better handled with clogit.
Key outputs
McNemar’s χ²
Difference in exposure proportions (cases vs controls)
Ratio of exposure proportions (cases vs controls)
Odds ratio with CI
Immediate form
mcci — direct cell-count entry
Design note
Conditions on matched pairs
Accounts for within-pair correlation
Regression analogue
clogit
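In a pair-matched analysis only the discordant pairs carry information about the odds ratio. The Python sketch below works through that arithmetic on hypothetical pair counts; `matched_pairs` is an illustrative helper, and the exact p-value is a plain two-sided binomial calculation on the discordant pairs, offered as an approximation of what a McNemar-type exact test does.

```python
import math

def matched_pairs(both, case_only, control_only, neither):
    """Pair counts by exposure: both exposed, only the case exposed,
    only the control exposed, neither exposed."""
    n_pairs = both + case_only + control_only + neither
    discordant = case_only + control_only
    chi2 = (case_only - control_only) ** 2 / discordant  # McNemar, no continuity correction
    # Exact two-sided binomial p-value on discordant pairs (probability 0.5 under H0).
    k = min(case_only, control_only)
    tail = sum(math.comb(discordant, i) for i in range(k + 1)) / 2**discordant
    prop_cases = (both + case_only) / n_pairs
    prop_controls = (both + control_only) / n_pairs
    return {
        "mcnemar_chi2": chi2,
        "p_exact": min(1.0, 2 * tail),
        "odds_ratio": case_only / control_only,  # conditional (matched) OR
        "prop_cases": prop_cases,
        "prop_controls": prop_controls,
        "difference": prop_cases - prop_controls,
        "ratio": prop_cases / prop_controls,
    }

# Hypothetical study of 100 matched pairs.
pairs = matched_pairs(both=15, case_only=30, control_only=10, neither=45)
for name, value in pairs.items():
    print(name, value)
```

The 60 concordant pairs drop out of the odds ratio entirely: it is simply 30/10 = 3, and the McNemar χ² is 10.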
Why Use Epitab in Modern Clinical Research?
Strengths
Design-faithful estimands
Minimal assumptions
Fully transparent calculations
Ideal for methods sections and teaching
Excellent diagnostic tool before regression
Limitations
Limited covariate adjustment
Not suitable for continuous predictors
No flexible functional forms
Not prediction-oriented
Best practice: Use Epitab first, then confirm with regression models.
Teaching & Workflow Recommendations
Cohort study? Start with cs, confirm with log-binomial or Poisson
Person-time? Use ir, then Poisson or Cox
Case–control? Begin with cc or tabodds, then logistic
Matched data? Use mcc, then clogit
Epitab provides the epidemiologic intuition that regression often obscures.
Conclusion
Epitab remains one of Stata’s most valuable—but underused—toolkits for classical epidemiologic analysis. For CECS PhD students and clinical researchers, it bridges the gap between design-based reasoning and model-based inference, ensuring that effect estimates remain interpretable, reproducible, and scientifically grounded.




