Epitab in Stata: Classical Epidemiologic Analysis with 2×2 (confusion matrix) and Stratified Tables

  • Writer: Mayta
  • 10 hours ago

Overview

Epitab is a suite of Stata commands designed for classical epidemiologic analyses based on 2×2 and stratified tables. It provides design-consistent estimation of effect measures, confidence intervals, and attributable fractions across cohort, case–control, cross-sectional, and matched study designs.

Unlike regression models (e.g., logistic, poisson, stcox), Epitab commands are table-based, transparent, and ideal for:

  • Crude and stratified analyses

  • Teaching epidemiologic estimands

  • Reproducing textbook and guideline analyses

  • Sanity-checking regression outputs

The Epitab family includes commands for rates, risks, odds, trend tests, Mantel–Haenszel adjustment, and matched data.

Command Families by Study Design

1. Incidence-Rate Data (Person-Time)

ir — Incidence-Rate Ratio and Difference

Used when outcomes are expressed as events per person-time.

Key outputs

  • Incidence rates (exposed vs unexposed)

  • Incidence-rate ratio (IRR)

  • Incidence-rate difference (IRD)

  • Attributable / prevented fractions

  • Exact or asymptotic CIs

Stratified options

  • Mantel–Haenszel pooled IRR

  • Homogeneity tests across strata

  • Internal, external, or user-weighted standardization (istandard, estandard, standard())

Immediate form

  • iri — calculator-style input (counts + person-time)

When to prefer

  • Unequal follow-up

  • Dynamic populations

  • Precursor to Poisson regression

Related models

  • poisson, glm, family(poisson)

  • stcox (when the hazard varies over follow-up time)
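
As a minimal sketch of both forms, assuming a small made-up dataset (the variable names agegrp, exposed, cases, and pyears and all counts are hypothetical), the stratified call reports stratum-specific and Mantel–Haenszel pooled IRRs, and the calculator form reproduces the crude comparison from the totals.

```stata
* Hypothetical aggregated data: one row per age group x exposure level
clear
input agegrp exposed cases pyears
1 1 10 400
1 0  6 700
2 1 20 600
2 0  9 800
end

* Stratum-specific IRRs, Mantel-Haenszel pooled IRR, and homogeneity test
ir cases exposed pyears, by(agegrp)

* Calculator form, argument order:
*   exposed cases, unexposed cases, exposed person-time, unexposed person-time
* These are the crude totals of the data above (30/1000 vs 15/1500)
iri 30 15 1000 1500
```

By hand, the crude rates are 0.03 and 0.01 per person-year, so the crude IRR is 3 and the IRD is 0.02 per person-year.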


2. Cohort and Cross-Sectional Risk Data

cs — Risk Difference, Risk Ratio (± Odds Ratio)

Used when follow-up time is equal for all subjects or when analyzing cumulative incidence.

Key outputs

  • Risk in exposed vs unexposed

  • Risk difference (RD)

  • Risk ratio (RR)

  • Optional odds ratio (OR)

  • Attributable / prevented fractions

Enhancements

  • Fisher’s exact test

  • Mantel–Haenszel pooling by default, with internal or external standardization as alternatives

  • Stratified analyses

Immediate form

  • csi — calculator-style input from summary counts

When to prefer

  • Cohort studies with fixed follow-up

  • Teaching RR vs OR

  • Baseline risk tables in trials

Regression analogues

  • glm, family(binomial) link(log)

  • Modified Poisson (glm, family(poisson) link(log) vce(robust))
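
A short sketch of the calculator form, assuming a hypothetical cohort in which 40 of 100 exposed and 20 of 100 unexposed subjects develop the outcome:

```stata
* Argument order for csi:
*   exposed cases, unexposed cases, exposed noncases, unexposed noncases
csi 40 20 60 80

* Same table, also requesting the odds ratio and Fisher's exact p-value
csi 40 20 60 80, or exact
```

By hand, the risks are 0.40 and 0.20, giving RR = 2 and RD = 0.20, while the OR is (40×80)/(20×60) ≈ 2.67. This is a convenient way to show how the OR drifts away from the RR when the outcome is common.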


3. Case–Control and Cross-Sectional Odds

cc — Odds Ratio from 2×2 Tables

Used when sampling is conditioned on outcome.

Key outputs

  • Odds ratio with CI

  • Attributable fractions (exposed / population)

  • Stratified Mantel–Haenszel ORs

Homogeneity testing

  • Breslow–Day test

  • Tarone’s test

Immediate form

  • cci — calculator-style entry

Design principle

  • Estimates OR correctly under case–control sampling

  • Does not estimate risks or rates

Regression analogue

  • logistic
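
As an illustration only, with hypothetical counts (100 cases of whom 50 were exposed, 100 controls of whom 25 were exposed), the calculator form looks like this:

```stata
* Argument order for cci:
*   exposed cases, unexposed cases, exposed controls, unexposed controls
cci 50 50 25 75

* Same table with a Woolf (logit-based) confidence interval
* in place of the default exact interval
cci 50 50 25 75, woolf
```

The cross-product ratio by hand is (50×75)/(50×25) = 3.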


4. Odds Across Multiple Exposure Categories

tabodds — Odds, Odds Ratios, and Trend Tests

Used when exposure is categorical or ordinal.

Capabilities

  • Tabulates odds by exposure category

  • Computes ORs using user-defined reference

  • Mantel–Haenszel adjusted ORs

  • Score-based χ² tests:

    • Homogeneity

    • Linear trend in log-odds

Visualization

  • Optional CI plots across categories

Use cases

  • Dose–response screening

  • Ordinal exposures

  • Teaching trend concepts before regression
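
A minimal sketch for a three-level exposure, assuming a made-up frequency dataset (the variables dose, case, and n are hypothetical names):

```stata
* Hypothetical data: 100 subjects per dose level, with 10, 20, and 30 cases
clear
input dose case n
0 1 10
0 0 90
1 1 20
1 0 80
2 1 30
2 0 70
end

* Odds of being a case at each dose level, with tests of homogeneity and trend
tabodds case dose [fweight=n]

* Odds ratios relative to the lowest dose level (the default reference category)
tabodds case dose [fweight=n], or
```

By hand, the odds are 10/90, 20/80, and 30/70, so the ORs against the lowest category are 2.25 and about 3.86. A different reference category can be requested with base(#).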


5. Adjusted Odds Ratios (Stratified Control)

mhodds — Mantel–Haenszel Odds Ratios

Estimates adjusted odds ratios controlling for categorical confounders.

Features

  • 1-df trend tests for ordered exposures

  • Approximate log-OR per unit increase in exposure, computed from score statistics as a one-step Newton–Raphson approximation to the maximum likelihood estimate

Interpretation

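  • Summary OR assumed common across strata of the adjustment variables (a conditional estimate, not a population-averaged one)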

  • Transparent alternative to logistic regression

Best for

  • Teaching confounding control

  • Small datasets

  • Reproducing classical epidemiology examples
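
The following sketch contrasts a crude and a stratum-adjusted OR. The dataset is invented for illustration (stratum, exposed, case, and n are hypothetical names), and the counts were chosen so that the stratum-specific ORs are equal.

```stata
* Hypothetical data: two strata of 100 subjects each
clear
input stratum exposed case n
1 1 1 10
1 1 0 40
1 0 1  5
1 0 0 45
2 1 1 30
2 1 0 20
2 0 1 20
2 0 0 30
end

* Stratum-specific ORs, Mantel-Haenszel pooled OR, and Breslow-Day test
cc case exposed [fweight=n], by(stratum) bd

* Crude OR, ignoring stratum
mhodds case exposed [fweight=n]

* OR for exposure adjusted for stratum
mhodds case exposed stratum [fweight=n]
```

By hand, the crude OR is (40×75)/(60×25) = 2.0, while each stratum has OR = 2.25, so the Mantel–Haenszel estimate is also 2.25. The gap between the two is the confounding that stratification removes.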


6. Matched Case–Control Studies

mcc — McNemar-Based Analysis

Used for 1:1 pair-matched designs. Matched sets with more than one control per case call for clogit instead.

Key outputs

  • McNemar’s χ²

  • Difference in exposure proportions (cases vs controls)

  • Ratio of exposure proportions (cases vs controls)

  • Odds ratio with CI

Immediate form

  • mcci — direct cell-count entry

Design note

  • Conditions on matched pairs

  • Accounts for within-pair correlation

Regression analogue

  • clogit
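
A brief sketch of the calculator form, assuming 100 hypothetical matched pairs:

```stata
* Argument order for mcci (counts of pairs):
*   both exposed, case exposed only, control exposed only, neither exposed
mcci 10 30 10 50
```

Only the 40 discordant pairs carry information: by hand, the matched OR is 30/10 = 3 and McNemar's χ² is (30 − 10)²/(30 + 10) = 10.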


Why Use Epitab in Modern Clinical Research?

Strengths

  • Design-faithful estimands

  • Minimal assumptions

  • Fully transparent calculations

  • Ideal for methods sections and teaching

  • Excellent diagnostic tool before regression

Limitations

  • Limited covariate adjustment

  • Not suitable for continuous predictors

  • No flexible functional forms

  • Not prediction-oriented

Best practice: Use Epitab first, then confirm with regression models.

Teaching & Workflow Recommendations

  • Cohort study? Start with cs, confirm with log-binomial or Poisson

  • Person-time? Use ir, then Poisson or Cox

  • Case–control? Begin with cc or tabodds, then logistic

  • Matched data? Use mcc, then clogit

Epitab provides the epidemiologic intuition that regression often obscures.
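
The table-first, regression-second workflow for a cohort can be sketched as follows. The data are hypothetical (exposed, outcome, and n are illustrative names), and with a single binary exposure and no covariates all three commands should report the same risk ratio.

```stata
* Hypothetical cohort: 40/100 exposed and 20/100 unexposed with the outcome
clear
input exposed outcome n
1 1 40
1 0 60
0 1 20
0 0 80
end

* Step 1: table-based estimate of the risk ratio and risk difference
cs outcome exposed [fweight=n]

* Step 2: log-binomial regression, coefficients exponentiated to risk ratios
glm outcome exposed [fweight=n], family(binomial) link(log) eform

* Step 3: modified Poisson with robust standard errors as a cross-check
glm outcome exposed [fweight=n], family(poisson) link(log) vce(robust) eform
```

Here the hand-calculated RR is 0.40/0.20 = 2. If the table and the models disagree on a crude estimate like this one, something in the data setup or model specification usually deserves a second look before covariates are added.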

Conclusion

Epitab remains one of Stata’s most valuable but underused toolkits for classical epidemiologic analysis. For CECS PhD students and clinical researchers, it bridges the gap between design-based reasoning and model-based inference, ensuring that effect estimates remain interpretable, reproducible, and scientifically grounded.

