Sample Size Is Not Quite Simple: A Clinical Epidemiologist’s Guide to Meaningful Sample Size Estimation
- Mayta
- May 2
- 4 min read
🔍 Introduction: Why Sample Size Estimation Is Not a Plug-and-Play Process
One of the most misunderstood elements of clinical research is sample size estimation. It’s often treated as a bureaucratic requirement—something to justify to ethics committees or funding agencies. But in reality, the sample size is the engine of your study’s validity.
The idea that there’s a magical number—say 30 subjects per group—or that there’s a universal equation that fits all research questions (e.g., the "Yamane" formula) is appealing but dangerous. This oversimplifies a process that demands alignment with the study’s clinical objective, respect for statistical nuance, and most importantly, an understanding of what’s at stake for patients.
This guide unpacks why sample size is not quite simple and how you, as a clinical researcher, can master this domain for more rigorous and impactful studies.
🎯 Part I: Busting the Common Misconceptions
1. The “Magic Number” of 30
A remnant of central limit theorem teachings, the idea that a sample of 30 is "good enough" is widely misapplied. Yes, under certain assumptions (normal distribution, low variance), n=30 might approximate population parameters. But in clinical research, we often deal with:
Rare outcomes
Skewed variables (e.g., length of stay, biomarkers)
Small effect sizes
In these settings, n=30 per group is often dangerously underpowered, leading to false negatives or misleading conclusions.
2. Misuse of the “Yamane Formula”
The Yamane formula,

n = N / (1 + N·e²),

where N is the population size and e is the desired margin of error, was designed for finite population surveys, not for inferential testing in biomedical research. Applying it to hypothesis testing or estimating risk differences in clinical trials is a category error.
It ignores key parameters like:
Desired effect size
Variance in the population
Power to detect meaningful differences
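To see the problem concretely, here is a minimal Python sketch (the population size and margin of error are illustrative assumptions). Notice that nothing about the clinical question enters the calculation:

```python
# Yamane's formula: n = N / (1 + N * e^2)
# It depends only on population size N and margin of error e --
# there is nowhere to specify an effect size, variance, or power.

def yamane(N: int, e: float) -> int:
    """Sample size for a finite-population survey (Yamane, 1967)."""
    return round(N / (1 + N * e**2))

# Illustrative values: N = 100,000 people, e = 5% margin of error
print(yamane(100_000, 0.05))  # ~398, whether the effect you hope to
                              # detect is a 20% or a 0.2% difference
```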
3. Using “Incidence” Blindly
Another trap is using population incidence as the sole basis for sample size estimation. This may be fine if your study goal is descriptive (e.g., estimating disease prevalence), but it fails if your goal is comparative (e.g., testing an intervention).
Incidence tells you how common something is; it doesn’t tell you how many participants you need to detect a difference, reduce uncertainty, or make a decision.
🧠 Part II: Objective-Based Thinking — The True North of Sample Size
Dr. Phichayut’s core message is clear: sample size should be driven by your study objective, not by fixed rules or generic formulas.
There are two broad paths to consider:
A. Descriptive Objective: Precision Matters
Here, your aim is to describe something about the population:
Prevalence of necrotizing fasciitis
Mean blood pressure in a cohort
Distribution of complications in surgery
Your key statistical concern is precision. You want a narrow confidence interval around your estimate.
Clinical translation:
If you estimate that 25% of patients develop a complication, a sample of 30 yields a 95% CI of roughly [10%, 41%]. That’s too wide to act on. To tighten it to [20%, 30%], you need nearly 10x the sample size.
🔁 Doubling your precision requires quadrupling your sample.
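A minimal sketch of that square-root relationship in Python, assuming the normal-approximation (Wald) interval and a 25% complication rate:

```python
import math

def wald_halfwidth(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the 95% normal-approximation CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.25
for n in (30, 120, 480):  # each step quadruples n
    hw = wald_halfwidth(p, n)
    print(f"n={n:>3}: 95% CI = [{p - hw:.1%}, {p + hw:.1%}]")

# n= 30: 95% CI = [9.5%, 40.5%]
# n=120: 95% CI = [17.3%, 32.7%]   <- 4x the sample, half the width
# n=480: 95% CI = [21.1%, 28.9%]
```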
B. Analytic Objective: Power, Error, and Effect
For hypothesis-driven research, such as randomized controlled trials or cohort comparisons, your focus shifts. Now you’re trying to detect a statistically and clinically meaningful difference between groups.
This demands control over:
Type I error (α) — false positives
Type II error (β) — false negatives
Power (1−β) — the probability of detecting a true effect
You must define your:
Effect size: Minimum difference worth detecting
Standard deviation or variability
Group ratio: Equal (1:1) or unequal allocations
Example (comparing two means): the required number per group is

n = 2 (z_{1−α/2} + z_{1−β})² σ² / Δ²

where Δ is the minimum difference worth detecting and σ is the common standard deviation.
⚙️ Part III: The BRAVES Method — A Clinician’s Checklist
To operationalize this complexity, we introduce the BRAVES mnemonic:
| Component | Role in Sample Size |
| --- | --- |
| Beta (β) | Controls power; the convention is 0.2 (for 80% power) |
| Ratio | Group allocation (1:1 is most efficient, but may vary) |
| Alpha (α) | Controls false positives; the convention is 0.05 |
| Variability | Drives precision; estimated from prior studies or pilot data |
| Effect size | Minimum meaningful difference, clinically grounded |
| Software | R, G*Power, Stata, etc. |
Each of these parameters must be intelligently and transparently chosen, ideally with input from a statistician or epidemiologist.
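As one example of the Software component, the BRAVES inputs map directly onto a power calculation in Python’s statsmodels. A minimal sketch, using the Drug A vs. Drug B numbers from Part IV below as illustrative values:

```python
from math import ceil
from statsmodels.stats.power import TTestIndPower

# BRAVES inputs (illustrative values, matching Scenario 2 below)
alpha = 0.05            # A: two-sided type I error
power = 0.80            # B: power = 1 - beta (beta = 0.2)
ratio = 1.0             # R: equal 1:1 allocation
delta, sd = 1.0, 1.5    # E and V: minimum difference and its SD
d = delta / sd          # standardized effect size (Cohen's d)

n_per_group = TTestIndPower().solve_power(
    effect_size=d, alpha=alpha, power=power, ratio=ratio
)
print(ceil(n_per_group))  # ~37 (t-test; the large-sample z formula gives ~36)
```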
💡 Part IV: Clinical Scenarios — Applying the Theory
Scenario 1: Estimating the Prevalence of Long COVID
Objective: Descriptive
Expected prevalence: 25%
Precision desired: ±5%
Required sample: ≈ 288 participants, using n = z² p(1−p) / d² with z = 1.96, p = 0.25, and d = 0.05.
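A quick check of that arithmetic, sketched in Python with the normal-approximation formula:

```python
import math

z = 1.96    # two-sided 95% confidence
p = 0.25    # expected prevalence
d = 0.05    # desired precision (half-width of the CI)

n = z**2 * p * (1 - p) / d**2
print(math.ceil(n))  # 289 (≈288 before rounding up)
```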
Scenario 2: Comparing Drug A vs. Drug B for HbA1c Reduction
Objective: Hypothesis testing
Expected difference: 1.0%
SD: 1.5%
Power: 80%, Alpha: 0.05
Required sample (per group): ≈ 36, using n = 2 (z_{1−α/2} + z_{1−β})² σ² / Δ² with z_{1−α/2} = 1.96 and z_{1−β} = 0.84.
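The same calculation sketched in Python, with scipy supplying the z quantiles:

```python
import math
from scipy.stats import norm

alpha, power = 0.05, 0.80
delta, sd = 1.0, 1.5            # minimum HbA1c difference and its SD

z_a = norm.ppf(1 - alpha / 2)   # 1.96
z_b = norm.ppf(power)           # 0.84

n_per_group = 2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2
print(math.ceil(n_per_group))   # 36
```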
📌 Final Thoughts: Beyond Numbers, Toward Meaning
Sample size is not just a statistical necessity—it’s a clinical decision-making scaffold. Underpowered studies waste time and resources, expose patients to risk without benefit, and muddy the literature. Overpowered studies may be unethical or unnecessary.
The right sample size:
Aligns with your objective
Is grounded in your hypothesis and minimum meaningful effect
Reflects your clinical context
Enables actionable, trustworthy conclusions
If you remember just one thing: Sample size is not a plug-in number—it’s a strategic decision embedded in every part of your study.