
Sample Size Is Not Quite Simple: A Clinical Epidemiologist’s Guide to Meaningful Sample Size Estimation

🔍 Introduction: Why Sample Size Estimation Is Not a Plug-and-Play Process

One of the most misunderstood elements of clinical research is sample size estimation. It’s often treated as a bureaucratic requirement—something to justify to ethics committees or funding agencies. But in reality, the sample size is the engine of your study’s validity.

The idea that there’s a magical number—say 30 subjects per group—or that there’s a universal equation that fits all research questions (e.g., the "Yamane" formula) is appealing but dangerous. This oversimplifies a process that demands alignment with the study’s clinical objective, respect for statistical nuance, and most importantly, an understanding of what’s at stake for patients.


This guide unpacks why sample size is not quite simple and how you, as a clinical researcher, can master this domain for more rigorous and impactful studies.


🎯 Part I: Busting the Common Misconceptions

1. The “Magic Number” of 30

A remnant of central-limit-theorem teachings, the idea that a sample of 30 is "good enough" is widely misapplied. Yes, under certain conditions (roughly normal data, modest variance), the sampling distribution of the mean is approximately normal at n = 30. But in clinical research, we often deal with:

  • Rare outcomes

  • Skewed variables (e.g., length of stay, biomarkers)

  • Small effect sizes

In these settings, n=30 per group is often dangerously underpowered, leading to false negatives or misleading conclusions.

2. Misuse of the “Yamane Formula”

The Yamane formula,

n = N / (1 + N e²)

where N is the population size and e is the desired margin of error, was designed for finite population surveys, not for inferential testing in biomedical research. Applying it to hypothesis testing or estimating risk differences in clinical trials is a categorical error.

It ignores key parameters like:

  • Desired effect size

  • Variance in the population

  • Power to detect meaningful differences
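To see the problem concretely, here is a minimal Python sketch of the formula (the function name is mine, not a standard API): its only inputs are population size and margin of error, so it returns the same n whether you are estimating a prevalence or hunting for a subtle treatment effect.

```python
def yamane_n(population_size, margin_of_error=0.05):
    """Yamane survey formula: n = N / (1 + N * e**2).

    Note what is absent: no effect size, no variance, no power.
    """
    N, e = population_size, margin_of_error
    return N / (1 + N * e**2)

# The result plateaus near 1/e**2 = 400 as N grows,
# regardless of the research question being asked.
for N in (500, 5_000, 500_000):
    print(N, round(yamane_n(N)))
```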

3. Using “Incidence” Blindly

Another trap is using population incidence as the sole basis for sample size estimation. This may be fine if your study goal is descriptive (e.g., estimating disease prevalence), but it fails if your goal is comparative (e.g., testing an intervention).

Incidence tells you how common something is; it doesn’t tell you how many participants you need to detect a difference, reduce uncertainty, or make a decision.

🧠 Part II: Objective-Based Thinking — The True North of Sample Size

Dr. Phichayut’s core message is clear: sample size should be driven by your study objective, not by fixed rules or generic formulas.

There are two broad paths to consider:

A. Descriptive Objective: Precision Matters

Here, your aim is to describe something about the population:

  • Prevalence of necrotizing fasciitis

  • Mean blood pressure in a cohort

  • Distribution of complications in surgery

Your key statistical concern is precision. You want a narrow confidence interval around your estimate. For a proportion, the standard calculation is:

n = z² × p(1−p) / d²

where p is the expected proportion, d is the desired margin of error, and z is the critical value (1.96 for 95% confidence).

Clinical translation:

If you estimate that 25% of patients develop a complication, a sample of 30 yields a 95% CI of roughly [10%, 41%]. That’s uselessly wide. To tighten it to [20%, 30%], you need nearly ten times the sample size.

🔁 Doubling your precision requires quadrupling your sample.
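That square-root law is easy to verify: the half-width of a Wald confidence interval for a proportion scales as 1/√n, so quadrupling the sample halves the width. A quick stdlib-Python check, using the 25% complication rate from the example:

```python
import math

def wald_half_width(p, n, z=1.96):
    """Half-width of the 95% Wald confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

w_30 = wald_half_width(0.25, 30)    # about +/-0.155
w_120 = wald_half_width(0.25, 120)  # quadruple the sample size...
print(round(w_30 / w_120, 2))       # ...and the half-width halves: 2.0
```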

B. Analytic Objective: Power, Error, and Effect

For hypothesis-driven research, such as randomized controlled trials or cohort comparisons, your focus shifts. Now you’re trying to detect a statistically and clinically meaningful difference between groups.

This demands control over:

  • Type I error (α) — false positives

  • Type II error (β) — false negatives

  • Power (1−β) — the probability of detecting a true effect

You must define your:

  • Effect size: Minimum difference worth detecting

  • Standard deviation or variability

  • Group ratio: Equal (1:1) or unequal allocations

Example (comparing two means):

n per group = 2 (zα + zβ)² σ² / Δ²

where Δ is the minimum difference worth detecting, σ is the common standard deviation, and zα and zβ are the standard normal values for the chosen α (two-sided) and power (1.96 and 0.84 for α = 0.05 and 80% power).

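A compact sketch of that calculation in Python, using only the standard library (statistics.NormalDist supplies the normal quantiles; the function name is illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size to detect a mean difference `delta`
    given a common standard deviation `sd`:
        n = 2 * (z_alpha + z_beta)**2 * sd**2 / delta**2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for two-sided alpha = 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

print(n_per_group_two_means(delta=1.0, sd=1.5))  # 36
```

Because Δ appears squared in the denominator, halving the detectable difference roughly quadruples the per-group requirement, mirroring the precision rule above.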
⚙️ Part III: The BRAVES Method — A Clinician’s Checklist

To operationalize this complexity, we introduce the BRAVES mnemonic:

  • Beta (β): controls power; standard is 0.2 (for 80% power)

  • Ratio: group allocation (1:1 is optimal, but may vary)

  • Alpha (α): controls false positives; convention is 0.05

  • Variability: drives precision; taken from prior data

  • Effect size: minimum meaningful difference; clinically grounded

  • Software: R, G*Power, STATA, etc.

Each of these parameters must be intelligently and transparently chosen, ideally with input from a statistician or epidemiologist.


💡 Part IV: Clinical Scenarios — Applying the Theory

Scenario 1: Estimating the Prevalence of Long COVID

  • Objective: Descriptive

  • Expected prevalence: 25%

  • Precision desired: ±5%

Required sample: ≈ 288 participants, using

n = z² × p(1−p) / d² = (1.96)² × 0.25 × 0.75 / (0.05)² ≈ 288

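A quick arithmetic check of that figure in stdlib Python:

```python
from statistics import NormalDist

p, d = 0.25, 0.05                # expected prevalence, desired margin of error
z = NormalDist().inv_cdf(0.975)  # 1.96 for a 95% confidence interval
n = z**2 * p * (1 - p) / d**2
print(round(n))  # 288
```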
Scenario 2: Comparing Drug A vs. Drug B for HbA1c Reduction

  • Objective: Hypothesis testing

  • Expected difference: 1.0%

  • SD: 1.5%

  • Power: 80%, Alpha: 0.05

Required sample (per group): ≈ 36, using

n = 2 (zα + zβ)² σ² / Δ² = 2 × (1.96 + 0.84)² × (1.5)² / (1.0)² ≈ 35.3 → 36 per group

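The same per-group formula, checked numerically; the 90% power line is an added illustration (not part of the scenario) showing how the requirement moves when you demand more power:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """n = 2 * (z_alpha + z_beta)**2 * sd**2 / delta**2, rounded up."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil(2 * (za + zb) ** 2 * sd ** 2 / delta ** 2)

print(n_per_group(delta=1.0, sd=1.5))              # 36 per group at 80% power
print(n_per_group(delta=1.0, sd=1.5, power=0.90))  # 48 per group at 90% power
```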

📌 Final Thoughts: Beyond Numbers, Toward Meaning

Sample size is not just a statistical necessity—it’s a clinical decision-making scaffold. Underpowered studies waste time and resources, expose patients to risk without benefit, and muddy the literature. Overpowered studies may be unethical or unnecessary.

The right sample size:

  • Aligns with your objective

  • Justifies your hypothesis

  • Reflects your clinical context

  • Enables actionable, trustworthy conclusions

If you remember just one thing: Sample size is not a plug-in number—it’s a strategic decision embedded in every part of your study.
