
Sample Is Not Quite Simple: A Clinical Epidemiologist’s Guide to Meaningful Sample Size Estimation


🔍 Introduction: Why Sample Size Estimation Is Not a Plug-and-Play Process

One of the most misunderstood elements of clinical research is sample size estimation. It’s often treated as a bureaucratic requirement—something to justify to ethics committees or funding agencies. But in reality, the sample size is the engine of your study’s validity.

The idea that there’s a magical number—say 30 subjects per group—or that there’s a universal equation that fits all research questions (e.g., the "Yamane" formula) is appealing but dangerous. This oversimplifies a process that demands alignment with the study’s clinical objective, respect for statistical nuance, and most importantly, an understanding of what’s at stake for patients.

This guide unpacks why sample size is not quite simple and how you, as a clinical researcher, can master this domain for more rigorous and impactful studies.


🎯 Part I: Busting the Common Misconceptions

1. The “Magic Number” of 30

A remnant of central limit theorem teachings, the idea that a sample of 30 is "good enough" is widely misapplied. Yes, under certain assumptions (normal distribution, low variance), n=30 might approximate population parameters. But in clinical research, we often deal with:

In these settings, n=30 per group is often dangerously underpowered, leading to false negatives or misleading conclusions.

2. Misuse of the “Yamane Formula”

The Yamane formula is:

$$n = \frac{N}{1 + N e^2}$$

Where:

  • N = Total population
  • n = Sample size
  • e = Margin of error (sampling error factor)

The factor e determines the accuracy of the estimate. For example, e = 0.05 means you're allowing a 5% error in sampling.

This formula was designed for finite population surveys, not for inferential testing in biomedical research. Applying it to hypothesis testing or estimating risk differences in clinical trials is a category error.
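To make the limitation concrete, here is a minimal Python sketch of the Yamane calculation. The function name `yamane_sample_size` and the example population sizes are illustrative assumptions, not part of the original guide.

```python
import math


def yamane_sample_size(population: int, margin_of_error: float) -> int:
    """Yamane's formula n = N / (1 + N * e^2), rounded up to a whole participant."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Hypothetical population sizes, all with e = 0.05.
for population in (500, 10_000, 1_000_000):
    print(population, yamane_sample_size(population, 0.05))
# prints 223, 385 and 400 respectively
```

The only inputs are N and e. Nothing in the calculation reflects an effect size, outcome variability, or power, and for large populations the result simply levels off near 1/e² = 400.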

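The Yamane formula ignores the parameters that drive an inferential sample size: the expected effect size, the variability of the outcome, the significance level (α), and the statistical power (1 − β).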

3. Using “Incidence” Blindly

Another trap is using population incidence as the sole basis for sample size estimation. This may be fine if your study goal is descriptive (e.g., estimating disease prevalence), but it fails if your goal is comparative (e.g., testing an intervention).

Incidence tells you how common something is; it doesn’t tell you how many participants you need to detect a difference, reduce uncertainty, or make a decision.


🧠 Part II: Objective-Based Thinking — The True North of Sample Size

Dr. Phichayut’s core message is clear: sample size should be driven by your study objective, not by fixed rules or generic formulas.

There are two broad paths to consider:

A. Descriptive Objective: Precision Matters

Here, your aim is to describe something about the population:

Your key statistical concern is precision. You want a narrow confidence interval around your estimate.

Proportion-Based Sample Size Formula

The formula is:

$$n = \frac{Z^2 \, p \, (1 - p)}{d^2}$$

Where:

  • Z = Z-score for confidence level (e.g., 1.96 for 95%)
  • p = Anticipated proportion (e.g., from past data or pilot study)
  • d = Margin of error (e.g., 0.05 for ±5%)

This formula is useful when estimating a sample size for population proportions, especially in survey research and public health studies.
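The proportion-based formula can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `sample_size_for_proportion` is hypothetical, the Z-score is taken from the standard normal distribution via `statistics.NormalDist`, and the result is rounded up to a whole participant.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p: float, d: float, confidence: float = 0.95) -> int:
    """n = Z^2 * p * (1 - p) / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# How the required n reacts to the anticipated proportion, at d = 0.05.
for p in (0.10, 0.25, 0.50):
    print(p, sample_size_for_proportion(p, 0.05))
# prints 139, 289 and 385 respectively
```

The required n peaks at p = 0.5, which is why 0.5 is a common conservative choice when no prior estimate is available. For p = 0.25 the formula gives about 288.1, so rounding up yields 289.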

Clinical translation:

If you estimate that 25% of patients develop a complication, a sample of 30 yields a 95% CI of roughly [10%, 40%] under the normal approximation. That’s uselessly wide. To tighten it to [20%, 30%], you need about 288 participants, roughly 10x the sample size.

🔁 Doubling your precision (halving the margin of error) requires quadrupling your sample, because n scales with 1/d².
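The scaling rule can be checked numerically. The sketch below assumes a simple Wald (normal-approximation) interval, which is only a rough guide at small n; the function name `wald_half_width` is illustrative.

```python
import math
from statistics import NormalDist


def wald_half_width(p: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation CI for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


# Each step multiplies n by 4; the half-width halves each time.
for n in (30, 120, 480):
    print(n, round(wald_half_width(0.25, n), 3))
```

With p = 0.25, n = 30 gives a half-width of about 0.155 (roughly ±15 percentage points). Quadrupling to n = 120 halves that, and quadrupling again to n = 480 halves it once more.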

B. Analytic Objective: Power, Error, and Effect

For hypothesis-driven research, such as randomized controlled trials or cohort comparisons, your focus shifts. Now you’re trying to detect a statistically and clinically meaningful difference between groups.

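This demands control over the Type I error rate (α) and the Type II error rate (β, where power = 1 − β).

You must define your minimum clinically meaningful effect size and the expected variability of the outcome.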

Example (comparing two means):

Sample Size Formula for Comparing Two Means

The formula is:

$$n = \frac{2 \, (Z_{1-\alpha/2} + Z_{1-\beta})^2 \, \sigma^2}{\delta^2}$$

Where:

  • δ (delta) = Effect size (the difference in means you're trying to detect)
  • σ (sigma) = Common standard deviation of the two groups
  • Z_{1-α/2} = Z-score for the desired significance level (e.g., 1.96 for a two-sided α of 0.05)
  • Z_{1-β} = Z-score for desired power (e.g., 0.84 for 80% power)

This formula is commonly used in clinical trials or experiments where the goal is to compare the means of two groups. Here n is the sample size required in each group.
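As a sketch, the two-means formula translates directly into Python. The function name `sample_size_two_means` and the standardized effect sizes in the loop are assumptions chosen for illustration. The calculation uses the normal approximation, so t-based software will typically report a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n = 2 * (Z_{1-alpha/2} + Z_{1-beta})^2 * sigma^2 / delta^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# With sigma = 1, delta is a standardized effect size.
for delta in (0.2, 0.5, 0.8):
    print(delta, sample_size_two_means(delta, sigma=1.0))
# prints 393, 63 and 25 per group respectively
```

Because δ is squared in the denominator, halving the difference you want to detect roughly quadruples the required sample.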


⚙️ Part III: The BRAVES Method — A Clinician’s Checklist

To operationalize this complexity, we introduce the BRAVES mnemonic:

| Component | Role in Sample Size |
|---|---|
| Beta (β) | Controls power; standard is 0.2 (for 80% power) |
| Ratio | Group allocation (1:1 is optimal, but may vary) |
| Alpha (α) | Controls false positives; convention is 0.05 |
| Variability | Drives precision; taken from prior data |
| Effect Size | Minimum meaningful difference; clinically grounded |
| Software | R, G*Power, STATA, etc. |

Each of these parameters must be intelligently and transparently chosen, ideally with input from a statistician or epidemiologist.
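One way to see how the BRAVES inputs interact is to put them all into a single calculation. The sketch below is an assumption-laden illustration, not the guide's own tool: it extends the two-means formula to unequal allocation using the standard factor (1 + 1/k), where k = n2/n1, and the function name `braves_group_sizes` is hypothetical.

```python
import math
from statistics import NormalDist


def braves_group_sizes(delta: float, sigma: float, alpha: float = 0.05,
                       beta: float = 0.20, ratio: float = 1.0) -> tuple[int, int]:
    """Return (n1, n2) for comparing two means with allocation ratio n2/n1 = ratio."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Alpha
    z_beta = NormalDist().inv_cdf(1 - beta)        # Beta (power = 1 - beta)
    # Variability (sigma) and Effect size (delta) enter as sigma^2 / delta^2;
    # Ratio enters through (1 + 1/ratio), which equals 2 for 1:1 allocation.
    n1 = (1 + 1 / ratio) * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n1), math.ceil(ratio * n1)


print(braves_group_sizes(delta=1.0, sigma=1.5))             # (36, 36)
print(braves_group_sizes(delta=1.0, sigma=1.5, ratio=2.0))  # (27, 53)
```

With the same α, β, σ, and δ, moving from 1:1 to 1:2 allocation raises the total from 72 to 80 participants, which illustrates why equal allocation is the most efficient default.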


💡 Part IV: Clinical Scenarios — Applying the Theory

Scenario 1: Estimating the Prevalence of Long COVID

Required sample: ≈ 288 participants, using:

Example: Sample Size Calculation for Proportion

Using the proportion-based formula:

$$n = \frac{1.96^2 \times 0.25 \, (1 - 0.25)}{0.05^2} \approx 288$$

This example calculates the sample size required to estimate a population proportion with:

  • Z = 1.96 (95% confidence)
  • p = 0.25 (estimated proportion)
  • d = 0.05 (margin of error)

Scenario 2: Comparing Drug A vs. Drug B for HbA1c Reduction

Required sample (per group): ≈ 36

Using:

Example: Sample Size for Comparing Two Means

Using the formula:

$$n = \frac{2 \, (1.96 + 0.84)^2 \times 1.5^2}{1.0^2} \approx 35.3 \;\Rightarrow\; 36 \text{ per group}$$

This example computes the sample size with:

  • Z_{1-α/2} = 1.96 (for a two-sided α of 0.05)
  • Z_{1-β} = 0.84 (for 80% power)
  • σ = 1.5 (standard deviation)
  • δ = 1.0 (minimum detectable difference)
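The same inputs can be turned around to ask how much power a given sample size buys. This is a rough sketch under the normal approximation that ignores the negligible opposite tail; the function name `power_two_means` is an illustrative assumption.

```python
import math
from statistics import NormalDist


def power_two_means(n_per_group: int, delta: float, sigma: float,
                    alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma * math.sqrt(2 / n_per_group)
    return NormalDist().cdf(delta / standard_error - z_alpha)


# Scenario 2 inputs: sigma = 1.5, delta = 1.0.
for n in (20, 30, 36):
    print(n, round(power_two_means(n, delta=1.0, sigma=1.5), 2))
```

With these inputs, 36 per group gives roughly 81% power, while the "magic" 30 per group gives only about 73%, short of the 80% target.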

📌 Final Thoughts: Beyond Numbers, Toward Meaning

Sample size is not just a statistical necessity—it’s a clinical decision-making scaffold. Underpowered studies waste time and resources, expose patients to risk without benefit, and muddy the literature. Overpowered studies may be unethical or unnecessary.

The right sample size:

If you remember just one thing: Sample size is not a plug-in number—it’s a strategic decision embedded in every part of your study.