Merge vs Append in Stata: The Clinical Data Join Guide

Stata’s merge and append are among the first data management tools you’ll use in real clinical research. But when should you use each? Let’s break it down with clinical scenarios and ready-to-use Stata code.

Why You Need to Know the Difference

Clinical research datasets are rarely analysis-ready. Labs, demographics, follow-ups, and outcomes usually arrive in different files. Combining them the wrong way leads to headaches—or worse, errors in results. The two classic tools:

  • merge: Add new variables (columns) to existing cases by matching on one or more key IDs

  • append: Add new cases (rows) below the existing ones—stacking datasets vertically

Quick-Reference Table

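Use Case            | merge                                      | append
--------------------|--------------------------------------------|-------------------------------------
Add variables?      | Yes (joins columns by key)                 | No
Add observations?   | Only unmatched rows from the using file    | Yes (stacks by row)
Needs key variable? | Yes (e.g., patient_id)                     | No (variables are matched by name)
Typical scenario    | Link patients to labs                      | Add 2024 to 2023 patients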


Merge: Joining by Key (Side-by-Side)

Scenario: You have patient demographics in one file and their lab results in another. Both share patient_id.

merge 1:1 patient_id using lab_results.dta

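What happens? Stata matches each patient’s labs to their demographics on patient_id. The merged dataset has all variables, demographics and labs, in the same row. The 1:1 form requires patient_id to identify exactly one row in each file; if the lab file holds several rows per patient, use 1:m instead.

Pro Tip: Always check the _merge variable created automatically. It takes the value 1 for rows found only in the data in memory (master), 2 for rows found only in the using file, and 3 for matched rows.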
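To make the _merge codes concrete, here is a minimal sketch meant to be run as one do-file. The patient IDs, ages, and HbA1c values are invented purely for illustration, and the temporary file stands in for a real lab_results.dta.

```stata
* Toy lab file (hypothetical values), saved to a temporary file
clear
input patient_id hba1c
1 6.1
2 7.4
4 5.8
end
tempfile labs
save `labs'

* Toy demographics file, kept in memory as the master data
clear
input patient_id age
1 54
2 61
3 47
end

* 1:1 needs patient_id to be unique; isid stops with an error if it is not
isid patient_id
merge 1:1 patient_id using `labs'

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
list
```

In this toy data, patients 1 and 2 appear in both files, patient 3 has no labs, and patient 4 has labs but no demographics, so all three _merge codes show up in the tabulation.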


Append: Stacking Rows (Top to Bottom)

Scenario: You want to combine all patients from 2023 and 2024 into a single dataset for analysis.

append using patients_2023.dta

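What happens? With the 2024 patients in memory, Stata stacks all 2023 patients under them. No keys needed: variables are lined up by name.

Pro Tip: If a variable exists in only one file, Stata keeps it and fills it with missing values (. for numeric, "" for string) in the rows from the file that lacks it. If the same variable is string in one file and numeric in the other, append stops with an error unless you specify force.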
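The following sketch shows how append handles a variable that exists in only one file. The data are invented for illustration, and the variable name source is just a choice for this example; generate() is an option of append that records which file each row came from.

```stata
* Toy 2023 file (hypothetical values) without a sex variable
clear
input patient_id age
1 54
2 61
end
tempfile p2023
save `p2023'

* Toy 2024 file in memory, which does have sex
clear
input patient_id age str1 sex
3 47 "F"
4 70 "M"
end

* Stack the 2023 rows under the 2024 rows
* source = 0 for rows already in memory, 1 for rows from the appended file
append using `p2023', generate(source)

* sex is an empty string for the 2023 rows
list
```

Keeping a source marker like this makes it easy to tabulate row counts per file after stacking.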


Clinical Example: Table Cheat-Sheet

Goal                              | Stata Command
----------------------------------|-------------------------------------------
Add lab data to each patient      | merge 1:1 patient_id using labs.dta
Combine multiple years’ patients  | append using 2023_patients.dta
Merge hospitals to patient data   | merge m:1 hospital_id using hospitals.dta
Stack two hospital registries     | append using hospital2.dta
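The m:1 row in the table is the one that tends to cause confusion, so here is a small sketch of it under assumed data: many patients share one hospital, and each hospital appears once in the lookup file. The region variable and all values are hypothetical.

```stata
* Toy hospital lookup: one row per hospital_id (hypothetical values)
clear
input hospital_id str10 region
10 "North"
20 "South"
end
tempfile hospitals
save `hospitals'

* Toy patient file in memory: several patients per hospital
clear
input patient_id hospital_id
1 10
2 10
3 20
end

* m:1 = many patient rows per key in memory, one hospital row per key in the using file
* keep(match master) drops hospitals with no patients; nogenerate skips creating _merge
merge m:1 hospital_id using `hospitals', keep(match master) nogenerate
list
```

Each patient row picks up the region of its hospital, and the number of patient rows stays the same.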


Quick Visual

MERGE (side by side)
[ID|A]   +   [ID|B]   →   [ID|A|B]

APPEND (top to bottom)
[ID|A]
[ID|A]
   +
[ID|A]
[ID|A]
   =
[ID|A]
[ID|A]
[ID|A]
[ID|A]


Final Safety Check

After a merge:

tab _merge

After an append:

summarize
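Beyond summarize, one check that may be worth adding after an append is a search for patients who appear in both files. This is a sketch on invented data, where patient 2 is deliberately present in both years; the variable names source and dup are choices made for this example.

```stata
* Toy 2023 file (hypothetical values)
clear
input patient_id age
1 54
2 61
end
tempfile p2023
save `p2023'

* Toy 2024 file in memory; patient 2 also appears in 2023
clear
input patient_id age
2 61
3 47
end
append using `p2023', generate(source)

* Total rows after stacking
count

* Report, then flag, patient_ids that occur more than once
duplicates report patient_id
duplicates tag patient_id, generate(dup)
list if dup > 0
```

Whether a repeated patient_id is an error or an expected repeat visit depends on the study design, so treat flagged rows as something to review rather than to drop automatically.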


Wrap-Up

  • Use merge when you want to add new details for the same cases.

  • Use append when you want to analyze more cases together.

Both are essential for robust, reproducible clinical research with Stata.

Got a tricky data combination problem? Drop a sample scenario below or DM for custom code!
