Merge vs Append in Stata: The Clinical Data Join Guide

Mayta
Jun 16

Stata’s merge and append are among the first data management tools you’ll use in real clinical research. But when should you use each? Let’s break it down with clinical scenarios and ready-to-use Stata code.

Why You Need to Know the Difference

Clinical research datasets are rarely analysis-ready. Labs, demographics, follow-ups, and outcomes usually arrive in different files. Combining them the wrong way leads to headaches or, worse, to duplicated or dropped records that distort results. The two classic tools:

merge: Add new variables (columns) to existing cases by matching on one or more key IDs
append: Add new cases (rows) below the existing ones—stacking datasets vertically

Quick-Reference Table

Use Case	merge	append
Add variables?	Yes (joins columns by key)	No
Add observations?	Not by design (unmatched rows from the using file are kept unless dropped)	Yes (stacks by row)
Needs key variable?	Yes (e.g., patient_id)	No (variables are matched by name)
Typical scenario	Link patients to labs	Add 2024 to 2023 patients

Merge: Joining by Key (Side-by-Side)

Scenario: You have patient demographics in one file and their lab results in another. Both share patient_id, each file has one row per patient, and the demographics file is the one open in memory.

merge 1:1 patient_id using lab_results.dta

What happens? Stata matches each patient’s lab to their demographics. The merged dataset has all variables—demographics and labs—in the same row.

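Pro Tip: Always check the _merge variable that Stata creates automatically. It codes each row as 1 (found only in the master file, the one in memory), 2 (found only in the using file), or 3 (matched in both).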
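To see how _merge behaves, here is a minimal sketch on toy data. The variable names (hb, age) and every value are made up for illustration, and the lab file lives in a temporary file rather than lab_results.dta so the example runs on its own.

```stata
* Toy lab file (hypothetical values): patients 1, 2 and 4
clear
input patient_id hb
1 13.2
2 11.8
4 12.5
end
tempfile labs
save `labs'

* Toy demographics file (hypothetical values): patients 1, 2 and 3
clear
input patient_id age
1 54
2 61
3 47
end

* A 1:1 merge needs the key to identify rows uniquely; isid stops with an error if it does not
isid patient_id

merge 1:1 patient_id using `labs'

* 1 = master only (patient 3), 2 = using only (patient 4), 3 = matched (patients 1 and 2)
tab _merge
```

If unmatched lab rows are not wanted, adding the keep(master match) option to merge drops them at merge time. Whether that is appropriate depends on the study design.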

Append: Stacking Rows (Top to Bottom)

Scenario: You want to combine all patients from 2023 and 2024 into a single dataset for analysis.

append using patients_2023.dta

What happens? With the 2024 file open in memory, Stata stacks all 2023 patients under the 2024 rows. No key is needed; variables are lined up by name.

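Pro Tip: If the variables don’t match exactly, Stata keeps all of them and fills in missing values (. for numeric, "" for string) for whichever file lacks a variable. A variable that is string in one file and numeric in the other stops the append with an error unless you specify the force option.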
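A small sketch of an append where one file has an extra variable. The data are invented, bmi and source are hypothetical variable names, and a temporary file stands in for patients_2023.dta.

```stata
* Toy 2023 file (hypothetical values): no bmi variable
clear
input patient_id age
101 60
102 72
end
tempfile p2023
save `p2023'

* Toy 2024 file in memory (hypothetical values): includes bmi
clear
input patient_id age bmi
201 55 24.1
202 68 29.7
end

* generate() adds a marker: 0 = rows already in memory, 1 = rows from the appended file
append using `p2023', generate(source)

* bmi is missing (.) for the 2023 rows because that file never had it
list patient_id age bmi source
```

The source marker is optional, but it makes it easy to tell later which file a row came from.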

Clinical Example: Table Cheat-Sheet

Goal	Stata Command
Add lab data to each patient	merge 1:1 patient_id using labs.dta
Combine multiple years’ patients	append using patients_2023.dta
Merge hospitals to patient data	merge m:1 hospital_id using hospitals.dta
Stack two hospital registries	append using hospital2.dta
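The m:1 row in the table is the one that most often trips people up, so here is a minimal sketch of it. The data and the region variable are invented for illustration, and a temporary file stands in for hospitals.dta.

```stata
* Toy hospital lookup (hypothetical values): one row per hospital
clear
input hospital_id str12 region
1 "North"
2 "South"
end
tempfile hospitals
save `hospitals'

* Toy patient file (hypothetical values): many patients can share a hospital
clear
input patient_id hospital_id
1 1
2 1
3 2
end

* m:1 = many master rows per key value, exactly one using row per key value
* assert(match) stops with an error if any row fails to match
* nogenerate skips creating _merge
merge m:1 hospital_id using `hospitals', assert(match) nogenerate

list
```

Here assert(match) replaces the manual _merge check. That only makes sense when every patient is expected to have a hospital record and every hospital at least one patient.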

Quick Visual

MERGE (side by side)
[ID|A]   +   [ID|B]   →   [ID|A|B]

APPEND (top to bottom)
[ID|A]
[ID|A]
   +
[ID|A]
[ID|A]
   =
[ID|A]
[ID|A]
[ID|A]
[ID|A]

Final Safety Check

After a merge:

tab _merge

After an append:

summarize
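One further check that may be worth running after an append is looking for patients who appear in both files. This is a sketch on made-up data; dup is a hypothetical variable name.

```stata
* Toy file A (hypothetical values)
clear
input patient_id age
1 54
2 61
end
tempfile a
save `a'

* Toy file B in memory (hypothetical values): patient 2 also appears in file A
clear
input patient_id age
2 61
3 47
end

append using `a'

* Summarize how many IDs occur more than once
duplicates report patient_id

* dup = number of extra copies of each patient_id (0 means unique)
duplicates tag patient_id, generate(dup)
list if dup > 0
```

Whether a repeated ID is an error or an expected repeat visit depends on the study, so this flags rows for review rather than dropping them.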

Wrap-Up

Use merge when you want to add new details for the same cases.
Use append when you want to analyze more cases together.

Both are essential for robust, reproducible clinical research with Stata.
