Merge vs Append in Stata: The Clinical Data Join Guide
- Mayta
- Jun 16
Stata’s merge and append are among the first data management tools you’ll use in real clinical research. But when should you use each? Let’s break it down with clinical scenarios and ready-to-use Stata code.
Why You Need to Know the Difference
Clinical research datasets are rarely analysis-ready. Labs, demographics, follow-ups, and outcomes usually arrive in different files. Combining them the wrong way leads to headaches—or worse, errors in results. The two classic tools:
merge: Add new variables (columns) to existing cases by matching on one or more key IDs
append: Add new cases (rows) below the existing ones—stacking datasets vertically
Quick-Reference Table
Use Case | merge | append |
Add variables? | Yes (joins columns by key) | No |
Add observations? | No | Yes (stacks by row) |
Needs key variable? | Yes (e.g., patient_id) | No (variables are matched by name) |
Typical scenario | Link patients to labs | Add 2024 to 2023 patients |
Merge: Joining by Key (Side-by-Side)
Scenario: You have patient demographics in one file and their lab results in another. Both share patient_id, with one row per patient in each file.
merge 1:1 patient_id using lab_results.dta
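What happens? Stata matches each patient’s lab results to their demographics by patient_id. The merged dataset has all variables, demographics and labs, in the same row. By default, patients found in only one file are kept, with missing values for the variables that come from the other file.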
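Pro Tip: Always check the _merge variable that merge creates automatically. It equals 1 for observations found only in the data in memory (master), 2 for observations found only in the using file, and 3 for matched observations.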
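To see the _merge codes in action, here is a minimal sketch on made-up data. The variable names age and hb, the values, and the temporary file are assumptions for illustration only, not part of any real study file.

```stata
* Hypothetical lab file: patient 4 has labs but no demographics
clear
input patient_id hb
1 13.2
2 11.8
4 12.5
end
tempfile labs
save `labs'

* Hypothetical demographics file: patient 3 has no labs
clear
input patient_id age
1 54
2 61
3 47
end

* One row per patient in both files, so 1:1 is the right match type
merge 1:1 patient_id using `labs'

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
list patient_id age hb _merge
```

Run it as a single do-file, because the temporary file name lives in a local macro. Patients 1 and 2 should show as matched, while patients 3 and 4 each appear in only one file.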
Append: Stacking Rows (Top to Bottom)
Scenario: You want to combine all patients from 2023 and 2024 into a single dataset for analysis. The 2024 file is already open in memory.
append using patients_2023.dta
What happens? Stata stacks all 2023 patients under the 2024 list. No keys are needed; variables are lined up by name.
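Pro Tip: If a variable exists in only one file, Stata keeps it and fills it with missing values (. for numeric, "" for string) in the rows from the file that lacks it. A variable that is string in one file and numeric in the other stops append with an error, so check types first.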
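The following sketch shows how append handles a variable that exists in only one file. The data, the bmi variable, and the marker name source are hypothetical; generate() is a standard append option that records which file each row came from.

```stata
* Hypothetical 2023 file: no bmi variable
clear
input patient_id age
101 58
102 63
end
tempfile p2023
save `p2023'

* Hypothetical 2024 file (stays in memory): includes bmi
clear
input patient_id age bmi
201 49 27.4
202 71 31.0
end

* source = 0 for rows already in memory (2024), 1 for appended rows (2023)
append using `p2023', generate(source)

* bmi is numeric missing (.) for the 2023 rows because that file lacks it
list
count if missing(bmi)
```

Keeping a source marker makes it easy to trace a suspicious row back to its file later.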
Clinical Example: Table Cheat-Sheet
Goal | Stata Command |
Add lab data to each patient | merge 1:1 patient_id using lab_results.dta |
Combine multiple years’ patients | append using patients_2023.dta |
Merge hospitals to patient data | merge m:1 hospital_id using hospitals.dta |
Stack two hospital registries | append using hospital2.dta |
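The m:1 row in the table differs from 1:1 because many patients share one hospital record. A minimal sketch, assuming a hypothetical hospital file with one row per hospital_id and an illustrative beds variable:

```stata
* Hypothetical hospital file: one row per hospital
clear
input hospital_id beds
1 250
2 120
end
tempfile hospitals
save `hospitals'

* Hypothetical patient file (master): several patients per hospital
clear
input patient_id hospital_id
1 1
2 1
3 2
4 2
end

* Many patient rows map to one hospital row, so the master is the "m" side
merge m:1 hospital_id using `hospitals'
list
```

Each patient row picks up the beds value of its hospital, and the number of patient rows stays the same. If the hospital file had duplicate hospital_id values, Stata would stop with an error rather than guess.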
Quick Visual
MERGE (side by side)
[ID|A] + [ID|B] → [ID|A|B]
APPEND (top to bottom)
[ID|A]
[ID|A]
+
[ID|A]
[ID|A]
=
[ID|A]
[ID|A]
[ID|A]
[ID|A]
Final Safety Check
After a merge:
tab _merge
After an append:
summarize
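Beyond tab _merge and summarize, a few built-in commands can make these checks fail loudly instead of relying on eyeballing output. This is one possible routine on made-up data, not a required workflow; the variable names are illustrative.

```stata
* Hypothetical lab and demographics files that match fully
clear
input patient_id hb
1 13.2
2 11.8
end
tempfile labs
save `labs'

clear
input patient_id age
1 54
2 61
end

* Before merging: error out if patient_id does not uniquely identify rows
isid patient_id

* assert(match) makes merge itself fail unless every observation matches
merge 1:1 patient_id using `labs', assert(match)

* Drop _merge so a later merge does not fail on an existing _merge variable
drop _merge

* Also useful after an append: look for duplicate IDs and unexpected missings
duplicates report patient_id
misstable summarize
```

Use assert(match) only when a full match is genuinely expected. When some patients legitimately lack labs, inspect tab _merge instead.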
Wrap-Up
Use merge when you want to add new details for the same cases.
Use append when you want to analyze more cases together.
Both are essential for robust, reproducible clinical research with Stata.
Got a tricky data combination problem? Drop a sample scenario below or DM for custom code!