Merge vs Append in Stata: The Clinical Data Join Guide

Stata’s merge and append are among the first data management tools you’ll use in real clinical research. But when should you use each? Let’s break it down with clinical scenarios and ready-to-use Stata code.

Why You Need to Know the Difference

Clinical research datasets are rarely analysis-ready. Labs, demographics, follow-ups, and outcomes usually arrive in different files. Combining them the wrong way leads to headaches or, worse, errors in results. The two classic tools are merge, which adds variables by matching on a key, and append, which adds observations by stacking rows.


Quick-Reference Table

| Use Case | merge | append |
|---|---|---|
| Add variables? | Yes (joins columns by key) | No |
| Add observations? | No | Yes (stacks by row) |
| Needs key variable? | Yes (e.g., patient_id) | No (columns must match) |
| Typical scenario | Link patients to labs | Add 2024 to 2023 patients |


Merge: Joining by Key (Side-by-Side)

Scenario: You have patient demographics in one file and their lab results in another. Both share patient_id.

merge 1:1 patient_id using lab_results.dta

What happens? Stata matches each patient’s lab to their demographics. The merged dataset has all variables—demographics and labs—in the same row.

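Pro Tip: Always check the _merge variable that Stata creates automatically. It equals 1 for observations found only in the master (in-memory) data, 2 for observations found only in the using file, and 3 for matched observations.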
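To see what _merge looks like in practice, here is a minimal sketch on toy data. The variables hemoglobin and age and all values are invented for illustration, and a tempfile stands in for lab_results.dta so the example runs on its own.

```stata
* Toy lab file (hypothetical values); a tempfile stands in for lab_results.dta
clear
input patient_id hemoglobin
1 13.2
2 11.8
4 14.1
end
tempfile lab_results
save `lab_results'

* Toy demographics file, left in memory as the master data
clear
input patient_id age
1 54
2 61
3 47
end

* Join labs onto demographics by patient_id
merge 1:1 patient_id using `lab_results'

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
list
```

In this toy setup, patient 3 has demographics but no labs and patient 4 has labs but no demographics, so both show up as unmatched rows rather than silently disappearing.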


Append: Stacking Rows (Top to Bottom)

Scenario: You want to combine all patients from 2023 and 2024 into a single dataset for analysis.

append using patients_2023.dta

What happens? With the 2024 file already in memory, Stata stacks all 2023 patients under the 2024 patients. No key is needed, only matching variable names.

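Pro Tip: If a variable exists in only one file, Stata keeps it and fills it with missing values (. for numeric, "" for string) for the observations from the file that lacks it. If the same variable is string in one file and numeric in the other, append stops with an error unless you specify the force option.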
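A small sketch of that behavior, assuming a hypothetical sex variable that was recorded only in 2024. All values are invented, and the generate() option is used here to tag which file each row came from.

```stata
* Toy 2023 file (hypothetical values): no sex variable recorded
clear
input patient_id age
101 58
102 66
end
tempfile patients_2023
save `patients_2023'

* Toy 2024 file, left in memory: includes sex
clear
input patient_id age str1 sex
201 49 "F"
202 72 "M"
end

* Stack 2023 under 2024; from_2023 is 0 for 2024 rows and 1 for 2023 rows
append using `patients_2023', generate(from_2023)

* sex is empty ("") for the two 2023 patients
list
```

Tagging the source file at append time makes it easy to trace a suspicious row back to the year it came from.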


Clinical Example: Table Cheat-Sheet

| Goal | Stata Command |
|---|---|
| Add lab data to each patient | merge 1:1 patient_id using labs.dta |
| Combine multiple years’ patients | append using 2023_patients.dta |
| Merge hospitals to patient data | merge m:1 hospital_id using hospitals.dta |
| Stack two hospital registries | append using hospital2.dta |
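The m:1 case is the one that tends to trip people up, so here is a rough sketch of it. Many patient rows share one hospital row, so the hospital's details are copied onto every patient from that hospital. The hospital names and IDs are made up, and a tempfile stands in for hospitals.dta.

```stata
* Toy hospital lookup file (hypothetical names): one row per hospital
clear
input hospital_id str12 hospital_name
1 "North Clinic"
2 "City General"
end
tempfile hospitals
save `hospitals'

* Toy patient file, left in memory: several patients per hospital
clear
input patient_id hospital_id
1 1
2 1
3 2
end

* m:1 = many patient rows in memory match one hospital row in the using file
merge m:1 hospital_id using `hospitals'
list
```

The row count stays at three patients; only the hospital_name column is added.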


Quick Visual

MERGE (side by side)
[ID|A]   +   [ID|B]   →   [ID|A|B]

APPEND (top to bottom)
[ID|A]
[ID|A]
   +
[ID|A]
[ID|A]
   =
[ID|A]
[ID|A]
[ID|A]
[ID|A]


Final Safety Check

After a merge:

tab _merge

After an append:

summarize
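If you want checks that stop the do-file rather than just print a table, one possible pattern is sketched below. It assumes every patient is expected to match, which will not hold in every study, and all data values and the creatinine variable are invented for illustration.

```stata
* Toy lab file (hypothetical values)
clear
input patient_id creatinine
1 0.9
2 1.4
end
tempfile labs
save `labs'

* Toy file for an earlier year (hypothetical values)
clear
input patient_id age
3 47
end
tempfile patients_2023
save `patients_2023'

* Toy demographics file, left in memory
clear
input patient_id age
1 54
2 61
end

* Before a 1:1 merge: confirm the key uniquely identifies rows
isid patient_id

merge 1:1 patient_id using `labs'

* After the merge: stop with an error if any record failed to match
assert _merge == 3
drop _merge

* After an append: look for duplicate IDs and unexpected missing values
append using `patients_2023'
duplicates report patient_id
misstable summarize
```

Here misstable summarize flags creatinine as missing for the appended patient, which is the kind of gap that summarize alone makes easy to overlook.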


Wrap-Up

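Use merge when you need to add variables by matching on a key, and append when you need to add observations by stacking rows. Both are essential for robust, reproducible clinical research with Stata.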

Got a tricky data combination problem? Drop a sample scenario below or DM for custom code!
