Step-by-Step Guide to Categorical Data and Effect Measures in Network Meta-Analysis (NMA)
- Mayta

0) Frame the clinical question & endpoint
What it is Define your PICO/PICOT and the binary outcome (event vs no event), its direction (“good” or “bad”), and time window.
Why we do it Clear framing prevents downstream mixing of incomparable endpoints or time horizons, and anchors interpretation (e.g., OR < 1 means benefit when the outcome is adverse).
Core focus
PICO/PICOT scope and eligibility criteria
Exact binary endpoint definition across trials
Direction of benefit (which side of 1.0 is “better”)
Typical outputs
Protocolized question statement and eligibility table
Outcome dictionary (definitions, windows, handling of competing risks)
Pre‑specified primary effect measure (often OR) and secondary measures
In your biologics manuscript, the team predefined the outcomes and then selected effect measures per endpoint (ORs for OCS reduction, IRRs for exacerbations).
1) Choose the effect measure
What it is Select a single primary scale for synthesis: Odds Ratio (OR) is most common for binary meta/NMA; RR or RD may be secondary.
Why we do it Consistent scaling avoids incoherence and facilitates network modeling and ranking; ORs behave consistently across baseline risks.
Core focus
Primary: OR on the log scale (analysis happens on log(OR))
Secondary (optional): RR, RD for absolute impact/NNT
Typical outputs
Rationale for measure choice
Conversion rules (if some trials report different measures)
Back‑transformed pooled estimates for clinical reading
Your NMA examples pooled ORs on the log scale for binary/ordinal dose‑reduction outcomes before ranking.
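As a minimal sketch of the arithmetic (not taken from either manuscript), the snippet below computes an OR, its log for analysis, and a back-transformed 95% CI from one hypothetical 2×2 table; all counts are made up.

```r
# Hypothetical 2x2 counts: events (r) and totals (n) in two arms
r1 <- 12; n1 <- 100   # treatment arm
r0 <- 25; n0 <- 100   # control arm

or       <- (r1 / (n1 - r1)) / (r0 / (n0 - r0))            # odds ratio
log_or   <- log(or)                                        # analysis happens on this scale
se_logor <- sqrt(1/r1 + 1/(n1 - r1) + 1/r0 + 1/(n0 - r0))  # Woolf SE of log(OR)
ci       <- exp(log_or + c(-1.96, 1.96) * se_logor)        # back-transformed 95% CI

round(c(OR = or, lower = ci[1], upper = ci[2]), 2)
```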
2) Build analyzable contrasts from trial data
What it is Create study‑level contrasts (log(OR), SE[log(OR)]) from arm‑level counts (r, n) or use reported contrasts consistently.
Why we do it Contrast‑based data are the lingua franca for synthesis, handle multi‑arm trials correctly, and feed the network model.
Core focus
Arm‑level → contrast‑level transformation
Handling zero cells (continuity corrections or robust methods)
Multi‑arm correlation (avoid double‑counting shared controls)
Typical outputs
A “contrast sheet” listing each comparison’s log(OR), SE, treat1, treat2, study label
Example In the classic antihypertensive–diabetes NMA, a separate spreadsheet of 45 two‑by‑two contrasts (log ORs + SEs) was prepared specifically to feed the network model.
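In R, netmeta's pairwise() performs this arm-to-contrast transformation. The sketch below uses a small hypothetical arm-level data frame (study, treat, r, n), not the actual contrast sheet from either example.

```r
library(netmeta)

# Hypothetical arm-level data: one row per study arm
arm_dat <- data.frame(
  study = c("S1", "S1", "S2", "S2", "S3", "S3", "S3"),
  treat = c("A", "Placebo", "B", "Placebo", "A", "B", "Placebo"),
  r     = c(10, 20, 8, 18, 12, 9, 22),          # events
  n     = c(100, 100, 90, 92, 110, 105, 108)    # randomized
)

# pairwise() builds contrast-level log(OR) and SE from arm-level counts,
# expanding the three-arm study S3 into all of its pairwise comparisons
contrasts <- pairwise(treat = treat, event = r, n = n,
                      studlab = study, data = arm_dat, sm = "OR")
contrasts[, c("studlab", "treat1", "treat2", "TE", "seTE")]
```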
3) Fit the synthesis model (start with random‑effects)
What it is Combine study contrasts using a random‑effects model; for multiple treatments, fit a frequentist NMA (e.g., netmeta).
Why we do it Random‑effects acknowledges real‑world between‑study variability. NMA integrates direct + indirect evidence across a network, enabling all pairwise comparisons.
Core focus
Random‑effects variance (τ²) estimator (REML / Paule–Mandel)
Common reference (for presentation only; the network itself is reference‑free)
Model convergence and plausibility checks
Typical outputs
Pooled log(OR) estimates and 95% CIs per treatment vs reference
τ² estimate and model diagnostics
Forest plot vs reference (interpret on OR scale)
Example Your team’s NMA used a frequentist random‑effects approach (R netmeta) and then produced pooled estimates and ranks.
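A minimal netmeta call on the contrast sheet from Step 2 might look like the sketch below. Argument spellings (random/common, method.tau) reflect recent netmeta releases, and the reference group is an assumption for presentation only.

```r
# Frequentist random-effects NMA on the contrast sheet from Step 2
net <- netmeta(TE, seTE, treat1, treat2, studlab,
               data = contrasts, sm = "OR",
               random = TRUE, common = FALSE,
               method.tau = "REML",          # or "PM" for Paule-Mandel
               reference.group = "Placebo")  # presentation anchor only

summary(net)   # pooled ORs vs reference, tau^2, Q
forest(net)    # forest plot vs the reference, read on the OR scale
```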
4) Assess heterogeneity (within‑comparison variability)
What it is Quantify how much the true effects differ across studies that assess the same contrast.
Why we do it High heterogeneity weakens a single pooled summary and signals effect modification, design differences, or quality issues.
Core focus
Cochran’s Q, I² (%), and τ² on the log(OR) scale
Visual check: study‑level forest plot
Pre‑planned exploration of sources (population, dose, follow‑up, risk of bias)
Typical outputs
Q test p‑value; I² bands (≈25/50/75%); τ² magnitude
Narrative on likely drivers; plan for subgroup/meta‑regression if warranted
Heterogeneity in your biologics NMA was explicitly assessed using Cochran’s Q and I² before moving to network‑level checks.
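The heterogeneity statistics live on the fitted object from Step 3; for a single pairwise comparison, the familiar Q/I²/τ² output comes from meta::metabin(). Component and argument names below assume recent meta/netmeta versions, and the data frame pair_dat with its columns is hypothetical.

```r
# Network-level statistics stored on the netmeta fit from Step 3
net$Q        # Cochran's Q
net$pval.Q   # p-value for Q
net$I2       # I^2 (proportion of variability beyond chance)
net$tau^2    # tau^2 on the log(OR) scale

# Pairwise heterogeneity for a single comparison (hypothetical arm-level columns)
library(meta)
m <- metabin(event.e = r1, n.e = n1, event.c = r0, n.c = n0,
             studlab = study, data = pair_dat,
             sm = "OR", method.tau = "PM")   # Paule-Mandel tau^2
summary(m)   # reports Q, I^2, tau^2; forest(m) gives the study-level plot
```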
5) Check transitivity & consistency (network validity)
What it is Transitivity = comparability of studies across treatment comparisons (distributions of effect modifiers). Consistency = agreement of direct and indirect evidence.
Why we do it NMA’s core promise is valid indirect inference; without transitivity and consistency, ranks/league tables are unreliable.
Core focus
Transitivity: compare effect‑modifier distributions across comparisons (biomarkers, disease severity, background therapy)
Consistency:
Global (design‑by‑treatment test / model incoherence)
Local (node‑splitting: direct vs indirect for a given pair)
Typical outputs
Transitivity table/figure
Global test/incoherence metric and p‑value; node‑split estimates and p‑values
Action if violated (stratified networks, meta‑regression, or cautious interpretation)
Example Your biologics paper prespecified transitivity, tested global and node‑level consistency, and used a comparison‑adjusted funnel when appropriate. In the diabetes NMA, incoherence (ω) was very low (≈1.7×10⁻⁵), supporting internal agreement of the network model.
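In netmeta, the global and local consistency checks named above map onto two functions, sketched here for the fit from Step 3.

```r
# Global: design-by-treatment interaction test (full vs consistency model)
decomp.design(net)

# Local: node-splitting, comparing direct and indirect estimates per comparison
ns <- netsplit(net)
ns            # direct, indirect, and their difference with p-values
forest(ns)    # visual direct-vs-indirect comparison
```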
6) Rank treatments (SUCRA / P‑score) & visualize uncertainty (rankograms)
What it is Compute rank probabilities for each treatment (1st, 2nd, …, kth); summarize as SUCRA (0–1) or frequentist P‑scores; show the full rank distribution via rankograms.
Why we do it Clinicians need a synthesized hierarchy—but we must also show uncertainty, not just a single rank.
Core focus
Rank probabilities derived from the NMA estimates and their uncertainty
SUCRA = surface under the cumulative ranking curve (higher = better average rank)
Interpretation with effect sizes, not instead of them
Typical outputs
Table of SUCRA/P‑scores with CIs if available
Rankograms (bar/line) per treatment, plus cumulative rankograms
Your team explicitly computed rank probabilities and SUCRA, and presented rankograms as the visual counterpart.
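A sketch with netmeta: P‑scores via netrank() and rank probabilities plus rankograms via rankogram(). The small.values spelling varies across versions ("good"/"bad" in older releases), and the direction shown assumes lower ORs are desirable for this outcome.

```r
# Frequentist P-scores (analogue of SUCRA); direction depends on the outcome
netrank(net, small.values = "desirable")

# Rank probabilities (via resampling) and SUCRA, plus rankograms
rg <- rankogram(net)
rg          # P(rank = 1), P(rank = 2), ... for each treatment
plot(rg)    # rankogram: the full rank distribution, not a single rank
```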
7) Present comparative results (league table, network plot, forest vs reference)
What it is Translate the network into decision‑ready displays:
League table: all pairwise ORs with 95% CIs
Network plot: nodes (treatments) and edges (direct trials), node size ∝ n, edge width ∝ evidence
Forest vs reference: quick clinical read of each treatment vs chosen anchor
Why we do it Stakeholders must answer “A or B?” quickly, understand the evidence structure, and see where indirect evidence dominates.
Core focus
Clear directionality in the league (column vs row convention)
Ordering by SUCRA/P‑score (with a warning that ranks ≠ certainty)
Network connectivity and balance (star vs richly connected)
Typical outputs
League table arranged by rank
Network graph (node/edge‑weighted)
Forest plot vs reference (e.g., placebo or standard care)
Your biologics manuscript built league tables ordered by SUCRA and included a network graph with node/edge encodings; these are the standard outputs your professor emphasizes. The diabetes NMA also demonstrated stable rank ordering even when the reference was switched (from diuretic to placebo), a key interpretability point.
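All three displays come almost directly from the fitted object; the sketch below uses standard netmeta plotting functions, with the reference group again an assumption.

```r
# League table: all pairwise ORs with 95% CIs (random-effects)
netleague(net, digits = 2)

# Network plot: nodes = treatments, edge thickness scaled by number of studies
netgraph(net, thickness = "number.of.studies")

# Forest plot of every treatment vs the chosen anchor
forest(net, reference.group = "Placebo")
```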
8) Assess small‑study effects / publication bias
What it is In networks (when k > 10), use a comparison‑adjusted funnel plot to assess asymmetry suggestive of small‑study effects/publication bias.
Why we do it Differential reporting or small‑study inflation can distort pooled effects and ranks.
Core focus
Visual inspection and asymmetry tests (caution with low k)
Narrative synthesis; consider study size, setting, and risk‑of‑bias domains
Typical outputs
Comparison‑adjusted funnel plot (and, if appropriate, a brief statistical test)
A reasoned statement on likely small‑study effects and their impact on conclusions
Your biologics NMA specified comparison‑adjusted funnels for networks with >10 studies—a prudent standard.
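netmeta's funnel() method produces the comparison‑adjusted plot; it needs an explicit treatment order encoding the assumed small‑study mechanism (e.g., older vs newer agents). The order below is hypothetical.

```r
# Comparison-adjusted funnel plot; only interpret with roughly >10 studies
funnel(net,
       order = c("Placebo", "A", "B"),   # assumed ordering, e.g., oldest to newest
       pch = 19)
# Pair the plot with a narrative on study size, setting, and risk-of-bias domains
```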
9) Contributions, sensitivity, and certainty of evidence
What it is Make the synthesis auditable and robust: show which direct comparisons contribute to which network estimates, probe robustness with sensitivity analyses, and appraise certainty (e.g., confidence in NMA/CINeMA domains).
Why we do it Stakeholders need to know who drives the estimates, how results change under perturbations, and the confidence they can place in the conclusions.
Core focus
Contribution matrix (evidence flow)
Sensitivity ladders (exclude high‑risk‑of‑bias trials; remove small studies; alternative τ² estimators; population or biomarker strata)
Certainty/credibility across risk of bias, imprecision, inconsistency, indirectness, publication bias
Typical outputs
Contribution heatmap/table to target sensitivity checks
Sensitivity results (narrative + key re‑estimates)
Certainty summary (e.g., high/moderate/low/very low with reasons)
The diabetes NMA ran multiple one‑way sensitivities (removing specific trial types, reassigning drug classes) and found estimates were robust—this is exemplary practice. Your biologics paper applied a confidence in NMA framework to rate evidence across domains after the quantitative synthesis.
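A sketch of the quantitative pieces: the contribution matrix comes from netcontrib(), and a one‑way sensitivity analysis is simply a refit on a restricted contrast sheet. The high risk‑of‑bias study labels are hypothetical, and certainty appraisal (e.g., CINeMA) happens outside netmeta.

```r
# Evidence flow: contribution of each direct comparison to each network estimate
netcontrib(net)

# One-way sensitivity: refit after dropping hypothetical high risk-of-bias studies
high_rob <- c("S2")
net_sens <- netmeta(TE, seTE, treat1, treat2, studlab,
                    data = subset(contrasts, !studlab %in% high_rob),
                    sm = "OR", random = TRUE,
                    reference.group = "Placebo")
summary(net_sens)   # compare against the main fit from Step 3
```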
At‑a‑glance mapping to the professor’s pattern
Random‑effects frequentist NMA (netmeta) → pooled ORs/log(OR) and τ².
Heterogeneity quantified with Q, I², τ² (Step 4).
Consistency checked globally (design‑by‑treatment/incoherence) and locally (node‑splitting) (Step 5).
Ranking via SUCRA/P‑score with rankograms (Step 6).
Comparative displays: league table, network plot, forest vs reference (Step 7).
Small‑study effects: comparison‑adjusted funnel when k > 10 (Step 8).
Contribution & certainty: contribution matrix + confidence in NMA (Step 9).
Final note on interpretation
Always lead with effect sizes and their CIs, not ranks alone. Read ranks with consistency diagnostics, heterogeneity, and certainty judgments; your own examples model this restraint well.