
Finding clarity in complexity: an introduction to the concept of network meta-analysis

6 min read

Based on Evidence Synthesis Ireland's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Network meta-analysis combines direct head-to-head trial evidence with indirect comparisons across a treatment network to estimate relative effects for multiple interventions in one analysis.

Briefing

Network meta-analysis (NMA) is a method for comparing three or more treatments in one analysis by combining direct head-to-head trial evidence with indirect comparisons drawn through a shared comparator. That matters because clinicians and patients often face “which treatment should I choose?” questions, yet the evidence base rarely contains a single trial that compares every option. NMA builds a connected evidence network, allowing relative effectiveness estimates and treatment rankings even when some pairings were never tested directly.

The presentation starts by grounding NMA in the familiar hierarchy of evidence. A systematic review collects studies that meet prespecified eligibility criteria using explicit methods to minimize bias. Pairwise meta-analysis then statistically combines results from studies that compare two interventions (for example, drug A versus drug B). NMA extends this logic: if trials compare A versus B and B versus C, the method can infer an indirect A versus C comparison through B, while also incorporating any available direct A versus C trials. The result is a “mixed” treatment effect estimate that reflects both evidence pathways.
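On an additive scale (such as log odds ratios), the indirect A vs C effect through the common comparator B is the sum of the A vs B and B vs C effects, with their variances adding; where direct A vs C trials also exist, the two sources can be pooled by inverse-variance weighting. A minimal sketch of that logic, using hypothetical effect estimates rather than any numbers from the talk:

```python
import math

def indirect_estimate(d_ab, var_ab, d_bc, var_bc):
    """Bucher-style indirect comparison of A vs C via common comparator B.

    Effects are on an additive scale (e.g. log odds ratios):
    d_AC = d_AB + d_BC, and the variances of independent estimates add.
    """
    return d_ab + d_bc, var_ab + var_bc

def pool(d_direct, var_direct, d_indirect, var_indirect):
    """Combine direct and indirect evidence by inverse-variance weighting
    into a single mixed treatment effect estimate."""
    w_d, w_i = 1 / var_direct, 1 / var_indirect
    d_mixed = (w_d * d_direct + w_i * d_indirect) / (w_d + w_i)
    return d_mixed, 1 / (w_d + w_i)

# Hypothetical log odds ratios (illustrative only)
d_ind, var_ind = indirect_estimate(d_ab=-0.30, var_ab=0.04, d_bc=-0.20, var_bc=0.05)
d_mix, var_mix = pool(d_direct=-0.45, var_direct=0.08,
                      d_indirect=d_ind, var_indirect=var_ind)
print(round(d_ind, 3), round(d_mix, 3), round(math.sqrt(var_mix), 3))
```

Note the mixed estimate sits between the direct and indirect values, weighted toward whichever is more precise; real NMA software fits the whole network jointly rather than one loop at a time.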

The core requirement for credible indirect comparisons is the transitivity assumption: the different sets of randomized trials must be similar on average in all important factors other than the interventions being compared. The talk illustrates how transitivity can fail when effect modifiers differ across comparisons—for instance, when placebo is administered in different forms (injection versus pill) or when participant age ranges vary substantially between trial sets. A related concept is coherence (also called consistency), which checks whether direct and indirect estimates agree statistically. Coherence can be assessed globally across the whole network or locally within specific loops, helping identify “hot spots” where assumptions may break down.

Because NMA adds complexity, the protocol and analysis plan must be more explicit than in a standard pairwise review. The protocol should justify why NMA is needed, state whether ranking is an objective, define inclusion/exclusion criteria with transitivity in mind, and specify how interventions will be handled (including doses, administration routes, and whether to lump or split network nodes). Search strategy also needs care: relying on existing systematic reviews is usually inadequate for capturing the full network. Data analysis sections should name the statistical framework (Bayesian or frequentist; fixed or random effects) and address multi-arm trial correlation and heterogeneity assumptions. If Bayesian methods are used, prior distributions and Markov chain Monte Carlo convergence details should be reported.

Quality and reporting considerations receive sustained attention. The talk notes an extension to PRISMA for NMA reporting and highlights that many published NMA protocols underreport key elements such as transitivity and model choice. For presenting results, it recommends concise formats—league tables, direct/indirect/network estimates, and clear handling of uncertainty—often moving nonessential figures to online repositories. Treatment ranking is possible using metrics such as SUCRA (surface under the cumulative ranking curve), but ranking should not replace relative effect estimates and must match the clinical question, using an appropriate ranking metric.

A worked example centers on uterotonics for preventing postpartum hemorrhage, including network diagrams, league-table-style results, coherence checks, and summary-of-findings tables structured by outcome rather than by comparison. The talk also addresses what to do when NMA becomes infeasible: keep the protocol, transparently report methods, and assess risk of bias so the review can be updated later if the evidence base grows. Finally, it points to key resources, including the Cochrane Handbook chapter on NMA, Cochrane training modules (protocol considerations, GRADE, CINeMA, and more), and software options such as WinBUGS, JAGS, NMAstudio, and web tools like MetaInsight.

Cornell Notes

Network meta-analysis extends pairwise meta-analysis to compare three or more interventions in one framework by combining direct and indirect evidence across a connected network. Its credibility hinges on transitivity (trial sets must be similar in important effect modifiers other than the interventions) and on coherence (direct and indirect estimates should agree). Because NMA can involve many comparisons, protocols must clearly justify the need for NMA, define eligible interventions (including how to handle doses and administration), and specify the statistical model (Bayesian vs frequentist; fixed vs random; multi-arm correlation; heterogeneity; and, for Bayesian approaches, priors and MCMC convergence). Results should be presented accessibly using tools like league tables and outcome-focused summary-of-findings tables, while treatment ranking (e.g., SUCRA) must be tied to a specific clinical ranking question and never substitute for relative effect estimates. If NMA is not feasible, the review should still follow the pre-published protocol and report methods transparently for future updates.

How does NMA produce an estimate for a treatment pair that was never directly compared in trials?

NMA uses the network structure. If one set of trials compares A vs B and another compares B vs C, the method can infer an indirect A vs C comparison through B. If head-to-head A vs C trials also exist, NMA combines direct and indirect evidence to produce a single mixed (combined) treatment effect estimate. This is why NMA is useful when the evidence base lacks a single trial comparing all options.

What is transitivity, and what kinds of differences can break it?

Transitivity requires that the different randomized trial sets are similar on average in all important factors other than the interventions being compared. The talk gives examples: if a “placebo” appears in different forms across trials (e.g., injection vs pill), the populations and contexts may differ in ways that affect outcomes, undermining transitivity. Another example is when participant age ranges differ substantially between the placebo-vs-B and placebo-vs-C comparisons; if effect modifiers like age are not balanced, indirect comparisons may be invalid.

What is coherence (consistency), and how is it assessed?

Coherence checks whether direct and indirect estimates agree statistically. The talk distinguishes global approaches (assessing the whole network using models such as design-by-treatment interaction) from local approaches that examine specific loops to find “hot spots” of inconsistency. It also stresses that methods and results sections should align—if local checks are used in results, they should be described in methods.
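A common local check compares the direct and indirect estimates for the same comparison: their difference, divided by its standard error, gives an approximate z-statistic for loop inconsistency. A minimal sketch with hypothetical estimates (not from the talk):

```python
import math

def inconsistency_z(d_direct, var_direct, d_indirect, var_indirect):
    """Local coherence check for one loop: the difference between direct and
    indirect estimates divided by its standard error (approximate z-test).
    The two estimates are assumed to be independent."""
    diff = d_direct - d_indirect
    se = math.sqrt(var_direct + var_indirect)
    return diff / se

# Hypothetical log-odds-ratio estimates for the same A vs C comparison
z = inconsistency_z(d_direct=-0.45, var_direct=0.08,
                    d_indirect=-0.50, var_indirect=0.09)
flagged = abs(z) > 1.96  # flag the loop if the difference is significant at 5%
print(round(z, 3), flagged)
```

Global approaches such as the design-by-treatment interaction model generalize this idea across all loops at once; dedicated NMA software implements both.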

Why must NMA protocols be more detailed than pairwise meta-analysis protocols?

NMA adds extra assumptions and modeling choices. Protocols should explicitly state that NMA will be conducted and whether ranking is planned, justify why NMA is needed, and define inclusion/exclusion criteria with transitivity in mind. They must also explain intervention selection and how to handle doses and administration routes (including whether to lump or split nodes), specify the statistical framework (Bayesian vs frequentist; fixed vs random), and address multi-arm trial correlation and heterogeneity assumptions. The talk notes that many published NMA protocols still underreport critical items like transitivity and model choice.

How should treatment ranking be handled so it doesn’t mislead?

Ranking is optional, but it must match the clinical ranking question and use an appropriate metric. The talk warns that ranking measures are not substitutes for relative effect estimates and that ranking can be problematic when uncertainty is high or differs across comparisons. In the uterotonics example, SUCRA (surface under the cumulative ranking curve) summarizes the average proportion of competing treatments each intervention beats, but the review still reports relative effects and uncertainty alongside rankings.
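SUCRA has a simple closed form: given a treatment's probabilities of occupying each rank (rank 1 = best), it is the mean of the cumulative ranking probabilities over the first T−1 ranks, so 1.0 means certainly best and 0.0 certainly worst. A sketch with made-up rank probabilities for illustration:

```python
def sucra(rank_probs):
    """SUCRA from one treatment's rank probabilities.

    rank_probs[r] is the probability of holding rank r+1 (rank 1 = best).
    SUCRA is the mean of the cumulative ranking probabilities over the
    first T-1 ranks, where T is the number of treatments in the network.
    """
    t = len(rank_probs)
    cumulative = 0.0
    total = 0.0
    for r in range(t - 1):      # ranks 1 .. T-1
        cumulative += rank_probs[r]
        total += cumulative
    return total / (t - 1)

# Hypothetical rank probabilities for one treatment in a 4-treatment network
print(round(sucra([0.6, 0.3, 0.1, 0.0]), 3))
```

In a Bayesian NMA the rank probabilities come from the posterior samples; the point stands that SUCRA compresses a whole distribution into one number, which is why it should accompany, not replace, the relative effects.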

What can reviewers do if an NMA is planned but becomes infeasible?

The recommendation is to keep the pre-published protocol and continue to report methods transparently, including risk-of-bias assessment, even if the network meta-analysis cannot be performed due to insufficient studies or uncertainty about transitivity. The review can then be updated later if new evidence makes NMA feasible.

Review Questions

  1. What conditions must hold for indirect comparisons in NMA to be considered valid, and how do transitivity and coherence differ?
  2. If a network includes multi-arm trials, what modeling considerations must be specified in the NMA analysis plan?
  3. When using SUCRA or other ranking metrics, what additional information must be provided to ensure ranking answers a meaningful clinical question?

Key Points

  1. Network meta-analysis combines direct head-to-head trial evidence with indirect comparisons across a treatment network to estimate relative effects for multiple interventions in one analysis.

  2. Credible indirect comparisons require transitivity: trial sets must be similar on average in key effect modifiers other than the interventions being compared.

  3. Coherence (consistency) tests whether direct and indirect estimates agree, using global and/or local approaches to identify inconsistency within the network.

  4. NMA protocols should explicitly justify the need for NMA, define interventions (including dose/route decisions and node lumping/splitting), and pre-specify the statistical framework and assumptions.

  5. Bayesian NMA reporting should include prior distributions and details on MCMC convergence; multi-arm trial correlation and heterogeneity assumptions must be addressed.

  6. Treatment ranking (e.g., SUCRA) should be tied to a specific ranking question and never replace reporting of relative effect estimates and uncertainty.

  7. If NMA becomes infeasible, the review should still follow the protocol, assess risk of bias, and transparently report methods for future updates.

Highlights

NMA can estimate A vs C even when no A–C trials exist, by using A–B and B–C evidence and combining direct and indirect information when available.
Transitivity can fail when important effect modifiers differ across trial sets—such as placebo being delivered in different forms or participant age ranges shifting between comparisons.
Coherence checks whether direct and indirect estimates agree, and it can be assessed globally across the network or locally within loops to pinpoint inconsistency.
Ranking treatments with SUCRA is useful but must match the clinical ranking question and should not substitute for relative effect estimates.
When NMA can’t be done, keeping the protocol and reporting methods transparently preserves value for future evidence updates.

Topics

  • Network Meta-Analysis
  • Transitivity
  • Coherence
  • Treatment Ranking
  • Protocol Template

Mentioned

  • Kerry Dwan
  • Donna
  • NMA
  • PICO
  • NIHR
  • WHO
  • GRADE
  • CINeMA
  • SUCRA
  • MCMC