
Challenges in using RoB 2 - A worked example

6 min read

Based on Systematic review and Primary research - Q & A's video on YouTube.

TL;DR

RoB 2 assessments for remote alcohol/drug therapy trials were frequently constrained by heterogeneous outcomes and incomplete reporting, leading to mostly “high risk” or “some concerns” ratings.

Briefing

Risk-of-bias assessment under RoB 2 proved both time-consuming and technically demanding for a systematic review of remote therapies for alcohol and drug misuse—especially when trials reported multiple outcomes, used complex social-intervention designs, and left key methodological details unclear. Across 53 RoB 2 assessments covering 45 randomized controlled trials (with only one study judged low risk of bias), most evidence landed in “high risk” or “some concerns,” underscoring how difficult it is to apply standardized bias criteria to heterogeneous remote-treatment research.

The work focused on adults in OECD countries receiving remotely delivered interventions—online, by telephone, or via smartphone apps—such as remote recovery support, remote talking therapy, and self-guided therapy. Comparators included in-person care, no intervention, or alternative remote interventions. Outcomes were quantitative measures of substance use (for example, days of use or relapse), and the review team had to handle wide variation in what trials measured and how they measured it.

A first major challenge was pre-specifying which outcome to assess. The included studies were heterogeneous and there was no consensus on core outcomes for substance-use treatments. To manage this, the team used an iterative four-step decision process: prioritize outcomes relevant to stakeholders (excluding areas like mental health or criminal justice involvement), select commonly reported substance-use outcomes across the dataset, favor the most robust measurement approach (toxicology or self-report verified by toxicology over self-report alone), and ensure comparability by considering measure type (dichotomous vs continuous), statistical approach, and measurement time points.

Second, assessing and reporting multiple outcomes created duplication and workload. RoB 2 requires outcome-specific judgments, but many domains remain unchanged across outcomes within a trial. The team streamlined the process by identifying the RoB 2 items likely to vary across outcomes (e.g., missing outcome data, self-report measurement issues, and outcome-ascertainment differences) and carrying judgments forward for domains that remained stable. They also applied pragmatic thresholds for missing data (e.g., ≥95% participant data for continuous outcomes) and treated self-report bias differently depending on whether outcomes were monitored or verified (urine toxicology was treated as more objective than unverified self-report).

Third, interpreting statistics and trial methodology was difficult because RoB 2 expects statistical and methodological sophistication, yet many trials provided limited information. The team often had to triangulate across multiple records—protocols, registration documents, and linked reports—to infer whether participants or intervention deliverers were aware of allocation, how missing data were handled, and what models were used. When protocols were unavailable, assumptions were required; for example, fully automated interventions were treated as less prone to bias from delivery awareness.

Finally, navigating RoB 2 guidance itself was a barrier. The “cryptic” nature of the checklist prompted frequent back-and-forth with the full guidance, training videos, and webinars. To reduce ambiguity, the team piloted and built a review-specific guidance document that translated RoB 2 items into concrete decision rules for their remote-therapy context: for example, accepting imputation of missing outcomes as positive drug use when missingness was presumed nonrandom, and requiring biochemical verification (e.g., urine toxicology) before judging whether outcome assessment was influenced by knowledge of allocation.

Overall, the experience highlighted that applying RoB 2 to broad, complex remote-therapy reviews demands more examples, clearer scenarios for missing information and advanced methods, and consolidated, multi-layered guidance resources—so assessments remain consistent even when trial reporting is incomplete.

Cornell Notes

Applying RoB 2 to remote alcohol and drug therapies was difficult even for an experienced review team. Only one of 45 randomized controlled trials assessed across 53 RoB 2 judgments was rated low risk; the rest were “high risk” or “some concerns,” largely due to heterogeneity in outcomes, incomplete trial reporting, and complex intervention designs. The team used a structured, four-step approach to pre-specify which outcome to judge, prioritizing stakeholder-relevant outcomes and the most robust measurement (toxicology or toxicology-verified self-report). For trials with multiple outcomes, they streamlined RoB 2 domain handling to avoid duplication while applying pragmatic rules for missing data and differentiating self-report from urine toxicology. They also created review-specific guidance to translate RoB 2 items into concrete decisions when information was missing.

Why was pre-specifying the outcome such a challenge in this review, and how did the team decide which outcome to assess?

The included trials were heterogeneous and there was no consensus on core outcomes for substance-use treatment. The review team used an iterative four-step decision process: (1) select outcomes relevant to stakeholders while excluding domains like mental health or criminal justice involvement; (2) focus on substance-use outcomes that were commonly reported across the dataset; (3) choose the most robust measure when multiple eligible outcomes existed—prioritizing toxicology or self-report verified by toxicology over self-report alone; and (4) ensure comparability by considering measure type (dichotomous vs continuous), the statistical approach, and the outcome measurement time points.

How did the team handle RoB 2 assessments when trials reported multiple eligible outcomes?

Multiple outcomes increased workload because RoB 2 requires outcome-specific judgments. The team avoided unnecessary repetition by identifying the RoB 2 items likely to change across outcomes and carrying forward judgments for domains that stayed constant. They treated continuous and dichotomous outcomes differently for missing data (e.g., ≥95% participant data was considered sufficient for continuous outcomes, while missingness was considered small for dichotomous outcomes when observed events greatly exceeded missing cases). They also treated self-report measurement as potentially biased when intervention monitoring could affect retrospective reporting, whereas urine toxicology was treated as more objective.
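As an illustrative sketch (not from the review itself), the pragmatic missing-data rules above can be encoded as simple checks. The ≥95% threshold for continuous outcomes comes from the summary; the function names and the 5:1 event-to-missing ratio used for "greatly exceeded" are assumptions for demonstration only.

```python
def continuous_missingness_ok(n_observed: int, n_randomized: int,
                              threshold: float = 0.95) -> bool:
    """Continuous outcome rule: data for >=95% of randomized participants
    was considered sufficient to judge missingness as low risk."""
    return n_observed / n_randomized >= threshold


def dichotomous_missingness_ok(observed_events: int, n_missing: int,
                               ratio: float = 5.0) -> bool:
    """Dichotomous outcome rule: missingness treated as small when observed
    events greatly exceed missing cases. The 5:1 ratio is an assumption;
    the review did not report a specific cut-off."""
    return n_missing == 0 or observed_events / n_missing >= ratio


# Example: 190 of 200 participants provided continuous outcome data (95%)
print(continuous_missingness_ok(190, 200))  # True
# Example: 50 observed events vs 5 missing cases (10:1)
print(dichotomous_missingness_ok(50, 5))    # True
```

In practice such thresholds would feed into the overall RoB 2 domain judgment rather than determine it mechanically; the sketch only shows how a review team might make its pragmatic rules explicit and reproducible.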

What made interpreting statistics and trial methodology difficult under RoB 2, and what did the team do when key documents were missing?

RoB 2 requires sophisticated statistical and methodological understanding, but many trials provided limited detail. The team consulted multiple records—protocols, trial registrations, and linked reports—to gather information about deviations, missing data handling, and statistical models. When protocols or linked records were unavailable, they had to make assumptions, including how delivery awareness might affect bias (e.g., if intervention delivery required human input, the trial was treated as open; if fully automated, bias from delivery awareness was considered unlikely).

How did the team translate RoB 2 guidance into review-specific rules for remote-therapy trials?

They built a review-specific guidance document that converted RoB 2 items into concrete decisions tailored to their context. Examples included: for missing outcome data, imputing missing outcomes as positive drug use was treated as an acceptable correction when missingness was presumed nonrandom (likely due to use); for measurement/ascertainment, retrospective self-report could be biased if the intervention involved monitoring drug use and the control arm did not; and for outcome assessment influenced by knowledge of allocation, biochemical verification such as urine toxicology was required, and without it the team answered “no information” for that judgment.

What was the main complaint about RoB 2 guidance navigation, and how did the team respond?

The checklist itself, described as “cryptic,” lacked sufficient detail, forcing frequent consultation of the full guidance document plus training materials like YouTube videos and webinars. To reduce inconsistency, the team piloted the process, ran consensus-building exercises, maintained meticulous records of decisions, and consolidated guidance into a multi-layered resource that could be expanded with explanations and examples as needed.

Review Questions

  1. When multiple substance-use outcomes were eligible in a trial, what criteria did the team use to select the most appropriate outcome for RoB 2 assessment?
  2. Describe how missing outcome data were handled differently for continuous versus dichotomous outcomes in this review.
  3. What assumptions did the team make about intervention delivery awareness, and why did those assumptions matter for RoB 2 judgments?

Key Points

  1. RoB 2 assessments for remote alcohol/drug therapy trials were frequently constrained by heterogeneous outcomes and incomplete reporting, leading to mostly “high risk” or “some concerns” ratings.

  2. A structured four-step process helped pre-specify which substance-use outcome to assess, prioritizing stakeholder relevance, common reporting, robust measurement (toxicology-verified outcomes), and comparability across measure types and time points.

  3. Trials with multiple eligible outcomes required a streamlined approach to avoid duplicating RoB 2 domain work while still applying outcome-specific judgments where they truly differed.

  4. Self-report outcomes were treated as more vulnerable to bias when intervention monitoring could influence retrospective reporting; urine toxicology was treated as more objective.

  5. When protocols or linked records were missing, the team relied on documented assumptions—such as treating fully automated interventions as less prone to bias from delivery awareness.

  6. Guidance navigation was a practical barrier, prompting the creation of review-specific, multi-layered decision rules and additional examples for missing information and complex interventions.

Highlights

Only one of 45 randomized controlled trials assessed across 53 RoB 2 judgments was rated low risk of bias; the rest were high risk or had some concerns.
Outcome selection followed a four-step framework that prioritized toxicology-verified measures and comparability across dichotomous/continuous formats and time points.
Self-report bias was treated as a key risk when intervention monitoring could shape retrospective reporting, while urine toxicology was considered more objective.
The team built review-specific guidance to translate RoB 2 items into concrete decisions, including how to handle missing data and when biochemical verification was required.
The checklist alone wasn’t detailed enough, so assessments depended on full guidance plus training materials and extensive consensus-building.

Topics

Mentioned

  • RoB 2
  • OECD
  • NIHR
  • CRAAN