How to Use QUADAS-2 to Assess Bias in Diagnostic Accuracy Studies
- Mayta
- May 12
- 3 min read
Introduction
Diagnostic accuracy studies are essential for understanding whether a test can correctly distinguish between those with and without a condition. However, the methodological quality of such studies can vary widely—and poor design can significantly bias the results.
To address this, researchers use QUADAS-2, a structured tool developed to assess the risk of bias and applicability concerns in primary diagnostic accuracy studies. Unlike a checklist that produces a summary score, QUADAS-2 guides critical appraisal using a domain-based judgment system.
In this article, we’ll walk through the structure and application of QUADAS-2 in depth, complete with illustrative examples to reinforce your mastery of each domain.
🧩 The Four Phases of QUADAS-2 Assessment
QUADAS-2 is applied in four phases; the first three prepare the ground for the appraisal itself:
1. Define the Review Question
A clear systematic review question should specify:
- Patients/population
- Index test(s)
- Reference standard
- Target condition
- Intended use (diagnosis, triage, screening)
2. Tailor the Tool to the Review
- Customize signaling questions for each domain based on the topic.
- Define the disease spectrum and test thresholds where applicable.
3. Review the Study’s Flow Diagram
- Ensure clarity on recruitment, testing sequence, and patient inclusion.
- Construct a flow diagram if one is missing.
4. Judge Bias and Applicability
Rate each domain as Low, High, or Unclear for:
- Risk of Bias
- Applicability Concerns
🧱 The Four Domains of QUADAS-2
Each domain addresses a critical aspect of study design and execution.
🔍 Domain 1: Patient Selection
Risk of Bias:
Could the way patients were selected introduce bias?
- Yes, if using case-control designs (especially if selecting extreme cases).
- Yes, if excluding many eligible patients without a clear rationale.
Applicability Concern:
Do the included patients match those in your intended clinical setting?
Signaling Questions:
- Was a consecutive or random sample used?
- Was a case-control design avoided?
- Were inappropriate exclusions avoided?
Clinical Example:
Evaluating a diagnostic test for early Alzheimer's disease using only patients from a neurology referral center excludes the broader spectrum seen in primary care—this introduces both selection and spectrum bias.
🧪 Domain 2: Index Test
Risk of Bias:
Could the conduct or interpretation of the index test introduce bias?
- Yes, if the index test reader knew the reference result (review bias).
- Yes, if the test threshold was chosen after data analysis (overfitting).
Applicability Concern:
Are the test technique and its interpretation generalizable?
Signaling Questions:
- Was the test interpreted blinded to the reference standard?
- Was the positivity threshold pre-specified?
Clinical Example:
A radiologist assessing CT scans for pulmonary embolism should not know the D-dimer results or the clinical gestalt. If the threshold for “positive” is derived post hoc from ROC analysis, the reported accuracy is likely inflated.
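To see why a post hoc threshold inflates accuracy, here is a minimal simulation sketch in Python (the biomarker values and sample sizes are invented purely for illustration, not drawn from any study). It picks the cutoff that maximizes Youden's J on the same small sample used to report accuracy, then applies that cutoff to a large independent sample, where performance is usually lower.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n_diseased, n_healthy):
    # Hypothetical biomarker: diseased patients score higher on average.
    scores = np.concatenate([rng.normal(1.0, 1.0, n_diseased),
                             rng.normal(0.0, 1.0, n_healthy)])
    disease = np.array([True] * n_diseased + [False] * n_healthy)
    return scores, disease

def sens_spec(scores, disease, cutoff):
    positive = scores >= cutoff
    return positive[disease].mean(), (~positive[~disease]).mean()

def post_hoc_cutoff(scores, disease):
    # Choose the cutoff that maximizes Youden's J (sens + spec - 1) on this sample.
    return max(np.unique(scores),
               key=lambda c: sum(sens_spec(scores, disease, c)) - 1)

# Small study sample: the threshold is chosen on the same data used to report accuracy.
study_scores, study_disease = sample(30, 30)
cutoff = post_hoc_cutoff(study_scores, study_disease)
apparent = sens_spec(study_scores, study_disease, cutoff)

# Large independent sample: the same cutoff usually performs worse.
big_scores, big_disease = sample(20000, 20000)
external = sens_spec(big_scores, big_disease, cutoff)

print("Apparent (post hoc) sens/spec:  %.2f / %.2f" % apparent)
print("Independent-sample sens/spec:   %.2f / %.2f" % external)
```

A pre-specified threshold avoids this optimism; if a study reports only a data-driven cutoff, that counts against the index-test domain.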
🧬 Domain 3: Reference Standard
Risk of Bias:
Could the reference standard, its conduct, or its interpretation introduce bias?
- Bias may occur with imperfect gold standards (e.g., clinical diagnosis instead of biopsy).
- Bias also arises if the reference test is interpreted with knowledge of the index test.
Applicability Concern:
Does the definition of the target condition match what your question needs?
Signaling Questions:
- Is the reference standard likely to correctly classify the target condition?
- Was it interpreted blind to the index test results?
Clinical Example:
Using physician discharge diagnosis to confirm pneumonia status introduces incorporation bias if the physician relied on the chest X-ray (the index test) to make the diagnosis.
🕓 Domain 4: Flow and Timing
Risk of Bias:
Could the timing and sequence of tests or patient inclusion bias the results?
- Bias arises if not all patients receive both tests.
- Long delays between the index and reference tests can allow disease status to change.
Signaling Questions:
- Was the interval between tests appropriate?
- Did all patients receive the same reference test?
- Were all patients included in the analysis?
Clinical Example:
If patients with negative rapid troponin results are not referred for angiography, partial verification bias can result: the diagnostic accuracy of troponin is then misrepresented.
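A rough, back-of-the-envelope sketch of this mechanism (the 2×2 counts below are invented for illustration, not taken from any troponin study): if only a fraction of test-negative patients are verified with the reference standard, the sensitivity computed on verified patients is inflated and the specificity is deflated.

```python
# Hypothetical full 2x2 table for an index test vs. the reference standard
# (counts invented for illustration).
TP, FP = 90, 60      # test-positive: with disease / without disease
FN, TN = 10, 840     # test-negative: with disease / without disease

def sens_spec(tp, fp, fn, tn):
    return tp / (tp + fn), tn / (tn + fp)

true_sens, true_spec = sens_spec(TP, FP, FN, TN)

# Partial verification: every test-positive patient gets the reference standard,
# but only 10% of test-negative patients do, and accuracy is computed
# only among verified patients.
verified_negatives = 0.10
obs_sens, obs_spec = sens_spec(TP, FP, FN * verified_negatives, TN * verified_negatives)

print("True sensitivity / specificity:   %.2f / %.2f" % (true_sens, true_spec))
print("Under partial verification:       %.2f / %.2f" % (obs_sens, obs_spec))
# Missed cases (false negatives) are rarely verified, so sensitivity looks
# better than it is, while specificity looks worse.
```

The exact direction and size of the distortion depend on how verification relates to the index test result, but either way the estimates no longer reflect the whole intended population.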
⚖️ Interpreting Judgments
Each domain is rated:
- Low risk: All signaling questions answered “yes”
- High risk: One or more “no” answers
- Unclear risk: Insufficient information
Applicability ratings focus on whether the test or population matches your clinical question. For example, a study of ultrasound in tertiary ICUs may not apply to primary care.
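If you keep your judgments in a spreadsheet or script, the decision rule above is straightforward to encode. Below is a minimal Python sketch of one possible way to record a domain's signaling answers and derive its risk-of-bias rating; the field names and example answers are my own, not part of the official tool, and applicability still needs a separate, manual judgment.

```python
from dataclasses import dataclass, field

@dataclass
class DomainAssessment:
    name: str
    # Signaling-question answers: True = "yes", False = "no", None = unclear.
    answers: dict = field(default_factory=dict)
    applicability: str = "Unclear"   # judged separately against the review question

    def risk_of_bias(self) -> str:
        values = list(self.answers.values())
        if any(a is False for a in values):
            return "High"
        if any(a is None for a in values):
            return "Unclear"
        return "Low"

# Example: Domain 1 (Patient Selection) for a hypothetical study.
patient_selection = DomainAssessment(
    name="Patient selection",
    answers={
        "Consecutive or random sample?": True,
        "Case-control design avoided?": True,
        "Inappropriate exclusions avoided?": None,   # not reported
    },
    applicability="Low",
)

print(patient_selection.name, "risk of bias:", patient_selection.risk_of_bias())
# -> Patient selection risk of bias: Unclear
```

In practice, a “no” answer flags potential bias; the final High/Low call remains a reviewer judgment rather than an automatic computation.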
❗ What QUADAS-2 Does Not Do
- It does not produce a summary score, because different domains affect bias differently.
- It is not a substitute for understanding study design logic.
- It should be used in conjunction with STARD (for reporting quality) and clinical judgment.
🧠 Key Takeaways
- QUADAS-2 helps identify methodological weaknesses in diagnostic accuracy studies across four domains.
- It is not a numeric score, but a structured, domain-based evaluation.
- Common biases caught by QUADAS-2 include:
  - Spectrum bias
  - Partial verification bias
  - Review bias
  - Overfitting from post hoc thresholds
- Each domain requires tailoring based on the clinical context and review objective.