
Diagnostic Indices: Sensitivity, Specificity, Predictive Values, and Beyond

Introduction

Every clinical diagnosis involves uncertainty. Diagnostic tests are tools that help us reduce that uncertainty by providing evidence about the likelihood of a condition. But how do we know if a diagnostic test is any good? That’s where diagnostic indices come in.

This article explains how we quantitatively evaluate a test’s diagnostic ability. It explores the foundational 2×2 contingency table, then dives into sensitivity, specificity, predictive values, likelihood ratios, cutoffs and ROC curves, and finishes with measures of agreement.


🧮 1. The 2×2 Table: The Cornerstone of Diagnostic Evaluation

Before calculating any index, we build a 2×2 contingency table that compares the index test against a gold/reference standard.


|               | Disease Present    | Disease Absent     |
|---------------|--------------------|--------------------|
| Test Positive | True Positive (a)  | False Positive (b) |
| Test Negative | False Negative (c) | True Negative (d)  |

From this table, we derive nearly all diagnostic metrics.

Example: Evaluating a new urine dipstick test for detecting urinary tract infections (UTIs).
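To make the table concrete, here is a minimal sketch in Python, assuming an invented study of 200 patients in which the dipstick result is compared against urine culture as the reference standard (all counts are hypothetical, chosen only for illustration):

```python
# Hypothetical counts for the UTI dipstick example (invented for illustration).
# Rows: dipstick result; columns: disease status per the reference standard (urine culture).
a = 57   # true positives:  dipstick positive, culture positive
b = 14   # false positives: dipstick positive, culture negative
c = 3    # false negatives: dipstick negative, culture positive
d = 126  # true negatives:  dipstick negative, culture negative

print(f"{'':14}{'Disease +':>12}{'Disease -':>12}")
print(f"{'Test +':14}{a:>12}{b:>12}")
print(f"{'Test -':14}{c:>12}{d:>12}")
```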


🧬 2. Nosologic Indices: Vertical Thinking

These indices start from the disease status and look at how well the test performs in identifying it.

Sensitivity (True Positive Rate)

Proportion of people with the disease who test positive: Sensitivity = a / (a + c).

Example: A test with 95% sensitivity for UTI will detect 95 out of 100 patients with actual infections.

High sensitivity = Good for ruling OUT disease. Mnemonic: SeNsitivity → SNOUT (a SeNsitive test, when Negative, rules OUT disease).

Specificity (True Negative Rate)

Proportion of people without the disease who test negative: Specificity = d / (b + d).


Example: A urine test with 90% specificity means 90 out of 100 healthy individuals will test negative.

High specificity = Good for ruling IN disease. Mnemonic: SPecificity → SPIN (a SPecific test, when Positive, rules IN disease).
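A short sketch of how both indices fall out of the columns of the 2×2 table, reusing the hypothetical UTI counts from above:

```python
# Sensitivity and specificity from a 2x2 table (hypothetical UTI counts from above).
a, b, c, d = 57, 14, 3, 126

sensitivity = a / (a + c)   # true positive rate: positives among the diseased
specificity = d / (b + d)   # true negative rate: negatives among the non-diseased

print(f"Sensitivity: {sensitivity:.0%}")   # 95%
print(f"Specificity: {specificity:.0%}")   # 90%
```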

🎯 3. Predictive Indices: Horizontal Thinking

These reflect real-world decision-making: Given a test result, what is the chance the patient actually has (or doesn’t have) the disease?

Positive Predictive Value (PPV)

Proportion of positive test results that are true positives: PPV = a / (a + b).

Example: In a walk-in clinic with a high prevalence of UTI, a positive urine test may have a PPV of 80%.

Negative Predictive Value (NPV)

Proportion of negative test results that are true negatives: NPV = d / (c + d).

Example: A low-prevalence setting (e.g., general population screening) often yields high NPV, even if sensitivity is moderate.

Key Insight: PPV and NPV depend heavily on disease prevalence.
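In code, the predictive values are simply read across the rows of the same (hypothetical) table:

```python
# Predictive values read across the rows of the hypothetical 2x2 table.
a, b, c, d = 57, 14, 3, 126

ppv = a / (a + b)   # chance of disease given a positive result
npv = d / (c + d)   # chance of no disease given a negative result

print(f"PPV: {ppv:.1%}")   # ~80.3%
print(f"NPV: {npv:.1%}")   # ~97.7%
```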

📊 4. Influence of Prevalence

  • High prevalence → Increases PPV, lowers NPV.

  • Low prevalence → Increases NPV, lowers PPV.

Example: In a nursing home during a flu outbreak, the PPV of a rapid flu test skyrockets—but in a summer screening program, it drops.
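A quick sketch of this effect: holding sensitivity and specificity fixed at the UTI example's values and varying only prevalence (the prevalence figures are arbitrary), the predictive values can be recomputed from Bayes' theorem:

```python
# How PPV and NPV shift with prevalence, holding sensitivity and specificity fixed.
# Sensitivity/specificity are from the hypothetical UTI example; prevalences are arbitrary.
sens, spec = 0.95, 0.90

for prev in (0.02, 0.30, 0.60):
    tp = sens * prev              # probability of a true positive result
    fp = (1 - spec) * (1 - prev)  # probability of a false positive result
    fn = (1 - sens) * prev        # probability of a false negative result
    tn = spec * (1 - prev)        # probability of a true negative result
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    print(f"Prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these numbers, PPV climbs from roughly 16% at 2% prevalence to over 90% at 60% prevalence, while NPV moves in the opposite direction.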


📉 5. Likelihood Ratios: Bridging Pre- and Post-Test Probabilities

Likelihood ratios (LR) integrate both sensitivity and specificity into a single metric.

Positive Likelihood Ratio (LR+)

Tells you how much more likely a positive test result is in a patient with the disease than in one without: LR+ = sensitivity / (1 − specificity).

Negative Likelihood Ratio (LR−)

Tells you how much less likely a negative test result is in a patient with the disease than in one without: LR− = (1 − sensitivity) / specificity.

Mapping to the 2×2 Table


|               | Disease Present (D⁺) | Disease Absent (D⁻) | Total |
|---------------|----------------------|---------------------|-------|
| Test Positive | True Positive (a)    | False Positive (b)  | a + b |
| Test Negative | False Negative (c)   | True Negative (d)   | c + d |
| Total         | a + c                | b + d               | N     |


Interpretation Guidelines

| LR+ | Diagnostic Utility        |
|-----|---------------------------|
| >10 | Strongly rules in disease |
| 2–5 | Weak evidence             |
| ~1  | No diagnostic value       |

| LR−     | Diagnostic Utility         |
|---------|----------------------------|
| <0.1    | Strongly rules out disease |
| 0.2–0.5 | Weak evidence              |

Example: A malaria rapid test with LR+ = 12 and LR− = 0.08 is excellent both for ruling disease in and for ruling it out.
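As a sketch, returning to the hypothetical UTI test (95% sensitivity, 90% specificity), the implied likelihood ratios would be:

```python
# Likelihood ratios combine both columns of the 2x2 table into one number per result.
# Sensitivity and specificity are from the hypothetical UTI example above.
sens, spec = 0.95, 0.90

lr_pos = sens / (1 - spec)   # how much a positive result raises the odds of disease
lr_neg = (1 - sens) / spec   # how much a negative result lowers the odds of disease

print(f"LR+ = {lr_pos:.1f}")   # 9.5
print(f"LR- = {lr_neg:.2f}")   # 0.06
```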




🧠 6. Using LR with Pre-test Probability (Fagan Nomogram)

By applying a pre-test probability and LR, we can update to a post-test probability using:

  • Bayes’ theorem (in odds form), or

  • Fagan nomogram (graphical tool).

Example: A patient has a 30% pre-test probability of strep throat. The test has LR+ = 5. After a positive result, the post-test probability rises to about 68%.
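A minimal sketch of that update in odds form, reproducing the strep throat example (the helper function post_test_probability is ours, not a library call):

```python
# Bayes' theorem in odds form: probability -> odds, multiply by LR, back to probability.
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

p = post_test_probability(0.30, 5)          # 30% pre-test probability, LR+ = 5
print(f"Post-test probability: {p:.0%}")    # ~68%
```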

🔬 7. ROC Curves and Cutoffs

When tests yield continuous values (e.g., blood glucose), we must define a cutoff.

Receiver Operating Characteristic (ROC) Curve

  • Plots sensitivity against 1 − specificity (the false positive rate) across all possible cutoffs.

  • Area Under Curve (AUC/AuROC) summarizes performance.

| AUC Range | Interpretation    |
|-----------|-------------------|
| 0.5       | No discrimination |
| 0.7–0.8   | Acceptable        |
| 0.8–0.9   | Excellent         |
| >0.9      | Outstanding       |

Example: Evaluating CRP levels to predict bacterial vs. viral infection; the optimal cutoff is typically taken where sensitivity and specificity are jointly maximized (Youden’s index).
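A rough sketch of how such a cutoff could be chosen by scanning candidate thresholds and maximizing Youden’s index (sensitivity + specificity − 1). The CRP values and diagnoses below are invented purely to show the mechanics:

```python
import numpy as np

# Hypothetical CRP values (mg/L) with the reference diagnosis (1 = bacterial, 0 = viral).
crp   = np.array([4, 8, 12, 15, 20, 25, 35, 40, 55, 70, 90, 120])
truth = np.array([0, 0,  0,  0,  1,  0,  1,  0,  1,  1,  1,   1])

best = None
for cutoff in np.unique(crp):
    pred = (crp >= cutoff).astype(int)                     # "positive" = CRP at or above cutoff
    sens = (pred & truth).sum() / truth.sum()              # true positive rate at this cutoff
    spec = ((1 - pred) & (1 - truth)).sum() / (1 - truth).sum()  # true negative rate
    youden = sens + spec - 1                               # Youden's J balances the two
    if best is None or youden > best[0]:
        best = (youden, cutoff, sens, spec)

print(f"Best cutoff: {best[1]} mg/L (sens {best[2]:.0%}, spec {best[3]:.0%}, J = {best[0]:.2f})")
```

Plotting sensitivity against 1 − specificity for every cutoff in this loop would trace out the ROC curve itself.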

🤝 8. Agreement Measures (Rater or Method Comparison)

Kappa Statistic

Used for categorical agreement between raters or tests (e.g., radiologist vs. AI tool).

| Kappa Value | Interpretation |
|-------------|----------------|
| <0.2        | Poor           |
| 0.2–0.4     | Fair           |
| 0.4–0.6     | Moderate       |
| 0.6–0.8     | Good           |
| >0.8        | Very good      |
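A minimal sketch of Cohen’s kappa for a binary radiologist-vs-AI comparison, using invented counts:

```python
# Cohen's kappa for two raters on a binary call (counts invented for illustration).
# Agreement table: rows = radiologist, columns = AI tool.
both_pos, rad_pos_ai_neg = 40, 10
rad_neg_ai_pos, both_neg = 5, 45

n = both_pos + rad_pos_ai_neg + rad_neg_ai_pos + both_neg
observed = (both_pos + both_neg) / n        # proportion of cases where the raters agree

# Chance agreement: probability both say "positive" plus probability both say "negative".
p_rad_pos = (both_pos + rad_pos_ai_neg) / n
p_ai_pos  = (both_pos + rad_neg_ai_pos) / n
expected  = p_rad_pos * p_ai_pos + (1 - p_rad_pos) * (1 - p_ai_pos)

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, kappa: {kappa:.2f}")   # 0.85 observed, kappa 0.70
```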

Bland-Altman Method

Used for continuous data, to assess if two methods agree numerically, not just correlate.

  • Plots the difference between the two methods against their mean.

  • Agreement is acceptable if most differences fall within the “limits of agreement” (mean difference ± 1.96 SD of the differences).

Example: Comparing home BP monitor readings with clinic sphygmomanometer.
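A short sketch of the limits-of-agreement calculation with invented paired readings:

```python
import numpy as np

# Hypothetical paired systolic BP readings (mmHg): home monitor vs. clinic device.
home   = np.array([118, 124, 131, 140, 152, 128, 135, 145, 122, 138])
clinic = np.array([120, 122, 134, 138, 149, 130, 133, 148, 121, 141])

diff = home - clinic
bias = diff.mean()                          # systematic difference between methods
loa_low  = bias - 1.96 * diff.std(ddof=1)   # lower limit of agreement
loa_high = bias + 1.96 * diff.std(ddof=1)   # upper limit of agreement

print(f"Bias: {bias:.1f} mmHg, limits of agreement: {loa_low:.1f} to {loa_high:.1f} mmHg")
# A Bland-Altman plot would show (home + clinic) / 2 on the x-axis and diff on the y-axis.
```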

Reminder: Correlation ≠ Agreement!

✅ Key Takeaways

  • Sensitivity and specificity describe test performance starting from known disease status (the “vertical” view).

  • Predictive values reflect clinical decisions based on test outcomes and are prevalence-sensitive.

  • Likelihood ratios allow dynamic revision of disease probability and fit naturally into Bayesian reasoning.

  • Cutoff selection affects sensitivity/specificity trade-offs; ROC curves guide optimal thresholds.

  • Agreement measures (Kappa, Bland-Altman) ensure reliability in repeated or alternative testing.

