Diagnostic Indices: Sensitivity, Specificity, Predictive Values, and Beyond
- Mayta
- May 12
- 4 min read
Introduction
Every clinical diagnosis involves uncertainty. Diagnostic tests are tools that help us reduce that uncertainty by providing evidence about the likelihood of a condition. But how do we know if a diagnostic test is any good? That’s where diagnostic indices come in.
This article explains how we quantitatively evaluate a test’s diagnostic ability. It explores the foundational 2×2 contingency table, then dives into sensitivity, specificity, predictive values, likelihood ratios, and measures of agreement—finishing with a nuanced discussion on cutoffs and ROC curves.
🧮 1. The 2×2 Table: The Cornerstone of Diagnostic Evaluation
Before calculating any index, we build a 2×2 contingency table that compares the index test against a gold/reference standard.
|               | Disease Present    | Disease Absent     |
| ------------- | ------------------ | ------------------ |
| Test Positive | True Positive (a)  | False Positive (b) |
| Test Negative | False Negative (c) | True Negative (d)  |
From this table, we derive nearly all diagnostic metrics.
Example: Evaluating a new urine dipstick test for detecting urinary tract infections (UTIs).
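As a minimal sketch, the four cells of the table can simply be stored as counts. The numbers below are hypothetical dipstick results, not data from a real study:

```python
# Hypothetical 2x2 counts for the urine dipstick example (illustrative only)
a = 95   # true positives:  patients with UTI who test positive
b = 10   # false positives: patients without UTI who test positive
c = 5    # false negatives: patients with UTI who test negative
d = 90   # true negatives:  patients without UTI who test negative

n = a + b + c + d
print(f"Total patients evaluated: {n}")  # 200
```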
🧬 2. Nosologic Indices: Vertical Thinking
These indices start from the disease status and look at how well the test performs in identifying it.
Sensitivity (True Positive Rate)
Proportion of people with the disease who test positive.
Example: A test with 95% sensitivity for UTI will detect 95 out of 100 patients with actual infections.
High sensitivity = Good for ruling OUT disease. Mnemonic: SeNsitivity → SNOUT
Specificity (True Negative Rate)
Proportion of people without the disease who test negative.
Example: A urine test with 90% specificity means 90 out of 100 healthy individuals will test negative.
High specificity = Good for ruling IN disease. Mnemonic: SPecificity → SPIN
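Using the hypothetical counts from the 2×2 table above, a quick sketch of both calculations:

```python
a, b, c, d = 95, 10, 5, 90  # hypothetical TP, FP, FN, TN

sensitivity = a / (a + c)   # proportion of diseased patients who test positive
specificity = d / (b + d)   # proportion of disease-free patients who test negative

print(f"Sensitivity: {sensitivity:.0%}")  # 95%
print(f"Specificity: {specificity:.0%}")  # 90%
```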
🎯 3. Predictive Indices: Horizontal Thinking
These reflect real-world decision-making: Given a test result, what is the chance the patient actually has (or doesn’t have) the disease?
Positive Predictive Value (PPV)
Proportion of positive test results that are true positives.
Example: In a walk-in clinic with a high prevalence of UTI, a positive urine test may have a PPV of 80%.
Negative Predictive Value (NPV)
Proportion of negative test results that are true negatives.
Example: A low-prevalence setting (e.g., general population screening) often yields high NPV, even if sensitivity is moderate.
Key Insight: PPV and NPV depend heavily on disease prevalence.
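With the same hypothetical counts, PPV and NPV are read across the rows of the table:

```python
a, b, c, d = 95, 10, 5, 90  # hypothetical TP, FP, FN, TN

ppv = a / (a + b)  # share of positive results that are true positives
npv = d / (c + d)  # share of negative results that are true negatives

print(f"PPV: {ppv:.1%}")  # ~90.5%
print(f"NPV: {npv:.1%}")  # ~94.7%
```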
📊 4. Influence of Prevalence
High prevalence → Increases PPV, lowers NPV.
Low prevalence → Increases NPV, lowers PPV.
Example: In a nursing home during a flu outbreak, the PPV of a rapid flu test skyrockets—but in a summer screening program, it drops.
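A minimal sketch of this effect: sensitivity and specificity stay fixed while only prevalence changes (the 95%/90% test characteristics and the two prevalences are assumed for illustration):

```python
def ppv_npv(sens, spec, prevalence):
    """Derive PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Same hypothetical test (95% sensitive, 90% specific) in two settings
for prev in (0.40, 0.02):  # assumed: outbreak in a nursing home vs. summer screening
    ppv, npv = ppv_npv(0.95, 0.90, prev)
    print(f"Prevalence {prev:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
# Prevalence 40%: PPV ≈ 86.4%, NPV ≈ 96.4%
# Prevalence  2%: PPV ≈ 16.2%, NPV ≈ 99.9%
```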
📉 5. Likelihood Ratios: Bridging Pre- and Post-Test Probabilities
Likelihood ratios (LR) integrate both sensitivity and specificity into a single metric.
Positive Likelihood Ratio (LR+)
Tells you how much more likely a positive test is in a patient with disease vs. without.
Negative Likelihood Ratio (LR−)
Tells you how much less likely a negative test is in a patient with disease vs. without.
Mapping to the 2×2 Table
|               | Disease Present (D⁺) | Disease Absent (D⁻) | Total |
| ------------- | -------------------- | ------------------- | ----- |
| Test Positive | True Positive (a)    | False Positive (b)  | a + b |
| Test Negative | False Negative (c)   | True Negative (d)   | c + d |
| Total         | a + c                | b + d               | N     |
Interpretation Guidelines
| LR+ | Diagnostic Utility        |
| --- | ------------------------- |
| >10 | Strongly rules in disease |
| 2–5 | Weak evidence             |
| ~1  | No diagnostic value       |

| LR−     | Diagnostic Utility         |
| ------- | -------------------------- |
| <0.1    | Strongly rules out disease |
| 0.2–0.5 | Weak evidence              |
Example: A malaria rapid test with LR+ = 12 and LR− = 0.08 is excellent for both ruling in and ruling out disease.
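Both ratios come directly from sensitivity and specificity. A sketch using assumed test characteristics chosen to roughly match the malaria example:

```python
sens, spec = 0.93, 0.92  # assumed characteristics, not published figures

lr_pos = sens / (1 - spec)   # LR+ = sensitivity / (1 - specificity)
lr_neg = (1 - sens) / spec   # LR- = (1 - sensitivity) / specificity

print(f"LR+ = {lr_pos:.1f}")   # ~11.6
print(f"LR- = {lr_neg:.2f}")   # ~0.08
```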
🧠 6. Using LR with Pre-test Probability (Fagan Nomogram)
By applying a pre-test probability and LR, we can update to a post-test probability using:
Bayes’ theorem (in odds form), or
Fagan nomogram (graphical tool).
Example: A patient has 30% pre-test probability for strep throat. The test has LR+ = 5. After testing positive, post-test probability rises to ~70%.
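A small sketch of the odds-form calculation behind that example:

```python
def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio (Bayes' theorem in odds form)."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Strep-throat example from the text: 30% pre-test probability, positive test with LR+ = 5
print(f"Post-test probability: {post_test_probability(0.30, 5):.0%}")  # ~68%, i.e. roughly 70%
```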
🔬 7. ROC Curves and Cutoffs
When tests yield continuous values (e.g., blood glucose), we must define a cutoff.
Receiver Operating Characteristic (ROC) Curve
Plots sensitivity (true positive rate) against 1 − specificity (false positive rate) across all possible cutoffs.
Area Under Curve (AUC/AuROC) summarizes performance.
| AUC Range | Interpretation    |
| --------- | ----------------- |
| 0.5       | No discrimination |
| 0.7–0.8   | Acceptable        |
| 0.8–0.9   | Excellent         |
| >0.9      | Outstanding       |
Example: Evaluating CRP levels to predict bacterial vs. viral infection; the optimal cutoff is typically chosen where sensitivity and specificity are jointly maximized (e.g., via Youden's index).
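A sketch of how this looks in code, using simulated (not real) CRP-like values and scikit-learn; the group distributions and the use of Youden's index to pick the cutoff are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Simulated CRP-like values (mg/L): assumed to run higher in the bacterial group
rng = np.random.default_rng(0)
y_true = np.repeat([0, 1], 100)                      # 0 = viral, 1 = bacterial
scores = np.concatenate([rng.normal(20, 10, 100),    # viral group
                         rng.normal(60, 20, 100)])   # bacterial group

fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

# Youden's J (sensitivity + specificity - 1) is one common way to pick a cutoff
best_cutoff = thresholds[np.argmax(tpr - fpr)]
print(f"AUC = {auc:.2f}, suggested cutoff ≈ {best_cutoff:.1f} mg/L")
```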
🤝 8. Agreement Measures (Rater or Method Comparison)
Kappa Statistic
Used for categorical agreement between raters or tests (e.g., radiologist vs. AI tool).
| Kappa Value | Interpretation |
| ----------- | -------------- |
| <0.2        | Poor           |
| 0.2–0.4     | Fair           |
| 0.4–0.6     | Moderate       |
| 0.6–0.8     | Good           |
| >0.8        | Very good      |
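A minimal sketch using scikit-learn's cohen_kappa_score with made-up ratings (the ten readings below are hypothetical):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of 10 chest X-rays: 1 = abnormal, 0 = normal
radiologist = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
ai_tool     = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]

kappa = cohen_kappa_score(radiologist, ai_tool)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.60, at the moderate/good boundary in the table above
```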
Bland-Altman Method
Used for continuous data to assess whether two measurement methods agree numerically, not merely whether they correlate.
Plots the mean of the two measurements (x-axis) against their difference (y-axis).
Acceptable if most values fall within “limits of agreement” (mean ± 1.96 SD).
Example: Comparing home BP monitor readings with clinic sphygmomanometer.
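For the BP example, a sketch of the limits-of-agreement calculation on made-up paired readings (all values below are hypothetical):

```python
import numpy as np

# Hypothetical paired systolic readings (mmHg): home monitor vs. clinic sphygmomanometer
home   = np.array([128, 135, 142, 118, 150, 125, 139, 160, 122, 131], dtype=float)
clinic = np.array([130, 133, 145, 120, 148, 128, 137, 158, 125, 130], dtype=float)

diff = home - clinic                 # y-axis of the Bland-Altman plot
mean_pair = (home + clinic) / 2      # x-axis of the Bland-Altman plot
bias = diff.mean()                   # systematic difference between the two methods
sd = diff.std(ddof=1)
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd   # limits of agreement

print(f"Bias = {bias:.1f} mmHg, limits of agreement = {lower:.1f} to {upper:.1f} mmHg")
```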
Reminder: Correlation ≠ Agreement!
✅ Key Takeaways
Sensitivity and specificity describe test performance starting from true disease status (the "vertical" perspective).
Predictive values reflect clinical decisions based on test outcomes and are prevalence-sensitive.
Likelihood ratios allow dynamic revision of disease probability and fit naturally into Bayesian updating.
Cutoff selection affects sensitivity/specificity trade-offs; ROC curves guide optimal thresholds.
Agreement measures (Kappa, Bland-Altman) ensure reliability in repeated or alternative testing.