Diagnostic Indices: Sensitivity, Specificity, Predictive Values, and Beyond
- Mayta
- May 12
- 4 min read
Introduction
Every clinical diagnosis involves uncertainty. Diagnostic tests are tools that help us reduce that uncertainty by providing evidence about the likelihood of a condition. But how do we know if a diagnostic test is any good? That’s where diagnostic indices come in.
This article explains how we quantitatively evaluate a test’s diagnostic ability. It explores the foundational 2×2 contingency table, then dives into sensitivity, specificity, predictive values, likelihood ratios, and measures of agreement—finishing with a nuanced discussion on cutoffs and ROC curves.
🧮 1. The 2×2 Table: The Cornerstone of Diagnostic Evaluation
Before calculating any index, we build a 2×2 contingency table that compares the index test against a gold/reference standard.
|               | Disease Present    | Disease Absent     |
| ------------- | ------------------ | ------------------ |
| Test Positive | True Positive (a)  | False Positive (b) |
| Test Negative | False Negative (c) | True Negative (d)  |
From this table, we derive nearly all diagnostic metrics.
Example: Evaluating a new urine dipstick test for detecting urinary tract infections (UTIs).
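As a minimal sketch, the four cells of the table can simply be stored as counts. The numbers below are hypothetical dipstick results, not data from a real study:

```python
# Hypothetical 2x2 counts for the urine dipstick example (illustrative only)
a = 95   # true positives:  patients with UTI who test positive
b = 10   # false positives: patients without UTI who test positive
c = 5    # false negatives: patients with UTI who test negative
d = 90   # true negatives:  patients without UTI who test negative

n = a + b + c + d
print(f"Total patients evaluated: {n}")  # 200
```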
🧬 2. Nosologic Indices: Vertical Thinking
These indices start from the disease status and look at how well the test performs in identifying it.
Sensitivity (True Positive Rate)
Proportion of people with the disease who test positive.
Example: A test with 95% sensitivity for UTI will detect 95 out of 100 patients with actual infections.
High sensitivity = Good for ruling OUT disease. Mnemonic: SeNsitivity → SNOUT
Specificity (True Negative Rate)
Proportion of people without the disease who test negative.
Example: A urine test with 90% specificity means 90 out of 100 healthy individuals will test negative.
High specificity = Good for ruling IN disease. Mnemonic: SPecificity → SPIN
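Using the hypothetical counts from the 2×2 table above, a quick sketch of both calculations:

```python
a, b, c, d = 95, 10, 5, 90  # hypothetical TP, FP, FN, TN

sensitivity = a / (a + c)   # proportion of diseased patients who test positive
specificity = d / (b + d)   # proportion of disease-free patients who test negative

print(f"Sensitivity: {sensitivity:.0%}")  # 95%
print(f"Specificity: {specificity:.0%}")  # 90%
```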
🎯 3. Predictive Indices: Horizontal Thinking
These reflect real-world decision-making: Given a test result, what is the chance the patient actually has (or doesn’t have) the disease?
Positive Predictive Value (PPV)
Proportion of positive test results that are true positives.
Example: In a walk-in clinic with a high prevalence of UTI, a positive urine test may have a PPV of 80%.
Negative Predictive Value (NPV)
Proportion of negative test results that are true negatives.
Example: A low-prevalence setting (e.g., general population screening) often yields high NPV, even if sensitivity is moderate.
Key Insight: PPV and NPV depend heavily on disease prevalence.
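With the same hypothetical counts, PPV and NPV are read across the rows of the table:

```python
a, b, c, d = 95, 10, 5, 90  # hypothetical TP, FP, FN, TN

ppv = a / (a + b)  # share of positive results that are true positives
npv = d / (c + d)  # share of negative results that are true negatives

print(f"PPV: {ppv:.1%}")  # ~90.5%
print(f"NPV: {npv:.1%}")  # ~94.7%
```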
📊 4. Influence of Prevalence
High prevalence → Increases PPV, lowers NPV.
Low prevalence → Increases NPV, lowers PPV.
Example: In a nursing home during a flu outbreak, the PPV of a rapid flu test skyrockets—but in a summer screening program, it drops.
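A minimal sketch of this effect: sensitivity and specificity stay fixed while only prevalence changes (the 95%/90% test characteristics and the two prevalences are assumed for illustration):

```python
def ppv_npv(sens, spec, prevalence):
    """Derive PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Same hypothetical test (95% sensitive, 90% specific) in two settings
for prev in (0.40, 0.02):  # assumed: outbreak in a nursing home vs. summer screening
    ppv, npv = ppv_npv(0.95, 0.90, prev)
    print(f"Prevalence {prev:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
# Prevalence 40%: PPV ≈ 86.4%, NPV ≈ 96.4%
# Prevalence  2%: PPV ≈ 16.2%, NPV ≈ 99.9%
```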
📉 5. Likelihood Ratios: Bridging Pre- and Post-Test Probabilities
Likelihood ratios (LR) integrate both sensitivity and specificity into a single metric.
Positive Likelihood Ratio (LR+)
Tells you how much more likely a positive test is in a patient with disease vs. without.
Negative Likelihood Ratio (LR−)
Tells you how much less likely a negative test is in a patient with disease vs. without.
Mapping to the 2×2 Table
|               | Disease Present (D⁺) | Disease Absent (D⁻) | Total |
| ------------- | -------------------- | ------------------- | ----- |
| Test Positive | True Positive (a)    | False Positive (b)  | a + b |
| Test Negative | False Negative (c)   | True Negative (d)   | c + d |
| Total         | a + c                | b + d               | N     |
Interpretation Guidelines
| LR+ | Diagnostic Utility        |
| --- | ------------------------- |
| >10 | Strongly rules in disease |
| 2–5 | Weak evidence             |
| ~1  | No diagnostic value       |

| LR−     | Diagnostic Utility         |
| ------- | -------------------------- |
| <0.1    | Strongly rules out disease |
| 0.2–0.5 | Weak evidence              |
Example: A malaria rapid test with LR+ = 12 and LR− = 0.08 is excellent for both ruling in and ruling out disease.
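Both ratios come directly from sensitivity and specificity. A sketch using assumed test characteristics chosen to roughly match the malaria example:

```python
sens, spec = 0.93, 0.92  # assumed characteristics, not published figures

lr_pos = sens / (1 - spec)   # LR+ = sensitivity / (1 - specificity)
lr_neg = (1 - sens) / spec   # LR- = (1 - sensitivity) / specificity

print(f"LR+ = {lr_pos:.1f}")   # ~11.6
print(f"LR- = {lr_neg:.2f}")   # ~0.08
```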
🧠 6. Using LR with Pre-test Probability (Fagan Nomogram)
By applying a pre-test probability and LR, we can update to a post-test probability using:
Bayes’ theorem (in odds form), or
Fagan nomogram (graphical tool).
Example: A patient has 30% pre-test probability for strep throat. The test has LR+ = 5. After testing positive, post-test probability rises to ~70%.
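A small sketch of the odds-form calculation behind that example:

```python
def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio (Bayes' theorem in odds form)."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Strep-throat example from the text: 30% pre-test probability, positive test with LR+ = 5
print(f"Post-test probability: {post_test_probability(0.30, 5):.0%}")  # ~68%, i.e. roughly 70%
```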
🔬 7. ROC Curves and Cutoffs
When tests yield continuous values (e.g., blood glucose), we must define a cutoff.
Receiver Operating Characteristic (ROC) Curve
Plots sensitivity (true positive rate) against 1 − specificity (false positive rate) across all possible cutoffs.
Area Under Curve (AUC/AuROC) summarizes performance.
| AUC Range | Interpretation    |
| --------- | ----------------- |
| 0.5       | No discrimination |
| 0.7–0.8   | Acceptable        |
| 0.8–0.9   | Excellent         |
| >0.9      | Outstanding       |
Example: Evaluating CRP levels to predict bacterial vs. viral infection; the optimal cutoff is typically chosen where sensitivity and specificity are jointly maximized (e.g., via Youden's index).
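A sketch of how this looks in code, using simulated (not real) CRP-like values and scikit-learn; the group distributions and the use of Youden's index to pick the cutoff are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Simulated CRP-like values (mg/L): assumed to run higher in the bacterial group
rng = np.random.default_rng(0)
y_true = np.repeat([0, 1], 100)                      # 0 = viral, 1 = bacterial
scores = np.concatenate([rng.normal(20, 10, 100),    # viral group
                         rng.normal(60, 20, 100)])   # bacterial group

fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

# Youden's J (sensitivity + specificity - 1) is one common way to pick a cutoff
best_cutoff = thresholds[np.argmax(tpr - fpr)]
print(f"AUC = {auc:.2f}, suggested cutoff ≈ {best_cutoff:.1f} mg/L")
```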
🤝 8. Agreement Measures (Rater or Method Comparison)
Kappa Statistic
Used for categorical agreement between raters or tests (e.g., radiologist vs. AI tool).
| Kappa Value | Interpretation |
| ----------- | -------------- |
| <0.2        | Poor           |
| 0.2–0.4     | Fair           |
| 0.4–0.6     | Moderate       |
| 0.6–0.8     | Good           |
| >0.8        | Very good      |
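A minimal sketch using scikit-learn's cohen_kappa_score with made-up ratings (the ten readings below are hypothetical):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of 10 chest X-rays: 1 = abnormal, 0 = normal
radiologist = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
ai_tool     = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]

kappa = cohen_kappa_score(radiologist, ai_tool)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.60, at the moderate/good boundary in the table above
```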
Bland-Altman Method
Used for continuous data to assess whether two measurement methods agree numerically, not merely whether they correlate.
Plots the mean of the two measurements (x-axis) against their difference (y-axis).
Acceptable if most values fall within “limits of agreement” (mean ± 1.96 SD).
Example: Comparing home BP monitor readings with clinic sphygmomanometer.
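For the BP example, a sketch of the limits-of-agreement calculation on made-up paired readings (all values below are hypothetical):

```python
import numpy as np

# Hypothetical paired systolic readings (mmHg): home monitor vs. clinic sphygmomanometer
home   = np.array([128, 135, 142, 118, 150, 125, 139, 160, 122, 131], dtype=float)
clinic = np.array([130, 133, 145, 120, 148, 128, 137, 158, 125, 130], dtype=float)

diff = home - clinic                 # y-axis of the Bland-Altman plot
mean_pair = (home + clinic) / 2      # x-axis of the Bland-Altman plot
bias = diff.mean()                   # systematic difference between the two methods
sd = diff.std(ddof=1)
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd   # limits of agreement

print(f"Bias = {bias:.1f} mmHg, limits of agreement = {lower:.1f} to {upper:.1f} mmHg")
```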
Reminder: Correlation ≠ Agreement!
✅ Key Takeaways
Sensitivity and specificity describe test performance starting from true disease status (the "vertical" perspective).
Predictive values reflect clinical decisions based on test outcomes and are prevalence-sensitive.
Likelihood ratios allow dynamic revision of disease probability and fit naturally into Bayesian updating.
Cutoff selection affects sensitivity/specificity trade-offs; ROC curves guide optimal thresholds.
Agreement measures (Kappa, Bland-Altman) ensure reliability in repeated or alternative testing.