← All posts

DEPTh Typing as Diagnosis: Clinical Interpretation of Database-Based Identification

Clinical Epidemiology ResearchUniqcret doctor knowledgesBioinformaticsData Analytics or StatisticsMethodology and Research Design

🧭 1. DEPTh Typing: This is a Diagnostic Challenge

So your challenge = “Given a biological isolate (bio sample), how do we determine what organism it is — using a database comparison with a score and a hit?” ➡ DEPTh type: Diagnostic 

The object of study = diagnostic accuracy of a computational or laboratory index test


🔬 2. Diagnostic Logic: From Query to Clinical Answer

The bioinformatics pipeline actually mirrors the diagnostic accuracy framework:

Diagnostic conceptBioinformatics equivalentExplanation
Index testDatabase matching algorithm (e.g., BLAST, MALDI-TOF, Kraken2, Bruker Biotyper)It “tests” what your isolate might be.
Reference standardVerified species ID (e.g., 16S rRNA sequencing, WGS reference)The “true disease” or ground truth.
Test result (Query → Hit → Score)Alignment or spectral match producing scoreIndicates how close your sample is to known reference profiles.
Decision thresholdScore cutoffDetermines whether the hit is “positive” or “negative” for a given organism.

This is precisely how we operationalize the diagnostic accuracy study design.


⚙️ 3. The Core Principle: Quantified Similarity = Diagnostic Evidence

Step 1. Query

Your biological material (DNA, protein spectrum, etc.) → converted into a digital “fingerprint”:

Step 2. Database

Reference library of known organisms’ signatures.E.g.,

Step 3. Matching → Hit

Algorithm aligns your query fingerprint with database entries and reports:

Step 4. Threshold → Diagnostic Call

Every platform defines a cutoff where “match = identification”:

This parallels diagnostic cutoffs (like sensitivity/specificity in lab tests).


📊 4. Scoring = Clinical Accuracy Metrics

After generating hits and scores for many samples, you can build a diagnostic accuracy study:

Clinical metricBioinformatic equivalent
Sensitivity (TP/(TP+FN))% of isolates correctly identified above threshold
Specificity (TN/(TN+FP))% of non-target isolates correctly rejected
AUROCPerformance of score cutoff
Likelihood ratios (LR+/LR–)Probability of true ID given score above/below cutoff

You can visualize this as a Receiver Operating Characteristic (ROC) curve, where “score” acts as the continuous diagnostic marker.


🧩 5. Etiologic & Epidemiologic Extension

Once identification is validated → we can move to etiologic inference (DEPTh = Etiology) :

Now your database match becomes a predictor variable in your causal model: Y = f(Organism identified by query | confounders) species or genotypes cause specific infections, resistance, or outcomes.


Insight

The “bio database score-hit logic” is a digital analog of a diagnostic accuracy test:

When we publish or validate such tools (e.g., MALDI-TOF, 16S classifiers, metagenomic ID), we must report according to STARD 2015 and evaluate bias via QUADAS-2.


Key Takeaways

Comments

No comments yet. Be the first to share your thoughts.

Sign in to comment

DEPTh Typing as Diagnosis: Clinical Interpretation of Database-Based Identification — Uniqcret