DEPTh Typing as Diagnosis: Clinical Interpretation of Database-Based Identification
- Mayta

- Oct 21
🧭 1. DEPTh Typing: This is a Diagnostic Challenge
So your challenge = “Given a biological isolate (a bio sample), how do we determine which organism it is, using a database comparison that returns a hit and a score?” ➡ DEPTh type: Diagnostic
The object of study = diagnostic accuracy of a computational or laboratory index test
Index test: sequence or spectrum matching algorithm
Reference standard: species identification (e.g., culture, gold-standard sequencing)
🔬 2. Diagnostic Logic: From Query to Clinical Answer
The bioinformatics pipeline actually mirrors the diagnostic accuracy framework:
| Diagnostic concept | Bioinformatics equivalent | Explanation |
|---|---|---|
| Index test | Database matching tool (e.g., BLAST, Kraken2, MALDI-TOF spectral matching such as Bruker Biotyper) | It “tests” what your isolate might be. |
| Reference standard | Verified species ID (e.g., 16S rRNA sequencing, WGS reference) | The “true disease” or ground truth. |
| Test result (Query → Hit → Score) | Alignment or spectral match producing a score | Indicates how close your sample is to known reference profiles. |
| Decision threshold | Score cutoff | Determines whether the hit is “positive” or “negative” for a given organism. |
This is precisely how we operationalize the diagnostic accuracy study design.
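The index test versus reference standard logic can be sketched in a few lines of Python. This is a minimal illustration, assuming each isolate has already been reduced to a hypothetical `(hit_species, score, reference_species)` tuple; the function names and example records are invented for illustration and do not come from any real tool.

```python
from typing import Iterable, Tuple

# Hypothetical record format: (hit_species, score, reference_species)
Record = Tuple[str, float, str]


def call_positive(hit: str, score: float, target: str, cutoff: float) -> bool:
    """Index test call: positive for `target` only if it is the top hit AND its score reaches the cutoff."""
    return hit == target and score >= cutoff


def two_by_two(records: Iterable[Record], target: str, cutoff: float) -> dict:
    """Cross-classify index test calls against the reference standard for one target organism."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for hit, score, truth in records:
        test_pos = call_positive(hit, score, target, cutoff)
        truly_target = truth == target  # what the reference standard says
        if test_pos and truly_target:
            counts["TP"] += 1
        elif test_pos:
            counts["FP"] += 1
        elif truly_target:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Invented example records (not real data)
records = [
    ("Staphylococcus aureus", 2.31, "Staphylococcus aureus"),       # TP
    ("Staphylococcus aureus", 1.85, "Staphylococcus aureus"),       # FN: score below cutoff
    ("Staphylococcus aureus", 2.05, "Staphylococcus epidermidis"),  # FP: wrong species called
    ("Escherichia coli", 2.40, "Escherichia coli"),                 # TN for the S. aureus question
]
print(two_by_two(records, "Staphylococcus aureus", cutoff=2.0))
# → {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```

Here a top hit on the wrong species counts as a false positive for the target, and a correct top hit below the cutoff counts as a false negative. That is one reasonable convention; a real validation protocol would need to state its own.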
⚙️ 3. The Core Principle: Quantified Similarity = Diagnostic Evidence
Step 1. Query
Your biological material (DNA, protein, RNA, etc.) → converted into a digital “fingerprint”:
DNA: sequence reads
Protein: mass peaks
RNA: expression signature
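To make the “fingerprint” idea concrete, here is a minimal sketch of two toy fingerprints: a DNA read as its set of overlapping k-mers (the general idea behind k-mer classifiers such as Kraken2, not its actual implementation), and a mass spectrum as the set of occupied m/z bins. The function names, the k value, and the bin width are hypothetical, arbitrary choices for illustration.

```python
def dna_kmers(seq: str, k: int = 4) -> frozenset:
    """DNA fingerprint: the set of all overlapping substrings of length k (k=4 is illustrative)."""
    seq = seq.upper()
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def binned_peaks(mz_values, bin_width: float = 5.0) -> frozenset:
    """Spectrum fingerprint: which m/z bins contain at least one peak.

    A toy simplification (hypothetical bin width); real spectral matching is more elaborate.
    """
    return frozenset(int(mz // bin_width) for mz in mz_values)


print(sorted(dna_kmers("acgtac")))                      # ['ACGT', 'CGTA', 'GTAC']
print(sorted(binned_peaks([4500.2, 4502.9, 6000.0])))   # [900, 1200]
```

Either way, the sample ends up as a set-like object that can be compared with reference entries of the same kind.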
Step 2. Database
Reference library of known organisms’ signatures, e.g.:
MALDI-TOF: spectra from reference strains
BLAST/RefSeq: DNA/protein sequences
Step 3. Matching → Hit
The algorithm compares your query fingerprint with database entries (by sequence alignment or spectral matching) and reports:
Hit: which reference is most similar
Score: how strong the match is
MALDI (e.g., Bruker Biotyper): log score on a 0–3 scale
BLAST: bit score, E-value
Metagenomics: percent identity, coverage
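The query → database → hit → score chain can be sketched end to end. This is a minimal illustration under stated assumptions: the database is a hypothetical Python dict of reference sequences, and the score is the Jaccard similarity of k-mer sets (0 = nothing shared, 1 = identical k-mer content). That score is a stand-in chosen for simplicity; it is not BLAST's bit score, a MALDI log score, or Kraken2's classification logic.

```python
def kmers(seq: str, k: int = 4) -> frozenset:
    """Fingerprint a sequence as its set of overlapping k-mers."""
    seq = seq.upper()
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Similarity score in [0, 1]: shared k-mers / all distinct k-mers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_hit(query: str, database: dict, k: int = 4):
    """Score the query against every reference; return (hit, score) for the top match."""
    q = kmers(query, k)
    scores = {name: jaccard(q, kmers(ref, k)) for name, ref in database.items()}
    hit = max(scores, key=scores.get)
    return hit, scores[hit]


# Hypothetical toy reference library (Step 2)
database = {
    "ref_A": "ACGTACGTAA",
    "ref_B": "TTTTGGGGCC",
}
print(best_hit("ACGTACGT", database))  # ('ref_A', 0.8)
```

In a real pipeline the hit and score come from the platform itself; the sketch only shows the structure: one score per reference, the top-scoring reference is the hit, and the hit's score is what the Step 4 threshold is applied to.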
Step 4. Threshold → Diagnostic Call
Every platform defines a cutoff where “match = identification”:
MALDI-TOF ≥2.0 → species-level ID
1.7–1.99 → genus-level
<1.7 → unidentifiable
This parallels the cutoffs used in laboratory tests, where the position of the threshold determines the trade-off between sensitivity and specificity.
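The MALDI-TOF cutoffs listed above translate directly into a decision rule. A minimal sketch, assuming the input is a log score on the 0–3 scale; the function name is hypothetical.

```python
def maldi_call(log_score: float) -> str:
    """Apply the MALDI-TOF cutoffs: >=2.0 species, 1.7-1.99 genus, <1.7 unidentifiable."""
    if not 0.0 <= log_score <= 3.0:
        raise ValueError("expected a log score on the 0-3 scale")
    if log_score >= 2.0:
        return "species-level ID"
    if log_score >= 1.7:
        return "genus-level ID"
    return "unidentifiable"


for s in (2.31, 1.85, 1.20):
    print(s, maldi_call(s))
```

Because the rule is a pure threshold, moving 2.0 up or down changes how many isolates are called at species level, which is exactly the sensitivity/specificity trade-off described above.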
📊 4. Scoring = Clinical Accuracy Metrics
After generating hits and scores for many samples, you can build a diagnostic accuracy study:
| Clinical metric | Bioinformatic equivalent |
|---|---|
| Sensitivity (TP/(TP+FN)) | % of target-organism isolates correctly identified (score at or above threshold) |
| Specificity (TN/(TN+FP)) | % of non-target isolates correctly rejected |
| AUROC | Discriminative performance of the score across all possible cutoffs |
| Likelihood ratios (LR+/LR–) | How much a score above (LR+) or below (LR–) the cutoff changes the odds of a true ID: LR+ = Se/(1 − Sp), LR– = (1 − Se)/Sp |
You can visualize this as a Receiver Operating Characteristic (ROC) curve, where “score” acts as the continuous diagnostic marker.
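The metrics in the table and the ROC curve can be computed with the standard library alone. A minimal sketch, assuming each isolate has a score against the target reference and a reference-standard label; all counts and scores in the usage lines are invented. AUROC is computed here as the Mann–Whitney probability that a randomly chosen target isolate outscores a randomly chosen non-target isolate (ties count one half), which equals the area under the empirical ROC curve.

```python
def accuracy_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    lr_pos = se / (1 - sp) if sp < 1 else float("inf")  # LR+ = Se / (1 - Sp); infinite when Sp = 1
    lr_neg = (1 - se) / sp if sp > 0 else float("inf")  # LR- = (1 - Se) / Sp
    return {"sensitivity": se, "specificity": sp, "LR+": lr_pos, "LR-": lr_neg}


def roc_points(target_scores, nontarget_scores):
    """(1 - specificity, sensitivity) pairs, using each observed score as a cutoff (score >= cutoff = positive)."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(target_scores) | set(nontarget_scores), reverse=True):
        se = sum(s >= cutoff for s in target_scores) / len(target_scores)
        fpr = sum(s >= cutoff for s in nontarget_scores) / len(nontarget_scores)
        points.append((fpr, se))
    return points


def auroc(target_scores, nontarget_scores) -> float:
    """P(random target score > random non-target score), ties counted as 1/2."""
    wins = 0.0
    for t in target_scores:
        for n in nontarget_scores:
            wins += 1.0 if t > n else 0.5 if t == n else 0.0
    return wins / (len(target_scores) * len(nontarget_scores))


print(accuracy_metrics(tp=90, fp=5, fn=10, tn=95))
target = [2.4, 2.1, 1.9]     # invented scores of true target isolates
nontarget = [1.8, 2.1, 1.2]  # invented scores of non-target isolates
print(roc_points(target, nontarget))
print(round(auroc(target, nontarget), 3))  # 0.833
```

Plotting the `roc_points` pairs gives the ROC curve; each point is the 2×2 table you would get if that score were chosen as the cutoff.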
🧩 5. Etiologic & Epidemiologic Extension
Once identification is validated, we can move to etiologic inference (DEPTh type: Etiology):
Query → Identify organism
Organism (Exposure X) → Clinical Outcome (Y)
Now your database match becomes the exposure variable in your causal model:
Y = f(Organism identified by query | confounders)
The question then becomes whether specific species or genotypes cause particular infections, resistance patterns, or outcomes.
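One simple way to estimate the association in Y = f(Organism | confounders) is a stratified analysis. The sketch below uses the Mantel–Haenszel pooled odds ratio, which combines 2×2 tables (organism identified or not, by outcome present or not) across strata of a single confounder. This is an illustrative technique swapped in here, not a method prescribed by the source; real etiologic analyses often adjust for several confounders with regression. The counts and the confounder are invented.

```python
def mantel_haenszel_or(strata) -> float:
    """Pooled odds ratio across confounder strata.

    Each stratum is (a, b, c, d):
      a = organism identified, outcome present
      b = organism identified, outcome absent
      c = organism not identified, outcome present
      d = organism not identified, outcome absent
    OR_MH = sum(a*d/n) / sum(b*c/n), with n = a + b + c + d per stratum.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Invented counts, stratified by a hypothetical confounder (e.g., ICU vs ward)
strata = [
    (10, 20, 5, 40),  # stratum 1
    (4, 6, 2, 12),    # stratum 2
]
print(round(mantel_haenszel_or(strata), 2))  # 4.0
```

An odds ratio above 1 after stratification suggests the identified organism is associated with the outcome beyond what that one confounder explains; it does not by itself establish causation.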
Insight
The “bio database score-hit logic” is a digital analog of a diagnostic accuracy test:
Query = sample
Database = library of reference profiles (the reference standard is the verified species ID, not the database itself)
Score = index test result
Hit threshold = diagnostic cutoff
Performance metrics = sensitivity/specificity
When we publish or validate such tools (e.g., MALDI-TOF, 16S classifiers, metagenomic ID), we must report according to STARD 2015 and evaluate bias via QUADAS-2.
Key Takeaways
Database matching in biology = a diagnostic index test under DEPTh logic.
The “query–hit–score” structure parallels “sample–test result–test value,” with the score cutoff acting as the diagnostic cutoff.
Statistical validation uses AUROC, Se/Sp, and LR metrics.
Once validated, identified organisms can enter etiologic models (cause–effect analysis).
Always assess bias via QUADAS-2 and report with STARD.




