
DEPTh Typing as Diagnosis: Clinical Interpretation of Database-Based Identification

  • Writer: Mayta
  • Oct 21, 2025

🧭 1. DEPTh Typing: This is a Diagnostic Challenge

The challenge: given a biological isolate (bio sample), how do we determine which organism it is, using a database comparison that returns a hit and a score? DEPTh type: Diagnostic.

The object of study is the diagnostic accuracy of a computational or laboratory index test:

  • Index test: sequence or spectrum matching algorithm

  • Reference standard: species identification (e.g., culture, gold-standard sequencing)

🔬 2. Diagnostic Logic: From Query to Clinical Answer

The bioinformatics pipeline actually mirrors the diagnostic accuracy framework:

| Diagnostic concept | Bioinformatics equivalent | Explanation |
|---|---|---|
| Index test | Database matching algorithm (e.g., BLAST, MALDI-TOF, Kraken2, Bruker Biotyper) | It “tests” what your isolate might be. |
| Reference standard | Verified species ID (e.g., 16S rRNA sequencing, WGS reference) | The “true disease” or ground truth. |
| Test result (Query → Hit → Score) | Alignment or spectral match producing score | Indicates how close your sample is to known reference profiles. |
| Decision threshold | Score cutoff | Determines whether the hit is “positive” or “negative” for a given organism. |

This is precisely how we operationalize the diagnostic accuracy study design.

⚙️ 3. The Core Principle: Quantified Similarity = Diagnostic Evidence

Step 1. Query

Your biological material (DNA, protein spectrum, etc.) → converted into a digital “fingerprint”:

  • DNA: sequence reads

  • Protein: mass peaks

  • RNA: expression signature
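To make the idea of a digital “fingerprint” concrete, here is a minimal sketch for the DNA case. It assumes one simple choice of fingerprint, overlapping k-mer counts; the function name `kmer_fingerprint` and the toy sequence are illustrative only, and real pipelines use more elaborate representations.

```python
from collections import Counter


def kmer_fingerprint(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a DNA sequence (a toy fingerprint)."""
    seq = sequence.upper()
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy sequence, not taken from any real organism.
fingerprint = kmer_fingerprint("ACGTACGT", k=3)
print(fingerprint.most_common())
```

The point is only that the raw material becomes a comparable data object; which representation is appropriate depends on the platform.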

Step 2. Database

Reference library of known organisms’ signatures, e.g.:

  • MALDI-TOF: spectra from reference strains

  • BLAST/RefSeq: DNA/protein sequences

Step 3. Matching → Hit

Algorithm aligns your query fingerprint with database entries and reports:

  • Hit: which reference is most similar

  • Score: how strong the match is

    • MALDI-TOF: log score on a 0–3 scale

    • BLAST: bit score, E-value

    • Metagenomics: percent identity, coverage
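The query → hit → score step can be sketched in a few lines. This is a toy illustration, not how BLAST or a MALDI-TOF vendor library actually scores matches: it assumes fingerprints are sets of rounded peak positions and uses Jaccard similarity as a stand-in score. The reference names and peak values are made up.

```python
def jaccard(a: set, b: set) -> float:
    """Share of peaks common to both fingerprints (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_hit(query: set, database: dict) -> tuple:
    """Return (reference name, score) for the most similar database entry."""
    scores = {name: jaccard(query, peaks) for name, peaks in database.items()}
    hit = max(scores, key=scores.get)
    return hit, scores[hit]


# Made-up peak lists (rounded m/z values); not real spectra.
database = {
    "reference_A": {3000, 4500, 6200, 7100, 9500},
    "reference_B": {3000, 4800, 6200, 8300, 9900},
}
query = {3000, 4500, 6200, 7100, 9900}

hit, score = best_hit(query, database)
print(hit, round(score, 3))
```

Whatever the real scoring function is, the output has the same shape: one best-matching reference plus a number that says how strong the match is.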

Step 4. Threshold → Diagnostic Call

Every platform defines a cutoff where “match = identification”. For MALDI-TOF log scores, for example:

  • ≥2.0 → species-level ID

  • 1.7–1.99 → genus-level ID

  • <1.7 → unidentifiable

This parallels the decision cutoffs used in lab tests, where the chosen threshold trades sensitivity against specificity.
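The MALDI-TOF cutoffs listed above translate directly into a decision rule. A minimal sketch follows; the function name `maldi_call` is hypothetical, and the cutoffs are the ones quoted in this post, so a validated laboratory protocol may use different values.

```python
def maldi_call(log_score: float) -> str:
    """Map a MALDI-TOF log score to an identification level."""
    if log_score >= 2.0:
        return "species-level ID"
    if log_score >= 1.7:
        return "genus-level ID"
    return "unidentifiable"


for s in (2.31, 1.85, 1.42):
    print(s, "->", maldi_call(s))
```

Moving either cutoff changes how many isolates get called, which is exactly why the threshold itself needs to be validated.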

📊 4. Scoring = Clinical Accuracy Metrics

After generating hits and scores for many samples, you can build a diagnostic accuracy study:

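| Clinical metric | Bioinformatic equivalent |
|---|---|
| Sensitivity (TP/(TP+FN)) | % of target isolates correctly identified (score at or above threshold) |
| Specificity (TN/(TN+FP)) | % of non-target isolates correctly rejected |
| AUROC | How well the score separates correct from incorrect identifications across all possible cutoffs |
| Likelihood ratios (LR+/LR–) | How much a score above/below the cutoff raises or lowers the odds of a true ID |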
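Sensitivity, specificity, and the likelihood ratios all follow from a 2×2 table of index-test calls against the reference standard. The sketch below uses the standard formulas; the function name and the counts are illustrative assumptions, not results from any study, and it assumes both the target and non-target groups are non-empty.

```python
def accuracy_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute Se, Sp, LR+ and LR- from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = Se / (1 - Sp); undefined (infinite) when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- = (1 - Se) / Sp; undefined (infinite) when specificity is zero.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Made-up counts for illustration only.
metrics = accuracy_metrics(tp=90, fp=5, fn=10, tn=95)
print({name: round(value, 3) for name, value in metrics.items()})
```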

You can visualize this as a Receiver Operating Characteristic (ROC) curve, where “score” acts as the continuous diagnostic marker.
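The area under that curve has a simple interpretation: the probability that a randomly chosen correct identification scores higher than a randomly chosen incorrect one. The sketch below computes it by direct pairwise comparison (equivalent to the Mann-Whitney formulation), which is fine for small samples; the scores are made up, and in practice a library routine would be the usual choice.

```python
def auroc(scores_correct: list, scores_incorrect: list) -> float:
    """AUROC as the fraction of (correct, incorrect) pairs ranked properly.

    Ties count as half a win.
    """
    wins = 0.0
    for c in scores_correct:
        for i in scores_incorrect:
            if c > i:
                wins += 1.0
            elif c == i:
                wins += 0.5
    return wins / (len(scores_correct) * len(scores_incorrect))


# Made-up match scores, grouped by whether the reference standard
# confirmed the identification.
correct = [2.3, 2.1, 1.9, 2.4]
incorrect = [1.5, 1.8, 2.0, 1.6]
print(auroc(correct, incorrect))  # 0.9375 (15 of 16 pairs ranked correctly)
```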

🧩 5. Etiologic & Epidemiologic Extension

Once identification is validated, we can move to etiologic inference (DEPTh = Etiology):

  • Query → Identify organism

  • Organism (Exposure X) → Clinical Outcome (Y)

Now your database match becomes a predictor variable in your causal model: Y = f(Organism identified by query | confounders). The question becomes whether particular species or genotypes cause specific infections, resistance, or outcomes.
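As one hedged illustration of “conditioning on confounders”, the sketch below computes a Mantel-Haenszel pooled odds ratio across strata of a single categorical confounder. This is only one simple option, assumed here for illustration; with several confounders a regression model would be the more usual choice. The counts are made up.

```python
def mantel_haenszel_or(strata: list) -> float:
    """Pooled odds ratio across strata of a confounder.

    Each stratum is (a, b, c, d):
      a = organism identified, outcome present
      b = organism identified, outcome absent
      c = organism not identified, outcome present
      d = organism not identified, outcome absent
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Made-up counts for two strata of a hypothetical confounder.
strata = [(20, 80, 10, 90), (30, 20, 15, 35)]
print(round(mantel_haenszel_or(strata), 2))
```

The identified organism enters as the exposure, so any misclassification left over from the identification step carries into this estimate.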

Insight

The “bio database score-hit logic” is a digital analog of a diagnostic accuracy test:

  • Query = sample

  • Database = reference library the index test compares against

  • Score = index test result

  • Hit threshold = diagnostic cutoff

  • Performance metrics = sensitivity/specificity

When we publish or validate such tools (e.g., MALDI-TOF, 16S classifiers, metagenomic ID), we must report according to STARD 2015 and evaluate bias via QUADAS-2.

Key Takeaways

  • Database matching in biology = a diagnostic index test under DEPTh logic.

  • “Query–score–threshold” structure parallels “sample–test value–cutoff.”

  • Statistical validation uses AUROC, Se/Sp, and LR metrics.

  • Once validated, identified organisms can enter etiologic models (cause–effect analysis).

  • Always assess bias via QUADAS-2 and report with STARD.


