
Reading Bioinformatics / Precision Medicine Papers Systematically: The EDPC Framework (Etiological, Discovery, Predictive, Confirmatory)


Etiological • Discovery • Predictive • Confirmatory (EDPC)

Precision medicine papers often look similar (omics + fancy plots), but they can be doing four very different jobs. Your slide deck defines these four objectives clearly: Etiological, Discovery, Predictive, Confirmatory. If you misclassify the objective, you will misread the results (e.g., treating “discovery” as “prediction”, or treating “prediction” as “clinical utility”).

The EDPC map (what kind of paper is this?)

1) Etiological (Heterogeneity / Landscape)

Definition (paper’s job): “Characterization of heterogeneity across individual-level data”.

Core question the paper is trying to answer

  • “What molecular subtypes exist in this disease?”

  • “What is the landscape/profile of alterations in this cohort (large or rare)?”

Keyword radar (words you see in title/abstract)

  • landscape, profiling, molecular portrait, heterogeneity, subtype, taxonomy, atlas, signature patterns

  • mutation signature, genomic landscape

Typical outputs / figures

  • Clustering heatmaps, subtype diagrams, mutation landscapes (oncoplots), pathway enrichment maps.
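
To make the first of these concrete, here is a minimal sketch of the unsupervised clustering that sits behind subtype heatmaps, assuming a samples × genes expression matrix. The synthetic data, the cluster count (k = 4), and the standardization step are illustrative assumptions, not anyone's published pipeline.

```python
# Minimal sketch: hierarchical clustering of a (samples x genes) expression
# matrix into candidate molecular subtypes. Data here are synthetic stand-ins.
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
expr = pd.DataFrame(rng.normal(size=(60, 500)),              # 60 samples x 500 genes
                    index=[f"S{i}" for i in range(60)])

z = (expr - expr.mean()) / expr.std()                        # standardize each gene
linkage = sch.linkage(z.values, method="ward")               # Ward-linkage clustering
subtypes = sch.fcluster(linkage, t=4, criterion="maxclust")  # cut into 4 candidate subtypes

print(pd.Series(subtypes, index=expr.index, name="subtype").value_counts())
```

Real pipelines add normalization, batch correction, and cluster-stability checks before trusting any subtype.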

How to judge quality (fast)

  • Domain clarity: who are the samples from (disease definition, stage, treatment status)?

  • Sample logic: tissue/blood choice should match biology/phenotype (timing + sample type matter).

  • Avoid over-claiming: etiological ≠ causal; this is a description of heterogeneity, not a causal claim.

2) Discovery (Association-finding / Hypothesis generation)

Definition (paper’s job): “Exploration of associations between a set of clinical features and outcome heterogeneity… exploratory analysis of risk factors”.

Core question

  • “Which genes/features differ between groups?”

  • “Which features are associated with the outcome (or with subgroup differences)?”

Your slides frame this as finding associations across ≥2 groups.

Keyword radar

  • differential expression, associated with, correlates with, enrichment, candidate biomarker, exploratory, screen, feature selection

  • Discovery work often sits at the choice between a candidate-marker approach and omics-wide discovery.

Typical outputs

  • Volcano plots (DE), Manhattan plots (GWAS), correlation networks, ranked gene lists.
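
For orientation, a volcano plot is just per-gene effect size plotted against significance. A minimal sketch with synthetic numbers (the fold changes and p-values are made up for illustration):

```python
# Minimal sketch of a volcano plot: log2 fold change vs. -log10(p) per gene.
# Synthetic numbers stand in for real differential-expression results.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
log2fc = rng.normal(0, 1.5, 2000)             # per-gene effect sizes (toy)
pvals = rng.uniform(1e-6, 1, 2000)            # per-gene p-values (toy)

plt.scatter(log2fc, -np.log10(pvals), s=5)
plt.axhline(-np.log10(0.05), linestyle="--")  # nominal significance threshold
plt.xlabel("log2 fold change")
plt.ylabel("-log10(p)")
plt.title("Volcano plot (synthetic)")
plt.show()
```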

How to judge quality

  • Multiple testing control: do they control the FDR / adjust p-values? (See the sketch after this list.)

  • Confounding & batch effects: are associations driven by platform/batch/center rather than biology?

  • Replication signal: do they test in an independent dataset or just “internal split”? (Discovery without replication = hypothesis only.)
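
To ground the multiple-testing point referenced above, here is a minimal sketch of Benjamini–Hochberg FDR adjustment with statsmodels; the p-values are illustrative, not from any real study.

```python
# Minimal sketch: Benjamini-Hochberg FDR adjustment of raw p-values,
# e.g. from per-gene differential-expression tests. Values are illustrative.
from statsmodels.stats.multitest import multipletests

raw_p = [0.0001, 0.004, 0.03, 0.04, 0.20, 0.60]
reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")

for p, q, sig in zip(raw_p, p_adj, reject):
    print(f"p = {p:.4f}   q = {q:.4f}   significant at FDR 5%: {sig}")
```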

Common trap

  • Discovery papers often sound predictive but are not. If the paper ends at a list of “top genes” with no validated performance target, it is a discovery paper.

3) Predictive (Individual-level prediction / tool-building)

Definition (paper’s job): “Development of a specific approach(es) to predict heterogeneity in clinical or treatment-related outcomes for individuals or subgroups”.

Core question

  • “Can I predict diagnosis / prognosis / response for an individual (or subgroup) using omics features?”

Keyword radar

  • prediction model, classifier, risk score, machine learning, AUROC, AUPRC, C-index, calibration, external validation, test set

Your slides put it explicitly: “prediction research = create a prediction tool”.

Typical outputs

  • Model coefficients / feature weights, AUROC curves, calibration plots, decision curves, confusion matrices.

How to judge quality (most important)

  1. Point of prediction: When is prediction made? (pre-treatment vs post-op vs relapse)

  2. Leakage control: Did they accidentally use information only available after the outcome (a classic failure in omics pipelines)? (See the sketch after this list.)

  3. Validation level: internal CV is not enough—look for external/independent cohorts when possible (otherwise it’s “promising”, not “ready”).
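
A minimal sketch of the leakage point from the list above: feature selection must happen inside each cross-validation fold, never on the full dataset first. The synthetic data and model choices are assumptions for illustration; note the printout still says internal CV only.

```python
# Minimal sketch of leakage-safe evaluation: feature selection happens INSIDE
# each cross-validation fold via a Pipeline, never on the full dataset first.
# Synthetic data stands in for a real omics matrix.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=5000,
                           n_informative=20, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=50)),   # fit on training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"internal CV AUROC: {auc.mean():.3f} ± {auc.std():.3f} "
      "(still needs external validation)")
```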

Common trap

  • Treating a “train-test split” in the same dataset as “validation.” That is still weak unless it is truly independent (time/site/platform).

4) Confirmatory (Reproduction / robustness)

Definition (paper’s job): “Reproduction of a previously proposed precision medicine approach”.

Your slides show a clean example: systematically evaluating previously published prognostic gene signatures for HCC to identify robust and reproducible biomarkers that predict OS, with confirmatory evidence shown using survival comparisons (Kaplan–Meier / log-rank) in a dataset.
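
As an illustration of that kind of confirmatory check, here is a minimal Kaplan–Meier / log-rank sketch using the lifelines package; the “signature-high” vs “signature-low” groups and their survival times are synthetic stand-ins, not data from the HCC example.

```python
# Minimal sketch: Kaplan-Meier curves + log-rank test for a previously
# published signature, split into high/low score groups. Data are synthetic.
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(1)
t_high = rng.exponential(24, 100)   # months to event, signature-high group
t_low = rng.exponential(40, 100)    # months to event, signature-low group
e_high = rng.integers(0, 2, 100)    # 1 = event observed, 0 = censored
e_low = rng.integers(0, 2, 100)

km = KaplanMeierFitter()
km.fit(t_high, e_high, label="signature-high").plot_survival_function()
km.fit(t_low, e_low, label="signature-low").plot_survival_function()

res = logrank_test(t_high, t_low, event_observed_A=e_high, event_observed_B=e_low)
print(f"log-rank p = {res.p_value:.4f}")
plt.show()
```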

Core question

  • “Does this previously reported signature/model/biomarker still work in new data?”

Keyword radar

  • replication, reproduce, independent cohort, external validation, robustness, generalizability, meta-signature, benchmarking, comparative evaluation

How to judge quality

  • Truly independent data (new cohort, different site/time/platform).

  • Same target definition (same outcome definition, same time horizon, same population domain).

  • Transparent model transport: Did they re-fit (new model) or validate as-is (true confirmatory)?
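
The last bullet is easy to express in code: a true confirmatory analysis applies the published model with frozen coefficients to new data, while re-estimating the weights produces a new model. A minimal sketch, assuming a hypothetical published two-gene risk score (the gene names, weights, and data are invented):

```python
# Minimal sketch: "validate as-is" means applying published, frozen coefficients
# to new data -- no re-fitting. Coefficients and data here are hypothetical.
import numpy as np

PUBLISHED_WEIGHTS = {"GENE_A": 0.82, "GENE_B": -0.45}  # from the original paper (hypothetical)
PUBLISHED_INTERCEPT = -1.1

def published_risk_score(expr_a: float, expr_b: float) -> float:
    """Linear predictor of the published model, reused verbatim."""
    return (PUBLISHED_INTERCEPT
            + PUBLISHED_WEIGHTS["GENE_A"] * expr_a
            + PUBLISHED_WEIGHTS["GENE_B"] * expr_b)

# Score the new, independent cohort with the frozen model, then evaluate
# discrimination/calibration. Re-estimating the weights here would make it
# a *new* model, not a confirmation of the old one.
new_cohort = np.array([[1.2, 0.3], [0.4, 1.9], [2.1, 0.7]])  # toy expression values
scores = [published_risk_score(a, b) for a, b in new_cohort]
print(scores)
```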

The “deep & systematic” reading workflow (use this every time)

Step 1 — Classify the objective (EDPC)

Use the EDPC definitions above. If you can’t name the objective, you can’t interpret the claims.

Step 2 — Extract the Core Structure (the survival kit)

Your deck gives the core-structure extraction template: Study objective → Study domain → Study determinants → Omics type → Sample type → Outcome. This is the fastest way to detect “beautiful analysis, wrong question.”
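
If it helps to operationalize this, the six fields fit naturally into a small structured record you can fill in for every paper; the field names mirror the template, and the filled-in values are hypothetical.

```python
# Minimal sketch: the core-structure extraction template as a structured record.
# Field names mirror the template; the filled-in values are hypothetical.
from dataclasses import dataclass

@dataclass
class CoreStructure:
    study_objective: str   # EDPC class: Etiological / Discovery / Predictive / Confirmatory
    study_domain: str      # who the findings apply to
    determinants: str      # exposures / features studied
    omics_type: str        # genome / transcriptome / proteome / ...
    sample_type: str       # tissue vs blood, plus timing
    outcome: str           # endpoint and time horizon

paper = CoreStructure(
    study_objective="Predictive",
    study_domain="treatment-naive stage II-III colorectal cancer",
    determinants="tumor gene-expression features",
    omics_type="transcriptome (RNA-seq)",
    sample_type="FFPE tumor tissue at diagnosis",
    outcome="3-year recurrence-free survival",
)
print(paper)
```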

Step 3 — Verify “omics type” and vocabulary

The slides provide a practical keyword list for omics data types (genome / epigenome / transcriptome / proteome / microbiome / metabolome / multi-omic). If a paper is vague (“molecular markers”), your deck warns: look for the terms to be defined in the methodology.

Step 4 — Check sample rationale (biology ↔ phenotype)

Your slides stress clinical/biological rationale: sample type + timing + sequencing technique must relate to the phenotype. Example logic shown: tumor tissue answers somatic questions, while buccal swabs / WBCs answer germline questions; swapping one for the other changes the meaning entirely.

Step 5 — Interpret results only inside the objective

  • Etiological → “we mapped heterogeneity” (don’t claim prediction/utility)

  • Discovery → “we found candidates” (don’t claim causality/clinical tool)

  • Predictive → “we built a tool” (needs validation + no leakage)

  • Confirmatory → “it reproduces” (independent data, same target)

Mini “Objective-to-Question” cheat sheet

  • Etiological: “What subtypes/landscapes exist in this disease cohort?”

  • Discovery: “What features associate with outcomes or group differences?”

  • Predictive: “Can we predict diagnosis/prognosis/response for individuals/subgroups?”

  • Confirmatory: “Does an existing approach reproduce robustly in new data?”

Recap

  • EDPC is an evidence-maturity lens for bioinformatics papers.

  • Use Core Structure Extraction to read any paper systematically.

  • Most misinterpretation comes from confusing Discovery vs Predictive and calling non-replicated findings “confirmed”.
