
Reading Bioinformatics / Precision Medicine Papers Systematically: The EDPC Framework (Etiological, Discovery, Predictive, Confirmatory)


Etiological • Discovery • Predictive • Confirmatory (EDPC)

Precision medicine papers often look similar (omics + fancy plots), but they can be doing four very different jobs. Your slide deck defines these four objectives clearly: Etiological, Discovery, Predictive, Confirmatory. If you misclassify the objective, you will misread the results (e.g., treating “discovery” as “prediction”, or treating “prediction” as “clinical utility”).

The EDPC map (what kind of paper is this?)

1) Etiological (Heterogeneity / Landscape)

Definition (paper’s job): “Characterization of heterogeneity across individual-level data”.

Core question the paper is trying to answer

  • “What molecular subtypes exist in this disease?”

  • “What is the landscape/profile of alterations in this cohort (large or rare)?”

Keyword radar (words you see in title/abstract)

  • landscape, profiling, molecular portrait, heterogeneity, subtype, taxonomy, atlas, signature patterns

  • mutation signature, genomic landscape

Typical outputs / figures

  • Clustering heatmaps, subtype diagrams, mutation landscapes (oncoplots), pathway enrichment maps.
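
To make the first of these concrete, here is a minimal sketch of the unsupervised clustering that sits behind subtype heatmaps, assuming a samples × genes expression matrix. The synthetic data, the cluster count (k = 4), and the standardization step are illustrative assumptions, not anyone's published pipeline.

```python
# Minimal sketch: hierarchical clustering of a (samples x genes) expression
# matrix into candidate molecular subtypes. Data here are synthetic stand-ins.
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
expr = pd.DataFrame(rng.normal(size=(60, 500)),              # 60 samples x 500 genes
                    index=[f"S{i}" for i in range(60)])

z = (expr - expr.mean()) / expr.std()                        # standardize each gene
linkage = sch.linkage(z.values, method="ward")               # Ward-linkage clustering
subtypes = sch.fcluster(linkage, t=4, criterion="maxclust")  # cut into 4 candidate subtypes

print(pd.Series(subtypes, index=expr.index, name="subtype").value_counts())
```

Real pipelines add normalization, batch correction, and cluster-stability checks before trusting any subtype.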

How to judge quality (fast)

  • Domain clarity: who are the samples from (disease definition, stage, treatment status)?

  • Sample logic: tissue/blood choice should match biology/phenotype (timing + sample type matter).

  • Avoid over-claiming: etiological ≠ causal; this is a description of heterogeneity, not a causal claim.

2) Discovery (Association-finding / Hypothesis generation)

Definition (paper’s job): “Exploration of associations between a set of clinical features and outcome heterogeneity… exploratory analysis of risk factors”.

Core question

  • “Which genes/features differ between groups?”

  • “Which features are associated with the outcome (or with subgroup differences)?”

Your slides frame this as finding associations across ≥2 groups.

Keyword radar

  • differential expression, associated with, correlates with, enrichment, candidate biomarker, exploratory, screen, feature selection

  • Discovery work often sits at the choice between a candidate-marker approach and omics-wide discovery.

Typical outputs

  • Volcano plots (DE), Manhattan plots (GWAS), correlation networks, ranked gene lists.
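
For orientation, a volcano plot is just per-gene effect size plotted against significance. A minimal sketch with synthetic numbers (the fold changes and p-values are made up for illustration):

```python
# Minimal sketch of a volcano plot: log2 fold change vs. -log10(p) per gene.
# Synthetic numbers stand in for real differential-expression results.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
log2fc = rng.normal(0, 1.5, 2000)             # per-gene effect sizes (toy)
pvals = rng.uniform(1e-6, 1, 2000)            # per-gene p-values (toy)

plt.scatter(log2fc, -np.log10(pvals), s=5)
plt.axhline(-np.log10(0.05), linestyle="--")  # nominal significance threshold
plt.xlabel("log2 fold change")
plt.ylabel("-log10(p)")
plt.title("Volcano plot (synthetic)")
plt.show()
```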

How to judge quality

  • Multiple testing control: do they control the FDR / adjust p-values? (See the sketch after this list.)

  • Confounding & batch effects: are associations driven by platform/batch/center rather than biology?

  • Replication signal: do they test in an independent dataset or just “internal split”? (Discovery without replication = hypothesis only.)
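
To ground the multiple-testing point referenced above, here is a minimal sketch of Benjamini–Hochberg FDR adjustment with statsmodels; the p-values are illustrative, not from any real study.

```python
# Minimal sketch: Benjamini-Hochberg FDR adjustment of raw p-values,
# e.g. from per-gene differential-expression tests. Values are illustrative.
from statsmodels.stats.multitest import multipletests

raw_p = [0.0001, 0.004, 0.03, 0.04, 0.20, 0.60]
reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")

for p, q, sig in zip(raw_p, p_adj, reject):
    print(f"p = {p:.4f}   q = {q:.4f}   significant at FDR 5%: {sig}")
```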

Common trap

  • Discovery papers often sound predictive but are not. If the paper ends at a list of “top genes” with no validated performance target, it is a discovery paper.

3) Predictive (Individual-level prediction / tool-building)

Definition (paper’s job): “Development of a specific approach(es) to predict heterogeneity in clinical or treatment-related outcomes for individuals or subgroups”.

Core question

  • “Can I predict diagnosis / prognosis / response for an individual (or subgroup) using omics features?”

Keyword radar

  • prediction model, classifier, risk score, machine learning, AUROC, AUPRC, C-index, calibration, external validation, test set

Your slides put it explicitly: “prediction research = create a prediction tool”.

Typical outputs

  • Model coefficients / feature weights, AUROC curves, calibration plots, decision curves, confusion matrices.

How to judge quality (most important)

  1. Point of prediction: When is prediction made? (pre-treatment vs post-op vs relapse)

  2. Leakage control: Did they accidentally use information only available after the outcome (a classic failure in omics pipelines)? (See the sketch after this list.)

  3. Validation level: internal CV is not enough—look for external/independent cohorts when possible (otherwise it’s “promising”, not “ready”).
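
A minimal sketch of the leakage point from the list above: feature selection must happen inside each cross-validation fold, never on the full dataset first. The synthetic data and model choices are assumptions for illustration; note the printout still says internal CV only.

```python
# Minimal sketch of leakage-safe evaluation: feature selection happens INSIDE
# each cross-validation fold via a Pipeline, never on the full dataset first.
# Synthetic data stands in for a real omics matrix.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=5000,
                           n_informative=20, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=50)),   # fit on training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"internal CV AUROC: {auc.mean():.3f} ± {auc.std():.3f} "
      "(still needs external validation)")
```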

Common trap

  • Treating a “train-test split” in the same dataset as “validation.” That is still weak unless it is truly independent (time/site/platform).

4) Confirmatory (Reproduction / robustness)

Definition (paper’s job): “Reproduction of a previously proposed precision medicine approach”.

Your slides show a clean example: systematically evaluating previously published prognostic gene signatures for HCC to identify robust and reproducible biomarkers that predict OS, with confirmatory evidence shown using survival comparisons (Kaplan–Meier / log-rank) in a dataset.
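
As an illustration of that kind of confirmatory check, here is a minimal Kaplan–Meier / log-rank sketch using the lifelines package; the “signature-high” vs “signature-low” groups and their survival times are synthetic stand-ins, not data from the HCC example.

```python
# Minimal sketch: Kaplan-Meier curves + log-rank test for a previously
# published signature, split into high/low score groups. Data are synthetic.
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(1)
t_high = rng.exponential(24, 100)   # months to event, signature-high group
t_low = rng.exponential(40, 100)    # months to event, signature-low group
e_high = rng.integers(0, 2, 100)    # 1 = event observed, 0 = censored
e_low = rng.integers(0, 2, 100)

km = KaplanMeierFitter()
km.fit(t_high, e_high, label="signature-high").plot_survival_function()
km.fit(t_low, e_low, label="signature-low").plot_survival_function()

res = logrank_test(t_high, t_low, event_observed_A=e_high, event_observed_B=e_low)
print(f"log-rank p = {res.p_value:.4f}")
plt.show()
```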

Core question

  • “Does this previously reported signature/model/biomarker still work in new data?”

Keyword radar

  • replication, reproduce, independent cohort, external validation, robustness, generalizability, meta-signature, benchmarking, comparative evaluation

How to judge quality

  • Truly independent data (new cohort, different site/time/platform).

  • Same target definition (same outcome definition, same time horizon, same population domain).

  • Transparent model transport: Did they re-fit (new model) or validate as-is (true confirmatory)?
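
The last bullet is easy to express in code: a true confirmatory analysis applies the published model with frozen coefficients to new data, while re-estimating the weights produces a new model. A minimal sketch, assuming a hypothetical published two-gene risk score (the gene names, weights, and data are invented):

```python
# Minimal sketch: "validate as-is" means applying published, frozen coefficients
# to new data -- no re-fitting. Coefficients and data here are hypothetical.
import numpy as np

PUBLISHED_WEIGHTS = {"GENE_A": 0.82, "GENE_B": -0.45}  # from the original paper (hypothetical)
PUBLISHED_INTERCEPT = -1.1

def published_risk_score(expr_a: float, expr_b: float) -> float:
    """Linear predictor of the published model, reused verbatim."""
    return (PUBLISHED_INTERCEPT
            + PUBLISHED_WEIGHTS["GENE_A"] * expr_a
            + PUBLISHED_WEIGHTS["GENE_B"] * expr_b)

# Score the new, independent cohort with the frozen model, then evaluate
# discrimination/calibration. Re-estimating the weights here would make it
# a *new* model, not a confirmation of the old one.
new_cohort = np.array([[1.2, 0.3], [0.4, 1.9], [2.1, 0.7]])  # toy expression values
scores = [published_risk_score(a, b) for a, b in new_cohort]
print(scores)
```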

The “deep & systematic” reading workflow (use this every time)

Step 1 — Classify the objective (EDPC)

Use the EDPC definitions above. If you can’t name the objective, you can’t interpret the claims.

Step 2 — Extract the Core Structure (the survival kit)

Your deck gives the core-structure extraction template: Study objective → Study domain → Study determinants → Omics type → Sample type → Outcome. This is the fastest way to detect “beautiful analysis, wrong question.”
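
If it helps to operationalize this, the six fields fit naturally into a small structured record you can fill in for every paper; the field names mirror the template, and the filled-in values are hypothetical.

```python
# Minimal sketch: the core-structure extraction template as a structured record.
# Field names mirror the template; the filled-in values are hypothetical.
from dataclasses import dataclass

@dataclass
class CoreStructure:
    study_objective: str   # EDPC class: Etiological / Discovery / Predictive / Confirmatory
    study_domain: str      # who the findings apply to
    determinants: str      # exposures / features studied
    omics_type: str        # genome / transcriptome / proteome / ...
    sample_type: str       # tissue vs blood, plus timing
    outcome: str           # endpoint and time horizon

paper = CoreStructure(
    study_objective="Predictive",
    study_domain="treatment-naive stage II-III colorectal cancer",
    determinants="tumor gene-expression features",
    omics_type="transcriptome (RNA-seq)",
    sample_type="FFPE tumor tissue at diagnosis",
    outcome="3-year recurrence-free survival",
)
print(paper)
```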

Step 3 — Verify “omics type” and vocabulary

The slides provide a practical keyword list for omics data types (genome / epigenome / transcriptome / proteome / microbiome / metabolome / multi-omic). If a paper is vague (“molecular markers”), your deck warns: look for the terms to be defined in the methodology.

Step 4 — Check sample rationale (biology ↔ phenotype)

Your slides stress clinical/biological rationale: sample type + timing + sequencing technique must relate to the phenotype. Example logic shown: tumor tissue answers somatic questions, while buccal swabs / WBCs answer germline questions; swapping one for the other changes the meaning entirely.

Step 5 — Interpret results only inside the objective

  • Etiological → “we mapped heterogeneity” (don’t claim prediction/utility)

  • Discovery → “we found candidates” (don’t claim causality/clinical tool)

  • Predictive → “we built a tool” (needs validation + no leakage)

  • Confirmatory → “it reproduces” (independent data, same target)

Mini “Objective-to-Question” cheat sheet

  • Etiological: “What subtypes/landscapes exist in this disease cohort?”

  • Discovery: “What features associate with outcomes or group differences?”

  • Predictive: “Can we predict diagnosis/prognosis/response for individuals/subgroups?”

  • Confirmatory: “Does an existing approach reproduce robustly in new data?”

Recap

  • EDPC is an evidence-maturity lens for bioinformatics papers.

  • Use Core Structure Extraction to read any paper systematically.

  • Most misinterpretation comes from confusing Discovery vs Predictive and calling non-replicated findings “confirmed”.
