
Model Selection Algorithm for Clinical Regression

Step 1. Identify the Outcome Variable (Y)

  1. Continuous – e.g., SBP, HbA1c, time, cost

  2. Binary – e.g., death, readmission (yes/no)

  3. Count – e.g., seizures, falls, hospitalizations

  4. Rate – e.g., events per person-time

  5. Time-to-event – e.g., survival, time-to-discharge

  6. Ordinal – e.g., NYHA class, pain severity

  7. Nominal (unordered) – e.g., cancer types

  8. Repeated / Clustered – any above, but longitudinal or hierarchical

  9. Recurrent events / competing risks – multiple times/events per subject
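
A quick way to decide which of the categories above applies is to inspect the outcome variable before fitting anything. The sketch below is only illustrative: it assumes Stata's shipped auto dataset as a stand-in for a clinical dataset, with price, foreign, and rep78 playing the roles of a continuous, binary, and ordered-integer outcome.

```stata
* Assumption: the shipped auto dataset stands in for a clinical dataset
sysuse auto, clear

* Continuous candidate: check range, skewness, and spread
summarize price, detail

* Binary candidate: exactly two levels
tabulate foreign

* Ordinal or count candidate: a few ordered integer levels (show missing values too)
tabulate rep78, missing

* Storage type, number of unique values, and missingness in one report
codebook price foreign rep78, compact
```

A variable with two levels points to the binary row, a handful of ordered levels to ordinal or count, and many distinct values to continuous.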

Step 2. Apply Y-Type Logic

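| Outcome Y | Model | Stata Command | Assumption | Upgrade if Violated |
|---|---|---|---|---|
| Continuous | regress | regress Y X | Linearity, normality of residuals | vce(robust), glm, mixed |
| Binary | logit / glm | logit Y X; glm Y X, fam(bin) link(log) | No complete separation | firthlogit, clogit |
| Count | poisson / nbreg | poisson Y X; nbreg Y X | Mean = variance | nbreg |
| Rate (events/time) | poisson with offset | poisson Y X, offset(log_time) | Correct exposure | vce(robust) |
| Time-to-event | stcox, streg, stpm2 | stcox X; streg X, dist(...) (after stset) | PH assumption | stpm2, AFT, frailty |
| Ordinal | ologit | ologit Y X | Proportional odds | gologit2 |
| Nominal | mlogit | mlogit Y X | None (multinomial) | |
| Repeated / Clustered | xtgee, mixed | xtgee Y X, i(id); mixed with a random intercept for id (full syntax below) | | |
| Recurrent | stcox + shared(id) or strata(order) | stcox X, strata(event) or stcox X, shared(id) | Order or frailty matters | PWP, AG, frailty |

Full syntax for the mixed model in the Repeated / Clustered row:

mixed Y X || id:

The survival commands take covariates only, because stset declares the time and event variables. firthlogit, gologit2, and stpm2 are community-contributed commands that must be installed first (for example, ssc install gologit2).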
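
For the rate row, here is a minimal sketch assuming Stata's dollhill3 example dataset (loaded with webuse, so internet access is assumed) and its variables deaths, smokes, agecat, and pyears. It illustrates that offset(log_time) and exposure() specify the same model. The variable log_time is created here purely for illustration.

```stata
* Assumption: webuse needs internet access; dollhill3 is a Stata example dataset
webuse dollhill3, clear

* Exposure form: Stata takes the log of person-years internally
poisson deaths smokes i.agecat, exposure(pyears) irr

* Offset form, as in the table: supply the log of person-time yourself
generate log_time = ln(pyears)
poisson deaths smokes i.agecat, offset(log_time) irr
```

Both calls should report the same incidence-rate ratios. The offset form is mainly useful when the log of person-time is already stored in the dataset.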


Step 3. Decision Flow

Is Y a time-to-event (e.g. death, recurrence)?
→ Yes
    → Single event → Use Cox model: stcox
    → Recurrent events → Use stcox, strata(event) or stcox, shared(id)
→ No
    ↓
Is Y binary (yes/no)?
→ Yes
    → Is outcome rare (≤10%)?
        → Yes → Use logistic regression: logit (the odds ratio then approximates the risk ratio)
        → No  → Use Poisson with robust SE to estimate risk ratios directly: glm, fam(poisson) link(log) vce(robust)
→ No
    ↓
Is Y a count (e.g., # seizures)?
→ Yes
    → Is variance ≈ mean?
        → Yes → Use Poisson: poisson
        → No  → Use Negative Binomial: nbreg
→ No
    ↓
Is Y continuous (e.g., SBP)?
→ Yes
    → Are the observations independent?
        → Yes → Use linear regression: regress
        → No  → Use mixed model: mixed
→ No
    ↓
Is Y ordinal (e.g., mild/mod/severe)?
→ Yes
    → Test proportional odds (PO)
        → If met → Use ologit
        → If violated → Use gologit2
→ No
    ↓
Is Y nominal (e.g., cancer type)?
→ Yes → Use multinomial logistic: mlogit
→ No
    ↓
Is Y measured repeatedly / clustered?
→ Yes
    → Want population-average effect? → xtgee
    → Want subject-specific effect? → mixed
→ No
    ↓
Is Y recurrent / composite time-based?
→ Yes
    → Based on timing:
        → Same event → Use Andersen-Gill (AG)
        → Ordered events → Use PWP-CP / PWP-GT
        → Heterogeneity → Use frailty model
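
The binary and count branches of the flow can be sketched as follows. This is an illustration under stated assumptions, not a template: it uses Stata's lbw and rod93 example datasets (loaded with webuse, so internet access is assumed), and the variable names low, smoke, age, lwt, deaths, cohort, and exposure are taken from those datasets as I understand them.

```stata
* Binary branch. Assumption: lbw is a Stata example dataset fetched by webuse
webuse lbw, clear

* How common is the outcome? (low birthweight is well above 10% in these data)
tabulate low

* Rare-outcome branch: logistic regression, reported as odds ratios
logit low smoke age lwt, or

* Common-outcome branch: Poisson with robust SEs, reported as risk ratios
glm low smoke age lwt, family(poisson) link(log) vce(robust) eform

* Count branch. Assumption: rod93 has variables deaths, cohort, and exposure
webuse rod93, clear

* Crude check: is the variance close to the mean?
summarize deaths, detail

* Poisson fit followed by its goodness-of-fit test
poisson deaths i.cohort, exposure(exposure)
estat gof

* Negative binomial fit; the output includes an LR test of alpha = 0
nbreg deaths i.cohort, exposure(exposure)
```

A significant goodness-of-fit test after poisson, or a clearly nonzero alpha in the nbreg output, points toward the negative binomial branch.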

📌 Built-in Quality Checks

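After selecting and fitting a model, run the diagnostics that match it:

* After regress: residual normality and residual-versus-fitted plot
predict r, resid
hist r, normal
rvfplot

* After logit or poisson: goodness of fit
estat gof

* After stcox: test of the proportional-hazards assumption
estat phtest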
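
Each check only runs after its matching model, so a complete run looks roughly like the sketch below. It assumes Stata's example datasets auto (shipped), lbw, and drugtr (the latter two loaded with webuse, so internet access is assumed). The covariates are chosen only for illustration.

```stata
* Continuous outcome: residual diagnostics after regress
sysuse auto, clear
regress price weight mpg
predict r, residuals
* Residual histogram with an overlaid normal density
histogram r, normal
* Residuals against fitted values
rvfplot, yline(0)

* Binary outcome: Hosmer-Lemeshow goodness of fit after logit
webuse lbw, clear
logit low smoke age lwt
estat gof, group(10)

* Time-to-event outcome: declare the survival data, fit, then test PH
webuse drugtr, clear
stset studytime, failure(died)
stcox drug age
* Schoenfeld-residual test, per covariate and global
estat phtest, detail
```

Note that stcox lists only covariates, because stset has already declared the time and failure variables.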

