
Stata mfp in Practice: Fractional Polynomials, select(), df(), and the Dummy-Variable xi: Workaround
Fractional polynomials, selection control, and using dummy variables (xi: workaround). This article focuses only on mfp (no MI, no validation workflow). It is written for researchers who want to model non-linear continuous predictors in one regression model and understand exactly what the key mfp syntax and options do, especially select(), df(), and the dummy-variable / xi: workaround when factor-variable syntax is not accepted. 1) What problem does mfp solve? In m

Degrees of Freedom in Fractional Polynomial Modeling (FP/MFP): What df(1), df(2), and df(4) Really Mean
A clinical-epidemiology article on what “df(1), df(2), df(4)” really mean (and what they do not mean) Abstract Fractional polynomials (FP) are a structured approach for modeling non-linear associations between continuous predictors (e.g., age, hemoglobin, creatinine) and outcomes without categorizing variables or using unstable high-degree polynomials. In FP and Stata’s multivariable fractional polynomial (MFP) workflow, the degrees of freedom settings—linear (1 df), FP1 (2

Dummy Variables + mfp in Stata: A Practical Guide (with xi: and mfpa)
Introduction This short “how-to” is written for researchers who hit the same wall you did: You want multivariable fractional polynomials (MFP) for continuous predictors (non-linearity handling), but mfp does not accept factor-variable syntax (i.var, c.var##c.var), and it often breaks inside mi estimate unless you handle categorical variables correctly. The solution is usually simple: pre-create dummy variables (best practice) or use xi: (quick fix). mfpa is an alternativ

Bootstrap Before kNN Is Not Internal Validation: Clarifying Imputation Variability vs Model Optimism
We don’t have to bootstrap before kNN, because it is not a rule; it is just an optional way to reflect imputation variability, not internal validation. Bootstrap before kNN cannot be claimed as internal validation, because it does not involve model fitting and testing on different data and thus does not estimate optimism. Internal validation requires bootstrapping of the final model; bootstrapping the imputation step alone is insufficient, because we fit the model on the im

Why Missing Data Requires Both Imputation [Bootstrap then kNN] and Bootstrap [again] Internal Validation
First: fix one wrong mental picture I am thinking: “If complete data can do bootstrap once, missing data must do bootstrap twice → this feels like cheating / overkill.” This feeling comes from counting datasets, but internal validation is not about counting datasets. 👉 Internal validation is about ONE comparison: Was the model evaluated on data it was NOT trained on? Everything else is bookkeeping. Case 1 COMPLETE DATA (no missing) ❓ What is the correct internal validatio

Internal Validation with Bootstrap, kNN Imputation, and Fractional Polynomial Models [Thai]
(Case: multiple kNN-imputed datasets + Fractional Polynomial + Bootstrap) Introduction: What question are we trying to answer? When developing a prediction model, the key question is not only “How well does the model fit our data?” but “How much does this model overfit, and how much will its performance drop when it is applied to new people?” Answering this question is called Internal Validation. Context of this work (Your exact problem) This work has three layers of complexity at once: A large amount of missing data → use bootstrap before kNN single imputation, repeated many times → obtain imputed dat

Handling Missing Data in Clinical Prediction Models: Bootstrap kNN vs Multiple Imputation
Overview Missing data are common in clinical datasets and must be handled carefully to avoid biased estimates, inflated performance, and invalid inference. In this study, missing values were addressed using k-nearest neighbor (kNN) imputation as an alternative to multiple imputation (MI), followed by bootstrap-based internal validation of a diagnostic prediction model for gastrointestinal malignancy. This section describes the theoretical justification, practical implemen

Reading Bioinformatics / Precision Medicine Papers Systematically: EDPC Framework: Etiological, Discovery, Predictive, Confirmatory in Precision Medicine
Etiological • Discovery • Predictive • Confirmatory (EDPC) Precision medicine papers often look similar (omics + fancy plots), but they can be doing four very different jobs. Your slide deck defines these four objectives clearly: Etiological, Discovery, Predictive, Confirmatory. If you misclassify the objective, you will misread the results (e.g., treating “discovery” as “prediction”, or treating “prediction” as “clinical utility”). The EDPC map (what kind of paper is this

How to Choose Statistical Coefficients for Each Type of Reliability
Summary Table: Types of Reliability & Statistical Coefficients Reliability Type Purpose Data Type Statistical Coefficients (Named Statistics) 1. Test–Retest Reliability Measures stability over time (same test, two occasions) Continuous • Pearson r • Spearman ρ (ordinal or non-normal) • ICC (Intraclass Correlation Coefficient) • CCC (Concordance Correlation Coefficient) Ordinal • Spearman ρ • Weighted Cohen’s kappa Nominal / Dichotomous • Cohen’s kappa (κ) — Coeffici

How to Find Function Origins & Namespace Discipline in RStudio
1. Base R Functions Require No Library Some functions, such as paste0(), mean(), round(), factor(), belong to base R, which loads automatically. To verify: getAnywhere("paste0") Typical output: A single object matching ‘paste0’ was found It was found in: package:base Therefore: paste0() → base package → no library() required. 2. Determine the Origin of Any Function R exposes two powerful tools for function lookup. Method 1: getAnywhere() Works even for hidden or S3/S4 gene

A Beginner’s Guide to Python Environments
Introduction A clean, practical introduction for new programmers, researchers, and CECS students. Managing Python environments is one of the most important early skills for anyone entering programming, data science, or clinical statistics. Many beginners underestimate this topic until they face problems such as: “ModuleNotFoundError: package not found” “This function works on my laptop but not on the lab computer.” “Upgrading pandas br

Effect Size, MCID/CID, and Sample Size Relevance
1. Effect Size: The Foundation of Clinical Interpretation Effect size (ES) is the magnitude of difference or association between groups, exposures, treatments, or predictors. It is the central component of all DEPTh areas (diagnosis, etiology, prognosis, therapeutic, methodologic). “Always interpret effect size + 95% CI, not p-values alone.” Common Effect Size Metrics by Research Type DEPTh Type Effect Size Metrics Therapeutic Risk Ratio, Risk Difference, Mean Difference, H

Muscle Cramps (ตะคริว): Causes, Management, and When to Worry
1. Definition A muscle cramp is a sudden, painful, involuntary contraction of a muscle or muscle group, most commonly affecting the calves, feet, hamstrings, and sometimes hands or abdomen. Most cases are benign and arise from neuromuscular irritability due to electrolyte imbalance, dehydration, or muscle fatigue. 2. Epidemiology Very common: affects up to 60% of adults. More frequent at night and in the elderly. Certain populations (pregnant women, athletes, diuretic users

Markdown vs Quarto: Choosing the Right Tool for Clinical Research
1. Overview: Markdown vs Quarto Markdown (MD) Markdown is a lightweight markup language used to format text into headings, bold/italic, lists, tables, code blocks, etc. Strengths: Simple syntax Ideal for README files, notes, tutorials Supported anywhere: GitHub, VS Code, RStudio, Stata logs (via dyndoc) Good for static documentation Limitations: Limited automation No native support for executing code chunks Cannot render statistical results automatically → Markdown alone is N

How to Report Model Updating (or Not) After Debray's External Validation: Examples, Tables & Templates
1. How they present external validation (text + tables) 1.1 Paper A: IDIOM model with updating Structure of the text Introduction Clinical background: iron deficiency anemia, GI malignancy. Introduces IDIOM model, original performance. States two aims: Describe prevalence/clinical characteristics of GI malignancy in Thai IDA patients; externally validate IDIOM and update it if needed. Methods Study design & patients (retrospective, single center, inclusion criteria, defini

Detrusor Underactivity (Underactive Bladder): Causes, Symptoms & Management [Bethanechol]
1. Definition Detrusor underactivity (DU) or Underactive Bladder (UAB) is a condition where the bladder muscle (detrusor) contracts too weakly and/or too briefly to empty the bladder completely during voiding. Formal urodynamic definition (ICS – International Continence Society): “A contraction of reduced strength and/or duration, resulting in prolonged bladder emptying and/or failure to achieve complete bladder emptying within a normal time span.” Clinically, you will see:

Seborrheic Dermatitis (Sebderm): Causes, Features & Treatment Overview
1. What is seborrheic dermatitis? Seborrheic dermatitis is a chronic, relapsing inflammatory skin disease affecting sebaceous (oil-rich) areas: Scalp Face (especially eyebrows, nasolabial folds, glabella) Ears, presternal chest, upper back, body folds It’s strongly associated with Malassezia (yeast), sebum, and abnormal immune response. Prevalence: about 1–3% of the general population, much higher in HIV and neurologic disease (e.g. Parkinson’s). 2. Pathophysiology (high-











