
Cohen’s Kappa vs Weighted Kappa: Measuring Agreement Beyond Chance
1. Why do we need Kappa? When two raters (or two methods) classify patients into categories—for example, fracture (yes/no), CT severity (mild/moderate/severe), or an ECG finding (normal/abnormal)—we want to know: do they really agree, or are they just “lucky” to match by chance? Simple percent agreement (the % of cases where both give the same category) is easy to understand but has a big limitation: if one category is very common (e.g., “no disease”) …
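
As a quick sketch of the statistic itself (standard notation, assumed here rather than quoted from the article): with p_o the observed agreement and p_e the agreement expected by chance,

    \kappa = \frac{p_o - p_e}{1 - p_e},
    \qquad
    \kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, e_{ij}}

where the weighted version κ_w penalizes each disagreement cell by a weight w_ij (e.g., linear or quadratic distance between categories), with p_ij the observed and e_ij the chance-expected cell proportions.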

How Clinical Scores Are Built: From Logistic Coefficients to Point Systems
1. Where does a clinical score come from? Most modern scores come from a prediction model, usually logistic regression for binary outcomes (e.g. appendicitis: yes/no) or a Cox model for time-to-event outcomes (e.g. 10-year CVD risk). For a logistic model, the development team fits logit P(Y=1) = α + β1X1 + … + βpXp, where Y is the outcome (e.g. disease yes/no), the Xj are predictors (e.g. fever, RLQ pain, WBC, etc.), α is the intercept, and the βj are the log-odds coefficients. Those βj are the true origin of the score. The score is just a simplified, rounded version …
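
A minimal sketch of the coefficient-to-points step (the base constant B and the rounding rule are assumptions of this illustration, not taken from the article): each coefficient is divided by a base constant and rounded,

    \text{points}_j = \operatorname{round}\!\left(\frac{\beta_j}{B}\right)

where B is often chosen as the smallest clinically meaningful coefficient, so the weakest predictor scores 1 point and stronger predictors score proportionally more.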

Agreement vs Reliability in Categorical Data: A Practical Guide for Clinical Researchers
1. Why do we care about reliability? When you design a clinical tool (score, scale, questionnaire, diagnostic classification), you usually have raters (people or systems making the judgment), repeated measurements (the same patient or test measured twice or more), and categorical outcomes (e.g. “present/absent”, “stage I/II/III”, “mild/moderate/severe”). You want to know: Agreement – do raters give the same category? Reliability – does the measurement reflect true differences …
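
In Stata, both plain and weighted agreement for two raters can be obtained with the built-in kap command; the dataset and variable names below are illustrative, not from the article:

    * Two raters classifying the same patients (ratings coded 1/2/3)
    use ratings.dta, clear
    kap rater1 rater2            // Cohen's kappa (unweighted)
    kap rater1 rater2, wgt(w)    // weighted kappa, linear weights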

Step-by-Step External Validation and Recalibration in Stata (Step 2 in Debray Framework)
Step 0 – Load the external validation cohort

    clear
    cd "C:\WORK\Location\External validation"
    use "validation.dta", clear
    summarize

Why? We start with the validation dataset only. Step 2 is about how the original model behaves in a new population – no re-fitting from scratch, just evaluation and recalibration.

Step 1 – Reconstruct the original linear predictor and risk

    * 1.1 Generate the linear predictor (log-odds) using original coefficients
    gen logodds = -4.4415 ///
        + 0. …
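
Once the linear predictor exists, the rest of Step 2 typically looks like the following minimal sketch (the outcome name died is an assumption for illustration; the article's own variable names are not visible in this excerpt):

    * Predicted risk from the original model
    gen risk = invlogit(logodds)
    * Calibration-in-the-large: intercept with the LP as an offset
    logit died, offset(logodds)
    * Calibration slope: regress the outcome on the LP
    logit died logodds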

How to Build a Clinical Prediction Model (CPM) in Stata: Step-by-Step with Stata Code
Steps to developing a clinical prediction model (CPM): choose predictors and run forward/backward selection, fit the final logistic model cleanly, read α (intercept) and the β’s from Stata, write the prediction equation, and generate the LP and risk in Stata – all in Stata only.

Step 1 – Start with development data and candidate predictors

    clear
    use "your_development_data.dta", clear
    * Inspect variables
    describe
    summarize

Assume you want to predict death (0/1) from a set of candidate predictors: age, sex, …
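
The later steps then reduce to three Stata lines; this is a minimal sketch with illustrative predictors, not the article's final model:

    logit death age i.sex     // fit the final logistic model
    predict lp, xb            // linear predictor (log-odds) per patient
    predict risk, pr          // predicted probability of death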

Calibration Plot in Clinical Prediction Models [Calibration-in-the-Large (CITL), Calibration Slope]
Abstract Calibration is a fundamental property of clinical prediction models (CPMs), reflecting how well predicted probabilities agree with actual observed outcomes. Unlike discrimination—how well a model distinguishes between individuals with and without an event—calibration evaluates absolute accuracy. Poor calibration can mislead clinical decision-making even when discrimination appears acceptable. This article explains the conceptual foundation, metrics, and practical interpretation …
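
The two headline metrics can be written compactly (standard notation, assumed here rather than quoted from the article): with LP the linear predictor of the original model evaluated in the new data,

    \operatorname{logit} P(Y{=}1) = a + \mathrm{LP}      (LP as offset; a = calibration-in-the-large)
    \operatorname{logit} P(Y{=}1) = a + b\,\mathrm{LP}   (b = calibration slope; ideal b = 1)

A well-calibrated model has a ≈ 0 and b ≈ 1; a < 0 suggests systematic over-prediction, and b < 1 suggests predictions that are too extreme.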

Pocket Guide to Critical Appraisal of RCTs (DDO Framework)
INTRODUCTION As clinicians, we constantly face questions such as “Does this treatment really work?” or “Should I trust this new study?” Being able to read and interpret RCTs is essential because RCTs are the gold standard for therapeutic evidence and commonly appear in licensing exams. …

How to Critically Appraise a Randomized Controlled Trial (RCT) Using the DDO Framework and Cochrane Tools
Introduction As clinicians, we constantly face questions such as “Is this drug effective?”, “Is that treatment truly better?”, or “This new study says it works — should we believe it?” These questions come from patients, colleagues, hospital administrators, and even from within our own decision-making as we choose the best treatment for the person in front of us. Because of this, one of the most important skills for every physician is the ability to read, interpret, and judge …

How to Diagnose and Manage Nail Psoriasis vs Onychomycosis
1. Diagnosis Criteria 🔵 A. Nail Psoriasis – Diagnostic Criteria: a clinical diagnosis (no single gold standard), based on classic nail findings plus a history of psoriasis. Major nail features: pitting, oil-drop (salmon patch) discoloration, onycholysis with an erythematous border, subungual hyperkeratosis (psoriatic type: chalky, white), nail crumbling/roughness, and leukonychia. Supportive features: current or past cutaneous psoriasis, psoriatic arthritis, and a family history of psoriasis …

Diagnosis and Management of Acute Otitis Media (AOM) vs Otitis Media with Effusion (OME)
✅ 1. Diagnosis of Acute Otitis Media (AOM). Diagnostic criteria – must have ALL of: (A) acute symptoms: fever, otalgia (ear pain), irritability in children, otorrhea (only if TM perforation); (B) middle-ear inflammation, seen on otoscopy: bulging tympanic membrane (TM), the most specific finding, erythema of the TM, and reduced mobility on pneumatic otoscopy; (C) middle-ear effusion: opaque TM, air-fluid level, loss of TM landmarks. 👉 Bulging TM + acute ear pain = AOM until proven otherwise. ✅ 2. Diagnosis …

Outpatient OPD Pneumonia: Amoxicillin & Cefdinir Regimens Explained
✅ 1. Amoxicillin Regimen (Ready to Use): Amoxicillin (500 mg) 2×3 po pc for 7 days. ✔ Meaning: 500 mg tablets, take 2 tablets (1 g) three times a day, after meals, for 7 days. Final prescription line: Amoxicillin (500 mg) 2×3 po pc × 7 days. ✅ 2. Cefdinir Regimen (Ready to Use): two common OPD pneumonia dosing patterns exist; use whichever your professor prefers. Option A: Standard Adult CAP Regimen – Cefdinir (300 mg) 1×2 po bid × 7 days. ✔ Meaning: 300 mg, one capsule, twice a day …

Fixed, Random, and Mixed-Effects Models: Choosing the Right Meta-Analytic Approach
Introduction The choice between Fixed-effects, Random-effects, and Mixed-effects models fundamentally shapes how clinicians and researchers interpret pooled evidence. In therapeutic evaluation, causal inference, and complex trial designs, the model you choose determines whether your conclusions reflect a single underlying effect, an average effect across diverse settings, or a heterogeneity-explained effect dependent on study-level characteristics. Grounding this logic …
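
The three models can be captured in one line each (standard notation, assumed rather than quoted from the article): for observed study effects θ̂_i,

    \hat{\theta}_i = \theta + \varepsilon_i                                   (fixed effect: one true \theta)
    \hat{\theta}_i = \theta + u_i + \varepsilon_i, \quad u_i \sim N(0, \tau^2)  (random effects: a distribution of true effects)
    \hat{\theta}_i = \theta + \gamma z_i + u_i + \varepsilon_i                (mixed effects: meta-regression on a study-level covariate z_i)

Here τ² is the between-study variance that the random- and mixed-effects models estimate and the fixed-effects model assumes to be zero.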

Inter-Rater Agreement in Clinical Research: Importance, Metrics, and Methodological Role
Abstract Inter-rater agreement plays a foundational role in ensuring the reliability, reproducibility, and validity of clinical research involving human judgment. Whether interpreting radiologic studies, applying diagnostic criteria, assessing prognostic variables, or scoring clinical outcomes, consistency among raters determines whether a measurement strategy is trustworthy enough to be used in clinical studies or patient care. High agreement strengthens the study’s internal validity …

Within-Design and Between-Design Heterogeneity in Network Meta-Analysis
Introduction When people first read about network meta-analysis (NMA), they often understand ideas like direct and indirect comparisons, but get stuck on two more technical terms: within-design heterogeneity and between-design heterogeneity. These come from the Q statistic decomposition in NMA (often via the design-by-treatment interaction model). This article explains what they mean, why they exist, and how to interpret them in practice. 1. What Does “Design” Mean in This Context? …
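
The decomposition itself is short (standard notation, assumed rather than quoted from the article): the total Cochran Q for the network splits as

    Q_{\text{total}} = Q_{\text{within}} + Q_{\text{between}}

where Q_within sums the heterogeneity among studies sharing the same design (the same set of compared treatments) and Q_between captures inconsistency across designs, i.e., disagreement between direct and indirect evidence.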

Robust Approaches for Conventional Meta-Analysis and Network Meta-Analysis
Abstract Meta-analysis is a cornerstone of evidence synthesis in clinical and epidemiologic research. Traditional pairwise meta-analysis provides summary estimates of treatment effects by synthesizing results from studies that evaluate the same comparison. Network meta-analysis (NMA), in contrast, allows simultaneous comparison of multiple interventions by integrating both direct and indirect evidence. This article provides an overview of robust methods used to handle heterogeneity …

Concepts, Applications, and Implementation in Stata and R: Long vs. Wide Data
In data science and applied statistics, the structure of a dataset fundamentally affects how it can be analyzed, modeled, and visualized. Two dominant data structures are long form and wide form. Understanding the distinction between them is essential for efficient data management, especially when working with repeated measurements, panel data, surveys, or experiments. Definition of Wide-Form Data: wide-form data presents …
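
In Stata the conversion in both directions is a single reshape call; the dataset below is a hypothetical example (id, visit, and bp are illustrative names):

    * wide: one row per id, with columns bp1 bp2 bp3
    * long: one row per id-visit pair
    reshape long bp, i(id) j(visit)    // wide -> long
    reshape wide bp, i(id) j(visit)    // long -> wide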

Step 3 of the Debray Framework: Interpretation and Model Updating in External Validation
Introduction The final step of the Debray 3-step framework integrates insights from the earlier phases—population relatedness and predictive performance—to derive a clear, clinically meaningful interpretation of the model’s validity in the new setting. This step answers two essential questions: Does the observed performance reflect reproducibility or transportability? If performance is suboptimal, what type of model updating is most appropriate? By combining distributional …

Step 2 of the Debray Framework: Evaluating Calibration and Discrimination in External Validation
Introduction Once the relatedness between the development and validation populations has been established (Step 1), the next task in the Debray framework is to rigorously assess how well the original prediction model performs in the new validation sample. This step focuses on core predictive performance metrics—calibration and discrimination—accompanied by essential visual assessments. Together, these provide a comprehensive picture of predictive accuracy and potential mo…
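
A minimal Stata sketch of the quantitative side (the variable names died and risk are assumptions for illustration):

    * Discrimination: c-statistic (AUROC) for predicted risk vs observed outcome
    roctab died risk
    * Quick grouped calibration check: deciles of predicted risk vs observed rate
    xtile grp = risk, nq(10)
    tabstat died risk, by(grp) stat(mean)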

Step 1 of the Debray Framework: Investigating Relatedness in External Validation of Clinical Prediction Models
Introduction Before evaluating the predictive performance of a clinical prediction model in a new dataset, a critical prerequisite is determining how similar or different the validation population is compared with the development population. This first step—Investigating Relatedness—forms the foundation of the Debray 3-Step Framework for external validation. It clarifies what kind of external validity is being assessed: reproducibility or transportability. Why Relatedness …
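
One common way to quantify relatedness is a membership model: pool both cohorts, flag which sample each patient came from, and see how well the predictors separate them. The sketch below is an illustration under assumed variable names, not the article's code:

    * valid = 0 for development patients, 1 for validation patients
    logit valid age sex biomarker   // membership model on shared predictors
    lroc                            // AUROC near 0.5: closely related populations
                                    // AUROC well above 0.5: transportability setting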

The Debray 3-Step Framework: A Modern Approach to Interpreting External Validation of Clinical Prediction Models
Introduction Clinical prediction models—diagnostic or prognostic—are designed to support decision-making by estimating the probability of disease presence, clinical deterioration, or future clinical outcomes. Yet their true value emerges only when they demonstrate reliable performance beyond the development dataset. External validation studies therefore play a central role in determining whether a model is reproducible, transportable, and ultimately, clinically useful. Despite …

Why ROC/AUROC Is Not Enough: A Strategic Guide to Evaluating Clinical Prediction Models [ROC/AUROC → Calibration → Stability]
Abstract In clinical research, prediction models—whether diagnostic or prognostic—bridge data and decision-making. Yet, despite widespread reliance on ROC/AUROC as a performance benchmark, this single metric cannot guarantee clinical reliability or utility. As strategic research advisors, we must reframe model evaluation through multidimensional logic: discrimination, calibration, stability, and clinical usefulness. This article synthesizes the evaluative framework based on …
