

What Is Feature Importance in Random Forest? Gini vs Permutation Explained

What is Feature Importance? Feature importance answers the question: “Which predictors contribute most to the model’s predictions?” Importantly, feature importance does not change model performance; it is used for interpretation, especially in clinical research. Two main methods are used: impurity-based importance (Gini importance) and permutation-based importance. Method 1: Impurity-Based Importance (Gini Importance). Core idea: each time a feature is used to split a node, it reduce
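As a rough illustration of the second method named in this excerpt, permutation-based importance can be sketched in plain Python: shuffle one column at a time and record how much the accuracy of an already-fitted model drops. The toy data, `toy_model` (a stand-in for a fitted forest), and the function names are all hypothetical; a real analysis would more likely use a library implementation.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model classifies correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: the outcome depends only on feature 0.
X = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 4.0], [0.3, 2.0], [0.7, 6.0]]
y = [0, 1, 0, 1, 0, 1]
toy_model = lambda row: int(row[0] > 0.5)  # stand-in for a fitted forest

importances = permutation_importance(toy_model, X, y)
print(importances)
```

The model is never refitted here, which matches the point that importance is an interpretation tool and leaves performance unchanged. Feature 1 is ignored by the toy model, so shuffling it costs nothing.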

What Is the Split Rule (Discrimination Rule) in Random Forest? Gini vs Extra Trees Explained

What is the Split Rule? At each node in a decision tree, the algorithm must decide: “Where should I split this feature to best separate the outcome?” This decision is governed by the split rule (criterion). In Random Forest, the most common split rules are Gini impurity (standard Random Forest) and Extremely Randomized Trees (Extra Trees). The key difference lies in how the split threshold is chosen. The Core Difference: How a Split Point is Chosen. Consider a single feature: Fe
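The contrast in how a threshold is chosen can be sketched for one feature as follows. This is a minimal illustration, not a full tree builder: the feature `ages`, the outcome `disease`, and all function names are hypothetical. The standard rule scans every candidate threshold and keeps the one with the lowest weighted Gini impurity, while the Extra Trees rule draws a single threshold at random within the feature's range.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def weighted_gini(values, labels, threshold):
    """Size-weighted Gini impurity of the two children of a split."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Standard rule: scan midpoints between sorted unique values."""
    pts = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return min(candidates, key=lambda t: weighted_gini(values, labels, t))

def random_threshold(values, rng):
    """Extra Trees rule: draw one threshold uniformly within the range."""
    return rng.uniform(min(values), max(values))

# Hypothetical single feature and binary outcome.
ages = [30, 35, 40, 55, 60, 65]
disease = [0, 0, 0, 1, 1, 1]

t_best = best_threshold(ages, disease)
t_rand = random_threshold(ages, random.Random(0))
print(t_best, weighted_gini(ages, disease, t_best))
print(t_rand, weighted_gini(ages, disease, t_rand))
```

The random threshold is usually a worse split for any one tree, but it is cheaper to compute and makes the trees differ more from one another.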

How Random Forest Hyperparameters Affect Model Performance

Random Forest performance is driven by three core mechanisms: tree strength (how well each tree fits the data), tree diversity (how different trees are from each other), and ensemble averaging (how predictions stabilize across trees). Each parameter influences one or more of these mechanisms. Category 1: Tree Structure Parameters (Most Important for Performance). These parameters control how each individual tree grows and directly affect the bias–variance trade-off. 1. Features per s

ROC Analysis and Diagnostic Test Accuracy: From Discrimination to Cut-Point Selection in Stata (roctab & diagt)

Introduction (when the outcome is disease status and the goal is test performance). In diagnostic research, we are interested in how well a test distinguishes between patients with and without disease. We are not only interested in whether a test is associated with disease, but how accurately it classifies patients, and whether this accuracy depends on the chosen cut-point. 1. ROC Analysis (Discrimination & Cut-point Exploration) (Describe performance across all thresholds
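To make the dependence on the cut-point concrete, here is a small Python sketch (not the Stata workflow the post itself covers). It computes sensitivity and specificity at a chosen cut-point and the area under the ROC curve as the probability that a diseased patient scores higher than a non-diseased one. The scores, disease labels, and function names are made up for illustration.

```python
def sens_spec(scores, disease, cutpoint):
    """Sensitivity and specificity when 'score >= cutpoint' is test-positive."""
    tp = sum(s >= cutpoint and d == 1 for s, d in zip(scores, disease))
    fn = sum(s < cutpoint and d == 1 for s, d in zip(scores, disease))
    tn = sum(s < cutpoint and d == 0 for s, d in zip(scores, disease))
    fp = sum(s >= cutpoint and d == 0 for s, d in zip(scores, disease))
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, disease):
    """AUC as P(diseased score > non-diseased score); ties count one half."""
    pos = [s for s, d in zip(scores, disease) if d == 1]
    neg = [s for s, d in zip(scores, disease) if d == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test scores and true disease status.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
disease = [0, 0, 0, 1, 0, 1, 1, 1]

for cut in (4, 6):
    print(cut, sens_spec(scores, disease, cut))
print("AUC:", auc(scores, disease))
```

Raising the cut-point from 4 to 6 trades sensitivity for specificity, while the AUC summarizes discrimination across all cut-points at once.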

Survival Analysis in Clinical Epidemiology Stata Code dominant: From Kaplan–Meier to Cox Regression (Non-parametric, Semi-parametric)

Introduction. Survival analysis is used when time matters. We are not only interested in whether an event happens, but also when it happens, and we must correctly handle censoring (patients who do not experience the event during follow-up). 1. Non-parametric Survival Analysis (describe and compare survival; no model assumptions). Step 1: Survival-time setting (stset). Before doing anything, we must tell Stata what is time, what is the event, and who is censored: stset time, failure(e
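For readers who want to see what the non-parametric step computes, the Kaplan–Meier estimate can be sketched in a few lines of Python. This is an illustration of the estimator only, not a substitute for the Stata commands the post describes; the `times` and `events` values are invented, with `events` equal to 1 for an observed event and 0 for a censored patient.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event occurred at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Everyone still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up data: two patients are censored (at 3 and 8).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients never trigger a drop in the curve, but they still count in the at-risk denominator until their censoring time, which is how the estimator handles censoring correctly.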

Choosing RR or OR in Epidemiologic Studies: cs vs cc Commands in Stata

If you remember one rule, remember this: cs → cohort/RCT/cross-sectional, when you can interpret risk (or prevalence) → gives RR/RD; cc → case-control (sampled by outcome) → gives OR (only). (Stata even labels this as “Risk & Odds Analysis” with cs for RR and logistic for OR in the quick reference.) The decision flowchart: How were subjects selected? A) Selected by EXPOSURE status (exposed/unexposed) and followed (or measured) outcome? -> Cohort / RCT / cross-sectional without
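The three measures mentioned in this excerpt are simple arithmetic on a 2×2 table, sketched here in Python with invented counts. The function name and the cell labels (a, b, c, d) are illustrative choices, not Stata's output.

```python
def two_by_two_measures(a, b, c, d):
    """Effect measures from a 2x2 table.

    a: exposed with outcome      b: unexposed with outcome
    c: exposed without outcome   d: unexposed without outcome
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return {
        "RR": risk_exposed / risk_unexposed,   # risk ratio
        "RD": risk_exposed - risk_unexposed,   # risk difference
        "OR": (a * d) / (b * c),               # odds ratio
    }

# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop the outcome.
measures = two_by_two_measures(a=30, b=10, c=70, d=90)
for name, value in measures.items():
    print(name, round(value, 3))
```

In a case-control study the row totals are fixed by sampling on the outcome, so the two risks (and therefore RR and RD) are not interpretable, and only the OR remains meaningful.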

Incidence Rate [IR] and Incidence Rate Ratio [IRR]: Analysis of Rates in Clinical Epidemiology Using Stata

1. Introduction. In clinical epidemiology, outcomes often occur over time, and individuals may contribute different lengths of follow-up. In such settings, simple risks or proportions are inadequate. Instead, incidence rates (IRs) and incidence rate ratios (IRRs) are appropriate measures of disease incidence and exposure effects. Stata’s ir command is designed for this purpose and is widely used in cohort studies, occupational epidemiology, pharmacoepidemiology, and regis
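As a quick worked example of these definitions, the rates can be computed by hand in Python. The counts, person-time values, and function name below are hypothetical and serve only to show the arithmetic behind an IR and an IRR.

```python
def incidence_measures(cases_exp, pt_exp, cases_unexp, pt_unexp):
    """Incidence rates per unit of person-time, with their ratio and difference."""
    ir_exp = cases_exp / pt_exp
    ir_unexp = cases_unexp / pt_unexp
    return {
        "IR_exposed": ir_exp,
        "IR_unexposed": ir_unexp,
        "IRR": ir_exp / ir_unexp,   # incidence rate ratio
        "IRD": ir_exp - ir_unexp,   # incidence rate difference
    }

# Hypothetical cohort: 30 cases in 1,000 person-years (exposed)
# versus 15 cases in 1,500 person-years (unexposed).
rates = incidence_measures(30, 1000, 15, 1500)
for name, value in rates.items():
    print(name, round(value, 4))
```

The denominator is person-time rather than the number of people, which is what lets individuals with different lengths of follow-up contribute fairly.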

Clinical Trial Lifecycle Explained: From Protocol Development to SAP and CSR

1) Developing Protocol + Sample Size. A. Scientific + clinical foundation: define study rationale (unmet need, mechanism, prior evidence, feasibility). Translate to objectives: primary objective (one “win condition”), key secondary objectives (ranked), exploratory objectives (biomarkers, PROs, substudies). Define endpoint strategy: primary endpoint (precise definition, timepoint, ascertainment), secondary endpoints (hierarchy / multiplicity plan), safety endpoints (AEs, SAEs, AESIs

Binreg in Stata: Odds Ratios, Risk Ratios, and Why Modified Poisson Is Preferred

1. Introduction. Binary outcomes are common in clinical and epidemiological research. Examples include disease status (yes/no), mortality (dead/alive), or treatment response (success/failure). In Stata, several commands can be used to analyze binary outcomes, including logistic, binreg, and glm with different families and links. Although these commands may appear similar, they estimate different effect measures, rely on different assumptions, and can behave very differently in

Epitab in Stata: Classical Epidemiologic Analysis with 2×2 (confusion matrix) and Stratified Tables

Overview. Epitab is a suite of Stata commands designed for classical epidemiologic analyses based on 2×2 and stratified tables. It provides design-consistent estimation of effect measures, confidence intervals, and attributable fractions across cohort, case–control, cross-sectional, and matched study designs. Unlike regression models (e.g., logistic, poisson, stcox), Epitab commands are table-based, transparent, and ideal for: crude and stratified analyses, teaching epi
