
What Is Feature Importance in Random Forest? Gini vs Permutation Explained
What is Feature Importance? Feature importance answers the question: “Which predictors contribute most to the model’s predictions?” Importantly, feature importance does not change model performance; it is used for interpretation, especially in clinical research. Two main methods are used: impurity-based importance (Gini importance) and permutation-based importance. Method 1: Impurity-Based Importance (Gini Importance). Core idea: each time a feature is used to split a node, it reduces…
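The two methods named in this excerpt can be sketched with scikit-learn (a minimal sketch; the synthetic dataset and all parameter values here are illustrative, not taken from the post):

```python
# Sketch: Gini (impurity-based) vs permutation importance with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based: accumulated Gini decrease, computed on the training data.
gini_imp = rf.feature_importances_
# Permutation-based: drop in score when a feature is shuffled, on held-out data.
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
perm_imp = perm.importances_mean
```

Note the asymmetry: Gini importances always sum to 1 and come "for free" from training, while permutation importance needs extra passes over (ideally held-out) data but is less biased toward high-cardinality features.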

What Is the Split Rule (Discrimination Rule) in Random Forest? Gini vs Extra Trees Explained
What is the Split Rule? At each node in a decision tree, the algorithm must decide: “Where should I split this feature to best separate the outcome?” This decision is governed by the split rule (criterion). In Random Forest, the most common split strategies are Gini impurity with an exhaustive threshold search (standard Random Forest) and randomized thresholds (Extremely Randomized Trees, “Extra Trees”). The key difference lies in how the split threshold is chosen. The Core Difference: How a Split Point Is Chosen. Consider a single feature…
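The difference in how a split point is chosen can be shown in pure Python (a toy sketch with a made-up feature `x` and labels `y`; real implementations are more elaborate):

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(x, y, threshold):
    """Impurity after splitting feature x at `threshold`, weighted by child size."""
    left  = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi >  threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # one feature, sorted
y = [0, 0, 0, 1, 1, 1]               # outcome

# Standard Random Forest: evaluate every candidate midpoint, keep the best.
candidates = [(a + b) / 2 for a, b in zip(x, x[1:])]
best = min(candidates, key=lambda t: weighted_gini(x, y, t))

# Extra Trees: draw ONE threshold uniformly at random within the feature's range.
random.seed(0)
rand_t = random.uniform(min(x), max(x))
```

Here the exhaustive search finds the perfect split at 3.5 (weighted impurity 0), while Extra Trees accepts whatever random threshold it drew, which is exactly what makes its trees more diverse and cheaper to grow.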

How Random Forest Hyperparameters Affect Model Performance
Random Forest performance is driven by three core mechanisms: tree strength (how well each tree fits the data), tree diversity (how different the trees are from each other), and ensemble averaging (how predictions stabilize across trees). Each parameter influences one or more of these mechanisms. Category 1: Tree Structure Parameters (Most Important for Performance). These parameters control how each individual tree grows and directly affect the bias–variance trade-off. 1. Features per s…
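The strength-versus-diversity trade-off can be probed directly by varying `max_features`, scikit-learn's name for the number of candidate features per split (a hedged sketch; the dataset and values are illustrative):

```python
# Sketch: how max_features trades tree strength against tree diversity.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)

scores = {}
# "sqrt": few candidate features per split -> weaker but more diverse trees.
# None: all features per split -> stronger but more correlated trees.
for max_features in ("sqrt", None):
    rf = RandomForestClassifier(n_estimators=100, max_features=max_features,
                                random_state=0)
    scores[max_features] = cross_val_score(rf, X, y, cv=5).mean()
```

Comparing the two cross-validated scores makes the mechanism concrete: neither setting dominates in general, which is why `max_features` is usually the first hyperparameter worth tuning.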

AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) in Logistic Regression
Your output:

       Model |        N   ll(null)  ll(model)   df        AIC        BIC
-------------+----------------------------------------------------------
           . |     3135  -1906.079  -1807.527    2   3619.054   3631.155

What Are AIC and BIC? Both are information criteria used to compare models. They answer: which model best balances goodness of fit and parsimony? They penalize complexity. AIC stands for Akaike Information Criterion, named after Hirotugu Akaike (1974). BIC stands for Bayesian…
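Both criteria follow directly from the reported log-likelihood, so the figures in the Stata output can be reproduced by hand (AIC = −2·ll + 2k, BIC = −2·ll + k·ln(N)):

```python
import math

# Numbers taken from the `estat ic` output above.
n, ll_model, df = 3135, -1807.527, 2

aic = -2 * ll_model + 2 * df             # AIC = -2*ll + 2k        -> 3619.054
bic = -2 * ll_model + df * math.log(n)   # BIC = -2*ll + k*ln(N)   -> ~3631.155
```

Because ln(3135) ≈ 8.05 is larger than 2, BIC penalizes each extra parameter more heavily than AIC whenever N > e² ≈ 7.4, which is why BIC tends to favor smaller models.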

ROC Analysis and Diagnostic Test Accuracy: From Discrimination to Cut-Point Selection in Stata (roctab & diagt)
Introduction (when the outcome is disease status and the goal is test performance). In diagnostic research, we are interested in how well a test distinguishes between patients with and without disease. We are not only interested in whether a test is associated with disease, but in how accurately it classifies patients, and whether this accuracy depends on the chosen cut-point. 1. ROC Analysis (Discrimination & Cut-point Exploration): describe performance across all thresholds…
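The post works in Stata (`roctab`, `diagt`); the same discrimination-plus-cut-point workflow can be sketched in Python with simulated data (everything below, including the Youden-index rule for picking the cut-point, is an illustrative assumption, not the post's own example):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Simulated data: disease status (1 = diseased) and a continuous test result.
rng = np.random.default_rng(0)
disease = np.repeat([0, 1], 100)
test = np.concatenate([rng.normal(0.0, 1.0, 100),   # non-diseased
                       rng.normal(1.5, 1.0, 100)])  # diseased

auc = roc_auc_score(disease, test)                  # discrimination
fpr, tpr, thresholds = roc_curve(disease, test)     # all cut-points at once

youden = tpr - fpr                                  # Youden's J = sens + spec - 1
best_cut = thresholds[youden.argmax()]              # cut-point maximizing J
```

`roc_curve` enumerates every achievable threshold, which is exactly the "describe performance across all thresholds" step; the Youden index is one common (but not the only) rule for then selecting a single clinical cut-point.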

Survival Analysis in Clinical Epidemiology (Stata Code): From Kaplan–Meier to Cox Regression (Non-parametric, Semi-parametric)
Introduction. Survival analysis is used when time matters. We are not only interested in whether an event happens, but also in when it happens, and we must correctly handle censoring (patients who do not experience the event during follow-up). 1. Non-parametric Survival Analysis (describe and compare survival with no model assumptions). Step 1: Survival-time setting (stset). Before doing anything, we must tell Stata what the time variable is, what the event is, and who is censored. stset time, failure(e…
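After `stset`, Stata's `sts list` reports the product-limit (Kaplan–Meier) estimate; the same arithmetic can be hand-rolled in a few lines of Python to make the censoring logic explicit (the toy `times`/`events` data below are illustrative, not from the post):

```python
def kaplan_meier(times, events):
    """Product-limit estimator: returns [(event time, S(t))].

    events[i] = 1 if the event occurred at times[i], 0 if censored.
    Censored subjects leave the risk set but trigger no drop in S(t).
    """
    surv, out = 1.0, []
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        d   = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_t = sum(1 for ti in times if ti >= t)       # at risk just before t
        surv *= 1 - d / n_t                            # S(t) = prod(1 - d/n)
        out.append((t, surv))
    return out

times  = [2, 3, 3, 5, 8, 8, 12]    # follow-up times
events = [1, 1, 0, 1, 1, 0, 0]     # 1 = event, 0 = censored
km = kaplan_meier(times, events)   # steps at t = 2, 3, 5, 8
```

Note the convention baked into `n_t`: a subject censored at time t is still counted in the risk set for events at t, which matches the standard Kaplan–Meier (and Stata) treatment of ties.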
