What Is Feature Importance in Random Forest? Gini vs Permutation Explained
- Mayta

What is Feature Importance?
Feature importance answers the question:
“Which predictors contribute most to the model’s predictions?”
Importantly:
Feature importance does not change model performance
It is used for interpretation, especially in clinical research
Two main methods are used:
Impurity-based importance (Gini importance)
Permutation-based importance
Method 1: Impurity-Based Importance (Gini Importance)
Core Idea
Each time a feature is used to split a node, it reduces impurity (e.g., Gini impurity). The total importance of a feature is:
The sum of all impurity reductions it produces across all trees
How It Works
Across the forest:
| Feature | Total impurity reduction | Normalized importance |
| --- | --- | --- |
| Age | High cumulative reduction | Highest importance |
| GCS | Moderate reduction | Moderate importance |
| SBP | Moderate reduction | Moderate importance |
| HR | Small reduction | Low importance |
| RR | Very small reduction | Lowest importance |
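In scikit-learn, this cumulative impurity reduction is what `feature_importances_` reports. A minimal sketch, using synthetic data (the feature names age, GCS, SBP, HR, RR are taken from the table above, but the data here is simulated, not the article's dataset):

```python
# Sketch: impurity-based (Gini) importance from a random forest.
# Synthetic data: only "age" and "GCS" carry signal by construction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))  # columns: age, GCS, SBP, HR, RR
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ = impurity reduction per feature, accumulated
# over all splits in all trees, normalized to sum to 1.
for name, imp in zip(["age", "GCS", "SBP", "HR", "RR"], rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Because the importances are normalized, they always sum to 1; they rank features relative to each other rather than on an absolute scale.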
Interpretation
| Property | Behavior |
| --- | --- |
| Computation timing | During model training |
| Mechanism | Tracks impurity reduction |
| Speed | Fast |
| Output meaning | “How often and how effectively a feature was used for splitting” |
Limitation: Systematic Bias
Bias toward high-cardinality features
| Feature type | Behavior in impurity importance |
| --- | --- |
| Many categories (e.g., hospital ID) | Artificially high importance |
| Continuous variables | Favored |
| Binary variables | Underestimated |
Why this happens
Features with more possible split points → more opportunities to reduce impurity → higher accumulated importance, even if the feature is not clinically meaningful.
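This bias is easy to demonstrate. In the sketch below, both features are pure noise with respect to the outcome, yet the many-valued ID feature absorbs nearly all of the impurity-based importance simply because it offers far more candidate split points. The data and the `hospital_id` name are illustrative assumptions:

```python
# Sketch of the cardinality bias: a pure-noise ID with ~500 unique
# values vs. a pure-noise binary flag. Neither predicts y.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
hospital_id = rng.integers(0, 500, size=n)  # many categories, no signal
binary_flag = rng.integers(0, 2, size=n)    # two categories, no signal
y = rng.integers(0, 2, size=n)              # outcome independent of both

X = np.column_stack([hospital_id, binary_flag])
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The ID feature dominates despite having zero true predictive value.
print(rf.feature_importances_)
```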
Example Interpretation
| Scenario | Result |
| --- | --- |
| Hospital ID (many categories) | Appears highly important |
| Age (continuous) | Appears important |
| True clinical predictors | May be underestimated |
This leads to misleading conclusions in interpretation.
Method 2: Permutation-Based Importance
Core Idea
Instead of tracking splits, this method asks:
“If I destroy this feature’s information, how much worse does the model perform?”
How It Works
Step-by-step logic
1. Compute baseline model performance (e.g., AUROC = 0.850)
2. Randomly shuffle one feature (break its relationship with the outcome)
3. Recompute performance
4. Importance = performance drop
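The four steps above can be sketched directly, one shuffle per feature. The data is synthetic and the feature names are illustrative; note the shuffle is done on held-out data, which is the usual recommendation:

```python
# Minimal permutation-importance loop (single shuffle per feature).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))  # columns: age, GCS, SBP, HR, RR
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Step 1: baseline performance on held-out data
baseline = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

drops = {}
for j, name in enumerate(["age", "GCS", "SBP", "HR", "RR"]):
    X_perm = X_te.copy()
    rng.shuffle(X_perm[:, j])  # Step 2: break the feature-outcome link
    shuffled = roc_auc_score(y_te, rf.predict_proba(X_perm)[:, 1])  # Step 3
    drops[name] = baseline - shuffled  # Step 4
    print(f"{name}: importance = {drops[name]:.3f}")
```

In practice the shuffle is repeated several times and the drops are averaged, which stabilizes the estimate; a single shuffle, as here, is noisier.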
Example Results
| Feature | Baseline AUROC | Shuffled AUROC | Performance drop | Importance |
| --- | --- | --- | --- | --- |
| Age | 0.850 | 0.720 | 0.130 | High |
| GCS | 0.850 | 0.740 | 0.110 | High |
| SBP | 0.850 | 0.810 | 0.040 | Moderate |
| HR | 0.850 | 0.830 | 0.020 | Low |
| RR | 0.850 | 0.845 | 0.005 | Very low |
Interpretation
| Property | Behavior |
| --- | --- |
| Computation timing | After model training |
| Mechanism | Measures performance degradation |
| Speed | Slower |
| Output meaning | “How much this feature contributes to prediction accuracy” |
Why Permutation Importance Avoids These Biases
Example 1: Many categories vs true signal
| Feature | True predictive value | Impurity importance | Permutation importance |
| --- | --- | --- | --- |
| Hospital ID | None | High (biased) | Near zero (correct) |
| Age | Strong | Moderate | High (correct) |
Example 2: Continuous vs binary
| Feature | Type | True effect | Impurity result | Permutation result |
| --- | --- | --- | --- | --- |
| Age | Continuous | Moderate | High (biased) | Moderate |
| Sex | Binary | Moderate | Low (biased) | Moderate |
Key Insight
Permutation importance reflects:
Actual contribution to predictive performance
not:
Opportunity to create splits
Side-by-Side Comparison
| Aspect | Impurity-Based Importance | Permutation-Based Importance |
| --- | --- | --- |
| When computed | During training | After training |
| Mechanism | Sum of impurity reductions | Performance drop after shuffling |
| Speed | Fast | Slower |
| Bias | Biased (toward many categories/continuous) | Largely unbiased |
| Interpretation | Model usage frequency | True predictive contribution |
| Use case | Quick exploration | Clinical reporting and publication |
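Both importances can be obtained from the same fitted forest, which makes the comparison concrete. scikit-learn's `permutation_importance` repeats the shuffle `n_repeats` times and averages the performance drops. The data and feature names below are illustrative assumptions:

```python
# Side-by-side sketch: impurity vs. permutation importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))  # columns: age, GCS, SBP, HR, RR
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance on held-out data, scored by AUROC.
perm = permutation_importance(rf, X_te, y_te, scoring="roc_auc",
                              n_repeats=10, random_state=0)

for name, gini, p in zip(["age", "GCS", "SBP", "HR", "RR"],
                         rf.feature_importances_, perm.importances_mean):
    print(f"{name}: impurity={gini:.3f}  permutation={p:.3f}")
```

Scoring with `roc_auc` matches the AUROC-drop logic described above; any scorer appropriate to the clinical task could be substituted.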
Clinical Interpretation Perspective
In clinical prediction modeling:
The goal is not:
“Which variable is used often?”
The goal is:
“Which variable actually improves prediction?”
Permutation importance aligns with this goal.
This is consistent with prediction modeling principles:
Focus on predictive contribution, not structural artifacts
Avoid misleading interpretations due to model mechanics
Practical Recommendation for Clinical Papers
Do not rely on impurity-based importance for interpretation
Use permutation-based importance for reporting
Reason:
It reflects real impact on model performance
It is methodologically defensible
It aligns with clinical reasoning
Conceptual Summary
Feature importance methods differ in what they measure:
Impurity-based importance → how often a feature is used
Permutation-based importance → how much a feature matters
Key Takeaways
Feature importance does not affect model performance, only interpretation
Impurity-based importance is fast but biased toward certain variable types
Permutation-based importance measures true contribution to prediction
In clinical research, permutation importance is preferred for validity
Misinterpreting feature importance can lead to incorrect clinical conclusions