How Random Forest Hyperparameters Affect Model Performance

  • Writer: Mayta
  • 2 days ago
  • 4 min read

Random Forest performance is driven by three core mechanisms:

  • Tree strength (how well each tree fits the data)

  • Tree diversity (how different trees are from each other)

  • Ensemble averaging (how predictions stabilize across trees)

Each parameter influences one or more of these mechanisms.

Category 1: Tree Structure Parameters (Most Important for Performance)

These parameters control how each individual tree grows and directly affect the bias–variance trade-off.

1. Features per split (mtry / maximum features)

| Low number of features | High number of features |
|---|---|
| Few variables considered at each split | Many variables considered |
| High randomness | Low randomness |
| Trees very different (low correlation) | Trees similar (high correlation) |
| Higher bias | Lower bias |
| Lower variance (better ensemble effect) | Higher variance (less benefit from averaging) |

Interpretation: This parameter controls how similar the trees are to each other. It is the most important tuning parameter because reducing correlation between trees greatly improves ensemble performance.
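As a minimal sketch of how this is set in practice, assuming scikit-learn's RandomForestClassifier on synthetic data (the article itself is library-agnostic; `max_features` is scikit-learn's name for mtry):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic data: 500 samples, 20 features
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Few features per split: high randomness, decorrelated trees
rf_diverse = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
).fit(X, y)

# All features per split: stronger individual trees, but more correlated
# (this reduces Random Forest to plain bagging)
rf_correlated = RandomForestClassifier(
    n_estimators=100, max_features=None, random_state=0
).fit(X, y)
```

`"sqrt"` (the square root of the feature count) is a common default for classification; tuning around that value is usually where the gains are.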

2. Minimum node size (min.node.size / minimum samples per leaf)

| Small minimum node size | Large minimum node size |
|---|---|
| Very small terminal nodes | Larger terminal nodes |
| Highly complex trees | Simpler trees |
| Captures fine details (including noise) | Produces smoother predictions |
| Lower bias | Higher bias |
| Higher variance (overfitting risk) | Lower variance (underfitting risk) |

Interpretation: This is the primary parameter that controls overfitting. Smaller values allow the model to memorize data; larger values force it to generalize.
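A small illustration of this memorize-versus-generalize trade-off, again assuming scikit-learn (where the parameter is `min_samples_leaf`) and synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# min_samples_leaf=1: trees may isolate single observations (memorization)
flexible = RandomForestClassifier(
    n_estimators=50, min_samples_leaf=1, random_state=0
).fit(X, y)

# min_samples_leaf=50: every leaf must cover 50+ samples (smoother fit)
regularized = RandomForestClassifier(
    n_estimators=50, min_samples_leaf=50, random_state=0
).fit(X, y)

# The gap in training accuracy reflects the reduced capacity to memorize
print(flexible.score(X, y), regularized.score(X, y))
```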

3. Maximum tree depth (maximum depth)

| Shallow trees | Deep trees |
|---|---|
| Limited interactions captured | Complex interactions captured |
| Higher bias | Lower bias |
| Lower variance | Higher variance |
| Risk of underfitting | Potential overfitting |

Interpretation: In Random Forest, trees are usually allowed to grow fully because bagging already controls overfitting. Limiting depth is rarely necessary.
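A quick sketch of the default behavior, assuming scikit-learn (`max_depth=None` grows trees until leaves are pure):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Default max_depth=None: trees grow fully; bagging handles overfitting
full = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
print(max(tree.get_depth() for tree in full.estimators_))

# Capped depth: rarely needed, but enforces simpler trees
shallow = RandomForestClassifier(
    n_estimators=25, max_depth=3, random_state=0
).fit(X, y)
```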

4. Minimum samples required to split (minimum samples to split)

| Small minimum split size | Large minimum split size |
|---|---|
| Splits occur easily | Splits occur less often |
| More complex trees | Simpler trees |
| Lower bias | Higher bias |
| Higher variance | Lower variance |

Interpretation: This parameter influences when splitting is allowed, but has less impact than minimum node size.

5. Maximum number of leaf nodes (maximum leaf nodes)

| Few leaf nodes | Many leaf nodes |
|---|---|
| Strong restriction on tree size | Minimal restriction |
| Simpler trees | More complex trees |
| Higher bias | Lower bias |
| Lower variance | Higher variance |

Interpretation: Another way to limit tree complexity. Its role overlaps with the minimum node size.

Category 2: Ensemble Parameters

These parameters control how multiple trees are built and combined.

6. Number of trees (number of estimators)

| Few trees | Many trees |
|---|---|
| Unstable predictions | Stable predictions |
| Higher variance | Lower variance |
| Sensitive to sampling noise | Robust averaging |
| Faster computation | Slower computation |

Interpretation: Increasing the number of trees reduces variance and stabilizes predictions. Performance improves until it plateaus (typically around 300–500 trees).
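The plateau is easy to see empirically. A sketch assuming scikit-learn (`n_estimators`) and a held-out split on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=15, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Held-out accuracy typically rises with forest size, then flattens out
for n in (5, 50, 300):
    rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(Xtr, ytr)
    print(n, round(rf.score(Xte, yte), 3))
```

More trees never hurt accuracy in expectation; the cost is purely computational.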

7. Sample fraction (fraction of data per tree)

| Small sample fraction | Large sample fraction |
|---|---|
| Each tree sees less data | Each tree sees more data |
| Higher diversity between trees | Lower diversity |
| Higher bias | Lower bias |
| Lower variance (better ensemble effect) | Higher variance (trees are more similar) |

Interpretation: Controls how similar trees are. Smaller fractions increase diversity and can reduce overfitting.
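A minimal sketch, assuming scikit-learn, where `max_samples` sets the bootstrap fraction (available when `bootstrap=True`):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# max_samples=0.5: each tree is trained on a bootstrap sample covering
# only half the rows, which increases diversity between trees
rf = RandomForestClassifier(
    n_estimators=100, bootstrap=True, max_samples=0.5, random_state=0
).fit(X, y)
```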

8. Bootstrap sampling (sampling with replacement)

| Without replacement | With replacement |
|---|---|
| More unique observations per tree | Some observations repeated |
| Trees more similar | Trees more diverse |
| Slightly higher variance | Lower variance |

Interpretation: Standard Random Forest uses sampling with replacement to increase variability across trees.

9. Class weighting (class weights)

No class weighting

Balanced class weighting

Majority class dominates learning

Minority class emphasized

Higher overall accuracy

Better minority detection

Lower sensitivity for rare events

Higher sensitivity for rare events

Better probability calibration

Possible distortion of predicted probabilities

Interpretation: This parameter changes how the model prioritizes errors. In clinical prediction, recalibration is often preferred over weighting when probability estimates are important.
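A sketch of the sensitivity trade-off on an imbalanced synthetic dataset, assuming scikit-learn (`class_weight="balanced"` reweights classes inversely to their frequency):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 5% minority class
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

plain = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
weighted = RandomForestClassifier(
    class_weight="balanced", random_state=0
).fit(Xtr, ytr)

# Minority-class recall (sensitivity) with and without weighting
print(recall_score(yte, plain.predict(Xte)),
      recall_score(yte, weighted.predict(Xte)))
```

If calibrated probabilities matter (as in clinical risk prediction), compare this against an unweighted model followed by recalibration rather than relying on weighting alone.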

Category 3: Feature Handling Parameters

These parameters control how features are evaluated during splitting.

10. Split rule (criterion for split quality)

| Standard split rule (e.g., Gini) | More random split rule (e.g., Extra Trees) |
|---|---|
| Deterministic optimal splits | Randomized split points |
| Lower bias | Slightly higher bias |
| Higher variance | Lower variance |
| Stable performance | More randomness |

Interpretation: The choice of the split rule has a relatively small impact. Standard methods such as Gini impurity work well in most situations.
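For comparison, scikit-learn exposes both variants side by side (a sketch; the randomized-split idea lives in a separate estimator rather than a criterion value):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Random Forest: searches for the best threshold among candidate features
rf = RandomForestClassifier(criterion="gini", random_state=0).fit(X, y)

# Extra Trees: draws split thresholds at random, trading a little bias
# for lower variance
et = ExtraTreesClassifier(random_state=0).fit(X, y)
```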

11. Feature importance method

| Impurity-based importance | Permutation-based importance |
|---|---|
| Fast to compute | Slower to compute |
| Biased toward variables with many categories | Less biased toward high-cardinality variables |
| Less reliable for interpretation | More reliable for interpretation |

Interpretation: This does not affect model performance but is critical for interpreting predictors.
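Both measures are available in scikit-learn; a sketch on synthetic data (impurity importance is a free by-product of training, permutation importance is computed afterwards):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based: stored on the fitted model
print(rf.feature_importances_)

# Permutation-based: shuffle each feature and measure the score drop
result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)
```

Ideally, permutation importance should be computed on held-out data, not the training set, to avoid rewarding memorized features.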

Category 4: Computational / Reproducibility Parameters

These parameters do not affect model performance but ensure reproducibility and efficiency.

12. Random seed (random state)

| Not fixed | Fixed |
|---|---|
| Results vary between runs | Results reproducible |
| Hard to debug | Consistent outputs |

Interpretation: Always fix a random seed to ensure reproducibility.
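In scikit-learn this is the `random_state` parameter; two forests built with the same seed on the same data are identical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Same seed -> identical bootstrap draws, identical trees, identical output
rf_a = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
assert np.array_equal(rf_a.predict(X), rf_b.predict(X))
```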

13. Number of parallel threads (number of jobs)

| Single thread | Multiple threads |
|---|---|
| Slower training | Faster training |
| Lower resource use | Higher CPU usage |

Interpretation: Controls computation speed only.

14. Verbose output (verbosity level)

| Low verbosity | High verbosity |
|---|---|
| Minimal output | Detailed progress logs |
| Cleaner console | More transparency |

Interpretation: Useful for monitoring training progress.

15. Warm start (incremental tree building)

| Disabled | Enabled |
|---|---|
| Model trained from scratch | Trees added incrementally |
| Simpler workflow | Flexible model expansion |

Interpretation: Allows adding more trees without retraining from the beginning.
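A sketch of the incremental pattern, assuming scikit-learn's `warm_start`: raise `n_estimators` on the fitted model and call `fit` again to append trees instead of retraining:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Train 50 trees, then grow the same forest to 100 without starting over
rf = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
rf.fit(X, y)
rf.n_estimators = 100
rf.fit(X, y)  # adds 50 new trees to the existing ensemble
assert len(rf.estimators_) == 100
```

This is handy for checking where the performance plateau sets in without refitting from scratch at each forest size.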

16. Out-of-bag error estimation (OOB score)

| Disabled | Enabled |
|---|---|
| No internal validation | Built-in validation using unused samples |
| Requires external validation | Quick performance estimate |

Interpretation: Provides an internal estimate of model performance without separate validation data.
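A minimal sketch, assuming scikit-learn: with `oob_score=True`, each tree is evaluated on the roughly 37% of rows its bootstrap sample left out, and the aggregate is exposed as `oob_score_`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Out-of-bag rows give a built-in validation estimate without a holdout set
rf = RandomForestClassifier(
    n_estimators=200, oob_score=True, bootstrap=True, random_state=0
).fit(X, y)
print(round(rf.oob_score_, 3))
```

OOB estimates need a reasonably large forest to be stable; with very few trees, some rows are never out-of-bag.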

Final Conceptual Summary

Random Forest performance depends on balancing:

  • Strong trees (deep, small leaf size, high feature usage)

  • Diverse trees (low feature usage, smaller sample fraction, bootstrapping)

  • Stable averaging (large number of trees)

The most influential parameters are:

  1. Features per split (controls correlation between trees)

  2. Minimum node size (controls overfitting)

  3. Sample fraction (secondary diversity control)


Key Takeaways

  • Features per split is the most important parameter because it controls tree correlation

  • Minimum node size is the main control for overfitting

  • Number of trees stabilizes predictions and should be set sufficiently large

  • Sample fraction can further improve diversity between trees

  • Most other parameters have smaller or redundant effects
