Entropy vs Gini Impurity Comparison
About This MicroSim
This interactive visualization demonstrates the two most common splitting criteria used in decision tree algorithms: entropy (information gain) and Gini impurity. Both measures quantify the "impurity" or "disorder" of a node based on its class distribution.
How to Use
- Adjust Class Distribution: Move the slider to change the proportion of Class 1 (0% to 100%)
- Observe Curves: Watch how both entropy and Gini curves respond to the distribution
- Show Good Split: Click to see an example split with high information gain
- Show Bad Split: Click to see a split that doesn't improve purity
- Show Split Comparison: Check the box to overlay parent and children impurities
Key Concepts
- Entropy: Measures disorder using information theory, calculated as H = -Σ pᵢ log₂(pᵢ)
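- Gini Impurity: Measures the probability of misclassifying a randomly chosen sample if it were labeled at random according to the node's class distribution, calculated as G = 1 - Σ pᵢ²
- Maximum Impurity: For two classes, both measures peak at a 50/50 class distribution (maximum uncertainty), where entropy equals 1 bit and Gini impurity equals 0.5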
- Pure Nodes: Both measures equal 0 when all samples belong to one class (perfect certainty)
- Information Gain: Reduction in impurity achieved by a split, computed as the parent's impurity minus the size-weighted average impurity of the children (higher is better)
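The two formulas above can be checked by hand with a few lines of Python. This is a minimal sketch, not the MicroSim's own p5.js code; the function names `entropy` and `gini` and the sample proportions are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy in bits. p * log2(1/p) is the same as -p * log2(p);
    classes with p == 0 contribute nothing and are skipped."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in probs)

# Sweep the proportion of Class 1, as the slider does (values chosen for illustration).
for p1 in (0.0, 0.1, 0.5, 0.9, 1.0):
    dist = (1.0 - p1, p1)
    print(f"p1={p1:.1f}  entropy={entropy(dist):.3f}  gini={gini(dist):.3f}")
```

Both functions accept any number of class proportions, so the same sketch also covers nodes with more than two classes, as long as the proportions sum to 1.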
Educational Value
This visualization helps students understand:
- How impurity measures quantify node purity/disorder
- Why both metrics favor splits that produce purer child nodes over splits that leave the classes evenly mixed
- The relationship between class distribution and impurity
- How decision trees select optimal splits (maximize information gain)
- Why entropy and Gini often produce similar trees despite different formulas
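One way to see why the two criteria tend to agree is to tabulate both curves for the two-class case. The sketch below is an illustration under stated assumptions, not the MicroSim's implementation: the helper names, the 21-point grid, and the choice to double the Gini value (so that both curves peak at 1) are all assumptions made for the comparison.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-class node with Class 1 proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def binary_gini(p):
    """Gini impurity of a two-class node: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    return 2.0 * p * (1.0 - p)

# Evaluate both curves on an evenly spaced grid from 0 to 1.
grid = [i / 20 for i in range(21)]
rows = [(p, binary_entropy(p), 2.0 * binary_gini(p)) for p in grid]

for p, h, g2 in rows:
    print(f"{p:.2f}  entropy={h:.3f}  2*gini={g2:.3f}")

# Locate the grid point where each curve is largest.
peak_entropy = max(rows, key=lambda r: r[1])[0]
peak_gini = max(rows, key=lambda r: r[2])[0]
print("peaks at:", peak_entropy, peak_gini)
```

On this grid both curves are symmetric around 0.5, peak at the same point, and fall to 0 at the pure ends, which is why they usually rank candidate splits in a similar order.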
Learning Objectives
Bloom's Taxonomy Level: Analyze (L4)
After using this MicroSim, students should be able to:
- Calculate entropy and Gini impurity for a given class distribution
- Explain why maximum impurity occurs at 50/50 distribution
- Analyze the difference between good and bad splits using impurity reduction
- Explain why decision trees prefer splits with high information gain
- Compare entropy and Gini impurity and judge when to use each in practice
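To practice the split analysis, the impurity reduction of a candidate split can be computed from class counts. The following sketch is hypothetical: the parent and child counts are made-up examples rather than the values used by the MicroSim's "Show Good Split" and "Show Bad Split" buttons, and `impurity_reduction` is an illustrative helper name.

```python
import math

def entropy(counts):
    """Entropy in bits of a node described by per-class sample counts."""
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)

def gini(counts):
    """Gini impurity of a node described by per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_reduction(parent, children, measure):
    """Parent impurity minus the size-weighted average impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * measure(child) for child in children)
    return measure(parent) - weighted

# Hypothetical counts: (Class 0 samples, Class 1 samples).
parent = (10, 10)
good_split = [(9, 1), (1, 9)]   # children are mostly one class
bad_split = [(5, 5), (5, 5)]    # children are as mixed as the parent

for name, split in (("good", good_split), ("bad", bad_split)):
    ig = impurity_reduction(parent, split, entropy)
    dg = impurity_reduction(parent, split, gini)
    print(f"{name} split: information gain={ig:.3f}, Gini reduction={dg:.3f}")
```

With the entropy measure, the returned value is the information gain. The bad split leaves both children with the parent's 50/50 mix, so neither measure reports any improvement.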
Technical Details
- Library: p5.js
- Responsive: No (fixed canvas size, 900x700)
- Interactivity: Slider, buttons, checkbox controls
- Features: Side-by-side comparison, split demonstrations, real-time calculations
Integration
To embed this MicroSim in your course materials: