Gini impurity
Gini impurity is a measure used in decision tree algorithms to evaluate the quality of a split in a dataset. It is the probability that a randomly chosen element of a subset would be mislabeled if it were labeled at random according to the distribution of labels in that subset. The value is 0 when the subset is perfectly pure (all elements belong to a single class) and is largest when elements are evenly distributed across classes. For K classes the maximum is 1 - 1/K, which is 0.5 for two classes and approaches 1 only as the number of classes grows.
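The mislabeling interpretation can be written out as a short derivation. The notation is introduced here for illustration: K is the number of classes and p_k is the proportion of elements in the subset that belong to class k. An element of class k is drawn with probability p_k and is then given a different label with probability 1 - p_k, so summing over classes gives:

```latex
G = \sum_{k=1}^{K} p_k (1 - p_k)
  = \sum_{k=1}^{K} p_k - \sum_{k=1}^{K} p_k^2
  = 1 - \sum_{k=1}^{K} p_k^2
```

The last step uses the fact that the proportions p_k sum to 1.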
To calculate Gini impurity, sum the squared proportions of each class in the subset and subtract the result from 1. A candidate split is scored by the average impurity of the subsets it produces, weighted by their sizes. A lower value indicates a better split, because the resulting groups are more homogeneous. Gini impurity is the usual splitting criterion for classification trees in CART (Classification and Regression Trees).
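The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `gini_impurity` and `split_impurity` are hypothetical, the split score is assumed to be the size-weighted average of the child impurities, and an empty subset is treated as pure by convention.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 minus the sum of squared class proportions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty subset counts as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Return the size-weighted average Gini impurity of two child subsets."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Two classes in equal proportion: the two-class maximum.
parent = ["a", "a", "a", "b", "b", "b"]
print(gini_impurity(parent))  # 0.5

# A split that separates the classes completely has zero impurity.
print(split_impurity(["a", "a", "a"], ["b", "b", "b"]))  # 0.0

# A split that leaves both children mixed scores worse (about 0.444).
print(split_impurity(["a", "a", "b"], ["a", "b", "b"]))
```

A tree-building procedure would evaluate `split_impurity` for each candidate split and keep the one with the lowest score.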