genieclust.inequality
Inequality measures
- genieclust.inequality.bonferroni_index(x, is_sorted=False)
Computes the normalised Bonferroni index.
- Parameters:
- x : ndarray
A vector with non-negative elements.
- is_sorted : bool
Indicates whether x is already sorted in nondecreasing order.
- Returns:
- double
The value of the inequality index, a number in [0,1].
See also
genieclust.inequality.devergottini_index
The normalised De Vergottini index
genieclust.inequality.gini_index
The normalised Gini index
Notes
The normalised Bonferroni [1] index is given by:
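\[B(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n} \left( n-\sum_{j=1}^i \frac{n}{n-j+1} \right) x_{\sigma(n-i+1)} }{ (n-1) \sum_{i=1}^n x_i },\]
where \(\sigma\) is an ordering permutation of \((x_1,\dots,x_n)\), i.e., \(x_{\sigma(1)} \leq \dots \leq x_{\sigma(n)}\), so that \(x_{\sigma(n-i+1)}\) is the \(i\)-th largest element.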
Time complexity: \(O(n)\) for sorted data.
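The formula above can be evaluated term by term in plain Python. The following is only a minimal reference sketch for checking values by hand, not genieclust's implementation; the helper name bonferroni_reference is hypothetical, and the sketch assumes at least two elements and a positive sum.

```python
def bonferroni_reference(x):
    """Evaluate the normalised Bonferroni formula term by term (hypothetical helper)."""
    # xs[i-1] plays the role of x_{sigma(n-i+1)}: the i-th largest element.
    xs = sorted(x, reverse=True)
    n = len(xs)
    total = sum(xs)  # assumed to be positive; n is assumed to be at least 2

    numerator = 0.0
    partial = 0.0  # running value of sum_{j=1}^{i} n / (n - j + 1)
    for i, value in enumerate(xs, start=1):
        partial += n / (n - i + 1)
        numerator += (n - partial) * value

    return numerator / ((n - 1) * total)


print(round(bonferroni_reference([7, 0, 3, 0, 0]), 2))  # 0.91
```

The sort makes this sketch \(O(n \log n)\); the weights themselves are accumulated in a single pass, which is consistent with the \(O(n)\) cost stated for sorted data.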
References
[1] Bonferroni C., Elementi di Statistica Generale, Libreria Seber, Firenze, 1930.
Examples
No inequality (perfect equality):
>>> import numpy as np
>>> import genieclust
>>> round(genieclust.inequality.bonferroni_index(np.r_[2, 2, 2, 2, 2]), 2)
0.0
One has it all (total inequality):
>>> round(genieclust.inequality.bonferroni_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0
Give to the poor, take away from the rich:
>>> round(genieclust.inequality.bonferroni_index(np.r_[7, 0, 3, 0, 0]), 2)
0.91
Redistribute even more, Robin Hood style:
>>> round(genieclust.inequality.bonferroni_index(np.r_[6, 0, 3, 1, 0]), 2)
0.83
- genieclust.inequality.devergottini_index(x, is_sorted=False)
Computes the normalised De Vergottini index.
- Parameters:
- x : ndarray
A vector with non-negative elements.
- is_sorted : bool
Indicates whether x is already sorted in nondecreasing order.
- Returns:
- double
The value of the inequality index, a number in [0,1].
See also
genieclust.inequality.bonferroni_index
The normalised Bonferroni index
genieclust.inequality.gini_index
The normalised Gini index
Notes
The normalised De Vergottini index is given by:
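\[\frac{1}{\sum_{i=2}^n \frac{1}{i}} \left( \frac{ \sum_{i=1}^n \left( \sum_{j=i}^{n} \frac{1}{j} \right) x_{\sigma(n-i+1)} }{\sum_{i=1}^{n} x_i} - 1 \right),\]
where \(\sigma\) is an ordering permutation of \((x_1,\dots,x_n)\), i.e., \(x_{\sigma(1)} \leq \dots \leq x_{\sigma(n)}\), so that \(x_{\sigma(n-i+1)}\) is the \(i\)-th largest element.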
Time complexity is \(O(n)\) for sorted data.
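As a cross-check of the formula above, the index can be computed directly in plain Python. This is a minimal sketch under stated assumptions (at least two elements, positive sum), not genieclust's implementation; the helper name devergottini_reference is hypothetical.

```python
def devergottini_reference(x):
    """Evaluate the normalised De Vergottini formula directly (hypothetical helper)."""
    # xs[i-1] plays the role of x_{sigma(n-i+1)}: the i-th largest element.
    xs = sorted(x, reverse=True)
    n = len(xs)
    total = sum(xs)  # assumed to be positive; n is assumed to be at least 2

    weighted = 0.0
    tail = 0.0  # running value of sum_{j=i}^{n} 1/j, built from i = n down to 1
    for i in range(n, 0, -1):
        tail += 1.0 / i
        weighted += tail * xs[i - 1]

    norm = sum(1.0 / i for i in range(2, n + 1))  # the leading normalising factor
    return (weighted / total - 1.0) / norm


print(round(devergottini_reference([7, 0, 3, 0, 0]), 2))  # 0.77
```

Accumulating the harmonic tail sums from the back avoids recomputing the inner sum for every \(i\), so the loop is a single pass over the sorted data.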
Examples
No inequality (perfect equality):
>>> import numpy as np
>>> import genieclust
>>> round(genieclust.inequality.devergottini_index(np.r_[2, 2, 2, 2, 2]), 2)
0.0
One has it all (total inequality):
>>> round(genieclust.inequality.devergottini_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0
Give to the poor, take away from the rich:
>>> round(genieclust.inequality.devergottini_index(np.r_[7, 0, 3, 0, 0]), 2)
0.77
Redistribute even more, Robin Hood style:
>>> round(genieclust.inequality.devergottini_index(np.r_[6, 0, 3, 1, 0]), 2)
0.65
- genieclust.inequality.gini_index(x, is_sorted=False)
Computes the normalised Gini index.
- Parameters:
- x : ndarray
A vector with non-negative elements.
- is_sorted : bool
Indicates whether x is already sorted in nondecreasing order.
- Returns:
- double
The value of the inequality index, a number in [0,1].
See also
genieclust.inequality.bonferroni_index
The normalised Bonferroni index
genieclust.inequality.devergottini_index
The normalised De Vergottini index
Notes
The normalised Gini [1] index is given by:
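\[G(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n-1} \sum_{j=i+1}^n |x_i-x_j| }{ (n-1) \sum_{i=1}^n x_i }.\]
Time complexity is \(O(n)\) for sorted data, because the double sum can be rewritten as a single weighted sum:
\[G(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n} (n-2i+1) x_{\sigma(n-i+1)} }{ (n-1) \sum_{i=1}^n x_i },\]
where \(\sigma\) is an ordering permutation of \((x_1,\dots,x_n)\), i.e., \(x_{\sigma(1)} \leq \dots \leq x_{\sigma(n)}\), so that \(x_{\sigma(n-i+1)}\) is the \(i\)-th largest element.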
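The two expressions for the Gini index given above can be compared numerically. The sketch below is illustrative only and is not genieclust's implementation; the helper names gini_pairwise and gini_sorted are hypothetical, and both assume at least two elements and a positive sum.

```python
def gini_pairwise(x):
    """O(n^2) definition: sum of absolute differences over all pairs i < j."""
    n = len(x)
    numerator = sum(abs(x[i] - x[j]) for i in range(n - 1) for j in range(i + 1, n))
    return numerator / ((n - 1) * sum(x))


def gini_sorted(x):
    """Single weighted sum over the data sorted from largest to smallest."""
    xs = sorted(x, reverse=True)  # xs[i-1] is the i-th largest element
    n = len(xs)
    numerator = sum((n - 2 * i + 1) * value for i, value in enumerate(xs, start=1))
    return numerator / ((n - 1) * sum(xs))


x = [7, 0, 3, 0, 0]
print(round(gini_pairwise(x), 2), round(gini_sorted(x), 2))  # 0.85 0.85
```

The weighted-sum form needs only one pass once the data are ordered, which is where the linear cost for sorted input comes from.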
The Gini, Bonferroni, and De Vergottini indices can be used to quantify the “inequality” of a numeric sample. They can be conceived as normalised measures of data dispersion. For constant vectors (perfect equality), the indices yield values of 0. Vectors with all elements but one equal to 0 (perfect inequality) are assigned scores of 1. The indices follow the Pigou-Dalton principle (are Schur-convex): replacing \(x_i\) with \(x_i - h\) and \(x_j\) with \(x_j + h\), where \(h > 0\) and \(x_i - h \geq x_j + h\) (taking from the “rich” and giving away to the “poor”), decreases the inequality.
These indices have applications in economics, amongst other fields. The Genie clustering algorithm uses the Gini index as a measure of the inequality of cluster sizes.
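The Pigou-Dalton transfer described above can be illustrated with a short plain-Python sketch. The helpers gini and transfer are hypothetical names introduced here for illustration (they are not part of genieclust), and gini uses the sorted-data formula from these Notes.

```python
def gini(x):
    """Normalised Gini index via the sorted-data formula (hypothetical helper)."""
    xs = sorted(x, reverse=True)  # xs[i-1] is the i-th largest element
    n = len(xs)
    numerator = sum((n - 2 * i + 1) * value for i, value in enumerate(xs, start=1))
    return numerator / ((n - 1) * sum(xs))


def transfer(x, i, j, h):
    """Move h from the richer x[i] to the poorer x[j] (a Pigou-Dalton transfer)."""
    if not (h > 0 and x[i] - h >= x[j] + h):
        raise ValueError("not a valid Pigou-Dalton transfer")
    y = list(x)  # leave the input untouched
    y[i] -= h
    y[j] += h
    return y


before = [7, 0, 3, 0, 0]
after = transfer(before, 0, 3, 1)
print(after)                       # [6, 0, 3, 1, 0]
print(gini(before) > gini(after))  # True
```

This is the same pair of vectors used in the Examples: one unit moves from the richest element to one of the poorest, and the index drops from 0.85 to 0.75.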
References
[1] Gini C., Variabilità e Mutabilità, Tipografia di Paolo Cuppini, Bologna, 1912.
Examples
No inequality (perfect equality):
>>> import numpy as np
>>> import genieclust
>>> round(genieclust.inequality.gini_index(np.r_[2, 2, 2, 2, 2]), 2)
0.0
One has it all (total inequality):
>>> round(genieclust.inequality.gini_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0
Give to the poor, take away from the rich:
>>> round(genieclust.inequality.gini_index(np.r_[7, 0, 3, 0, 0]), 2)
0.85
Redistribute even more, Robin Hood style:
>>> round(genieclust.inequality.gini_index(np.r_[6, 0, 3, 1, 0]), 2)
0.75