genieclust.inequity

Inequity (inequality) measures

genieclust.inequity.bonferroni_index(x, is_sorted=False)

Computes the normalised Bonferroni index.

Parameters
x : ndarray

A vector with non-negative elements.

is_sorted : bool

Indicates whether x is already sorted in increasing order.

Returns
double

The value of the inequity index, a number in [0,1].

See also

genieclust.inequity.gini_index

The normalised Gini index

Notes

The normalised Bonferroni index is given by:

$$B(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n} \left( n-\sum_{j=1}^i \frac{n}{n-j+1} \right) x_{\sigma(n-i+1)} }{ (n-1) \sum_{i=1}^n x_i },$$

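where $\sigma$ is an ordering permutation of $(x_1,\dots,x_n)$ such that $x_{\sigma(1)} \le \dots \le x_{\sigma(n)}$; hence $x_{\sigma(n-i+1)}$ is the $i$-th largest element.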
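As a hand-computed check of the formula (not output from the package), take $x=(7,0,3,0,0)$ with $n=5$, one of the inputs from the Examples. Write $c_i$ for the coefficient $n-\sum_{j=1}^i \frac{n}{n-j+1}$; this shorthand is introduced here only for the computation. Only the two largest values, 7 and 3, are nonzero, so only $c_1$ and $c_2$ contribute:

```latex
\begin{aligned}
c_1 &= 5 - \tfrac{5}{5} = 4, \qquad
c_2 = 5 - \left( \tfrac{5}{5} + \tfrac{5}{4} \right) = \tfrac{11}{4}, \\
B(7,0,3,0,0) &= \frac{ 4 \cdot 7 + \tfrac{11}{4} \cdot 3 }{ (5-1) \cdot 10 }
  = \frac{36.25}{40} = 0.90625 \approx 0.91,
\end{aligned}
```

which agrees with the rounded value 0.91 reported in the Examples.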

Time complexity: $O(n)$ for sorted data.
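The linear-time evaluation on sorted data can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not genieclust's implementation: the name `bonferroni_index_sketch` is hypothetical, its `is_sorted` argument only mimics the documented parameter, and inputs are assumed to have at least two non-negative elements with a positive sum. The inner sum $\sum_{j=1}^i \frac{n}{n-j+1}$ is accumulated incrementally, so a single pass over the sorted values suffices.

```python
def bonferroni_index_sketch(x, is_sorted=False):
    """Normalised Bonferroni index (hypothetical helper, illustration only).

    Assumes len(x) >= 2, non-negative elements and a positive sum.
    """
    # Sort increasingly (xs[k] = x_{sigma(k+1)}) unless the caller says it is sorted
    xs = list(x) if is_sorted else sorted(x)
    n = len(xs)
    numerator = 0.0
    partial = 0.0  # running value of sum_{j=1}^{i} n / (n - j + 1)
    for i in range(1, n + 1):
        partial += n / (n - i + 1)
        # x_{sigma(n-i+1)} is the i-th largest element, i.e. xs[n - i]
        numerator += (n - partial) * xs[n - i]
    return numerator / ((n - 1) * sum(xs))


print(round(bonferroni_index_sketch([7, 0, 3, 0, 0]), 2))  # 0.91
print(round(bonferroni_index_sketch([6, 0, 3, 1, 0]), 2))  # 0.83
```

With `is_sorted=False` the sketch's cost is dominated by Python's `sorted`; the loop itself is linear.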

References

[1] Bonferroni C., Elementi di Statistica Generale, Libreria Seber, Firenze, 1930.

Examples

>>> import numpy as np
>>> import genieclust

No inequality (perfect equality):

>>> round(genieclust.inequity.bonferroni_index(np.r_[2, 2,  2, 2, 2]), 2)
0.0

One has it all (total inequity):

>>> round(genieclust.inequity.bonferroni_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0

Give to the poor, take away from the rich:

>>> round(genieclust.inequity.bonferroni_index(np.r_[7, 0,  3, 0, 0]), 2)
0.91

Robin Hood, even more:

>>> round(genieclust.inequity.bonferroni_index(np.r_[6, 0,  3, 1, 0]), 2)
0.83

genieclust.inequity.gini_index(x, is_sorted=False)

Computes the normalised Gini index.

Parameters
x : ndarray

A vector with non-negative elements.

is_sorted : bool

Indicates whether x is already sorted in increasing order.

Returns
double

The value of the inequity index, a number in [0,1].

See also

genieclust.inequity.bonferroni_index

The normalised Bonferroni index

Notes

The normalised Gini index is given by:

$$G(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n-1} \sum_{j=i+1}^n |x_i-x_j| }{ (n-1) \sum_{i=1}^n x_i }.$$
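A hand computation with this pairwise definition, for $x=(7,0,3,0,0)$ from the Examples: the element 7 differs from each of the three zeros by 7 and from 3 by 4, the element 3 differs from each of the three zeros by 3, and all remaining pairs contribute 0:

```latex
\begin{aligned}
\sum_{i=1}^{4} \sum_{j=i+1}^{5} |x_i - x_j| &= 3 \cdot 7 + 4 + 3 \cdot 3 = 34, \\
G(7,0,3,0,0) &= \frac{34}{(5-1) \cdot 10} = 0.85,
\end{aligned}
```

in agreement with the value 0.85 shown in the Examples.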

The time complexity is $O(n)$ for sorted data, because the index can be equivalently expressed as:

$$G(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n} (n-2i+1) x_{\sigma(n-i+1)} }{ (n-1) \sum_{i=1}^n x_i },$$

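where $\sigma$ is an ordering permutation of $(x_1,\dots,x_n)$ such that $x_{\sigma(1)} \le \dots \le x_{\sigma(n)}$; hence $x_{\sigma(n-i+1)}$ is the $i$-th largest element.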
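The equivalence of the two Gini formulas can be checked with the plain-Python sketch below. The helper names `gini_pairwise` and `gini_sorted` are hypothetical and not part of genieclust; inputs are assumed to have at least two non-negative elements with a positive sum. The pairwise version visits all $O(n^2)$ pairs, whereas the second needs only one weighted pass once the data are sorted.

```python
from itertools import combinations


def gini_pairwise(x):
    """Normalised Gini index from the O(n^2) pairwise definition (illustration)."""
    n = len(x)
    pair_sum = sum(abs(a - b) for a, b in combinations(x, 2))
    return pair_sum / ((n - 1) * sum(x))


def gini_sorted(x, is_sorted=False):
    """Normalised Gini index from the O(n) formula on sorted data (illustration)."""
    xs = list(x) if is_sorted else sorted(x)  # increasing order
    n = len(xs)
    # The i-th largest element, xs[n - i], gets the weight (n - 2i + 1)
    numerator = sum((n - 2 * i + 1) * xs[n - i] for i in range(1, n + 1))
    return numerator / ((n - 1) * sum(xs))


x = [6, 0, 3, 1, 0]
print(gini_pairwise(x), gini_sorted(x))  # 0.75 0.75
```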

Both the Gini and Bonferroni indices can be used to quantify the “inequity” of a numeric sample. They can be perceived as measures of data dispersion. For constant vectors (perfect equity), the indices yield values of 0. Vectors with all elements but one equal to 0 (perfect inequity) are assigned scores of 1. Both indices follow the Pigou-Dalton principle (are Schur-convex): replacing $x_i$ with $x_i - h$ and $x_j$ with $x_j + h$, where $h > 0$ and $x_i - h \geq x_j + h$ (taking from the “rich” and giving to the “poor”), decreases the inequity.
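The Pigou-Dalton property can be illustrated with the inputs used in the Examples: moving $h=1$ from the element 7 to one of the zeros turns $(7,0,3,0,0)$ into $(6,0,3,1,0)$, and the condition $7-1 \geq 0+1$ holds. The plain-Python sketch below uses two hypothetical helpers, `gini` (the sorted-data formula) and `transfer`, neither of which is a genieclust function; it performs that transfer and recomputes the Gini index.

```python
def gini(x):
    """Normalised Gini index via the sorted-data formula (illustration only)."""
    xs = sorted(x)
    n = len(xs)
    numerator = sum((n - 2 * i + 1) * xs[n - i] for i in range(1, n + 1))
    return numerator / ((n - 1) * sum(xs))


def transfer(x, i, j, h):
    """Return a copy of x with h moved from x[i] to x[j] (0-based indices).

    Raises ValueError unless this is a Pigou-Dalton (rich-to-poor) transfer.
    """
    if not (h > 0 and x[i] - h >= x[j] + h):
        raise ValueError("not a transfer from a richer to a poorer element")
    y = list(x)
    y[i] -= h
    y[j] += h
    return y


before = [7, 0, 3, 0, 0]
after = transfer(before, 0, 3, 1)  # [6, 0, 3, 1, 0]
print(gini(before), gini(after))   # 0.85 0.75
```

The same transfer lowers the Bonferroni index from 0.91 to 0.83, as listed in its Examples.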

These indices have applications in economics, amongst others. The Genie clustering algorithm uses the Gini index as a measure of the inequality of cluster sizes.
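To illustrate only the quantity mentioned above (this is not the Genie algorithm itself), the Gini index of cluster sizes can be obtained from a vector of cluster labels by counting the members of each cluster. The label vector is made up, and `gini` is a hypothetical helper implementing the sorted-data formula given earlier in these Notes.

```python
from collections import Counter


def gini(x):
    """Normalised Gini index via the sorted-data formula (illustration only)."""
    xs = sorted(x)
    n = len(xs)
    numerator = sum((n - 2 * i + 1) * xs[n - i] for i in range(1, n + 1))
    return numerator / ((n - 1) * sum(xs))


labels = [0, 0, 0, 0, 1, 1, 2, 0, 1, 0]  # made-up cluster labels of 10 points
sizes = list(Counter(labels).values())   # cluster sizes: [6, 3, 1]
print(gini(sizes))                       # 0.5 (unbalanced clusters)
print(gini([3, 3, 3]))                   # 0.0 (perfectly balanced clusters)
```

How the Genie algorithm acts on this value is outside the scope of this sketch.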

References

[1] Gini C., Variabilità e Mutabilità, Tipografia di Paolo Cuppini, Bologna, 1912.

Examples

>>> import numpy as np
>>> import genieclust

No inequality (perfect equality):

>>> round(genieclust.inequity.gini_index(np.r_[2, 2,  2, 2, 2]), 2)
0.0

One has it all (total inequity):

>>> round(genieclust.inequity.gini_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0

Give to the poor, take away from the rich:

>>> round(genieclust.inequity.gini_index(np.r_[7, 0,  3, 0, 0]), 2)
0.85

Robin Hood, even more:

>>> round(genieclust.inequity.gini_index(np.r_[6, 0,  3, 1, 0]), 2)
0.75