genieclust.inequality

Inequality measures

genieclust.inequality.bonferroni_index(x, is_sorted=False)

Computes the normalised Bonferroni index.

Parameters:
x : ndarray

A vector with non-negative elements.

is_sorted : bool

Indicates if x is already sorted increasingly.

Returns:
double

The value of the inequality index, a number in [0,1].

See also

genieclust.inequality.devergottini_index

The normalised De Vergottini index

genieclust.inequality.gini_index

The normalised Gini index

Notes

The normalised Bonferroni index [1] is given by:

\[B(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n} \left( n-\sum_{j=1}^i \frac{n}{n-j+1} \right) x_{\sigma(n-i+1)} }{ (n-1) \sum_{i=1}^n x_i },\]

where \(\sigma\) is an ordering permutation of \((x_1,\dots,x_n)\), i.e. \(x_{\sigma(1)} \le \dots \le x_{\sigma(n)}\).

Time complexity: \(O(n)\) for sorted data.
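The formula above can be evaluated directly in a few lines of plain Python. The sketch below is illustrative only and is not the library's implementation; the name `bonferroni_reference` is hypothetical, and the input is assumed to have at least two elements and a positive sum.

```python
def bonferroni_reference(x):
    """Evaluate the normalised Bonferroni formula term by term (illustrative sketch)."""
    xs = sorted(x)                 # xs[k-1] plays the role of x_sigma(k)
    n = len(xs)                    # assumed: n >= 2
    total = sum(xs)                # assumed: total > 0
    partial = 0.0                  # running value of sum_{j=1}^{i} n/(n-j+1)
    numerator = 0.0
    for i in range(1, n + 1):
        partial += n / (n - i + 1)
        numerator += (n - partial) * xs[n - i]   # xs[n - i] is x_sigma(n-i+1)
    return numerator / ((n - 1) * total)


print(round(bonferroni_reference([7, 0, 3, 0, 0]), 2))  # 0.91
```

After the initial sort, the loop is a single pass over the data, which matches the linear-time claim for sorted input.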

References

[1] Bonferroni C., Elementi di Statistica Generale, Libreria Seber, Firenze, 1930.

Examples

No inequality (perfect equality):

>>> import numpy as np
>>> import genieclust
>>> round(genieclust.inequality.bonferroni_index(np.r_[2, 2,  2, 2, 2]), 2)
0.0

One has it all (total inequality):

>>> round(genieclust.inequality.bonferroni_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0

Give to the poor, take away from the rich:

>>> round(genieclust.inequality.bonferroni_index(np.r_[7, 0,  3, 0, 0]), 2)
0.91

Redistribute even more (Robin Hood style):

>>> round(genieclust.inequality.bonferroni_index(np.r_[6, 0,  3, 1, 0]), 2)
0.83

genieclust.inequality.devergottini_index(x, is_sorted=False)

Computes the normalised De Vergottini index.

Parameters:
x : ndarray

A vector with non-negative elements.

is_sorted : bool

Indicates if x is already sorted increasingly.

Returns:
double

The value of the inequality index, a number in [0,1].

See also

genieclust.inequality.bonferroni_index

The normalised Bonferroni index

genieclust.inequality.gini_index

The normalised Gini index

Notes

The normalised De Vergottini index is given by:

\[\frac{1}{\sum_{i=2}^n \frac{1}{i}} \left( \frac{ \sum_{i=1}^n \left( \sum_{j=i}^{n} \frac{1}{j} \right) x_{\sigma(n-i+1)} }{\sum_{i=1}^{n} x_i} - 1 \right),\]

where \(\sigma\) is an ordering permutation of \((x_1,\dots,x_n)\), i.e. \(x_{\sigma(1)} \le \dots \le x_{\sigma(n)}\).

Time complexity is \(O(n)\) for sorted data.
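As a cross-check of the formula above, here is a minimal plain-Python sketch. It is not the library's implementation; the name `devergottini_reference` is hypothetical, and the input is assumed to have at least two elements and a positive sum.

```python
def devergottini_reference(x):
    """Evaluate the normalised De Vergottini formula term by term (illustrative sketch)."""
    xs = sorted(x)                 # xs[k-1] plays the role of x_sigma(k)
    n = len(xs)                    # assumed: n >= 2
    total = sum(xs)                # assumed: total > 0
    tail = 0.0                     # running value of sum_{j=i}^{n} 1/j
    weighted = 0.0
    for i in range(n, 0, -1):      # build the tail sums from i = n down to i = 1
        tail += 1.0 / i
        weighted += tail * xs[n - i]             # xs[n - i] is x_sigma(n-i+1)
    norm = sum(1.0 / i for i in range(2, n + 1))  # sum_{i=2}^{n} 1/i
    return (weighted / total - 1.0) / norm


print(round(devergottini_reference([7, 0, 3, 0, 0]), 2))  # 0.77
```

Iterating from \(i = n\) downwards lets the inner harmonic tail sum be updated incrementally, so the loop stays linear in \(n\).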

Examples

No inequality (perfect equality):

>>> import numpy as np
>>> import genieclust
>>> round(genieclust.inequality.devergottini_index(np.r_[2, 2,  2, 2, 2]), 2)
0.0

One has it all (total inequality):

>>> round(genieclust.inequality.devergottini_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0

Give to the poor, take away from the rich:

>>> round(genieclust.inequality.devergottini_index(np.r_[7, 0,  3, 0, 0]), 2)
0.77

Redistribute even more (Robin Hood style):

>>> round(genieclust.inequality.devergottini_index(np.r_[6, 0,  3, 1, 0]), 2)
0.65

genieclust.inequality.gini_index(x, is_sorted=False)

Computes the normalised Gini index.

Parameters:
x : ndarray

A vector with non-negative elements.

is_sorted : bool

Indicates if x is already sorted increasingly.

Returns:
double

The value of the inequality index, a number in [0,1].

See also

genieclust.inequality.bonferroni_index

The normalised Bonferroni index

genieclust.inequality.devergottini_index

The normalised De Vergottini index

Notes

The normalised Gini index [1] is given by:

\[G(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n-1} \sum_{j=i+1}^n |x_i-x_j| }{ (n-1) \sum_{i=1}^n x_i }.\]

For sorted data, the index can be computed in \(O(n)\) time, because the formula above is equivalent to:

\[G(x_1,\dots,x_n) = \frac{ \sum_{i=1}^{n} (n-2i+1) x_{\sigma(n-i+1)} }{ (n-1) \sum_{i=1}^n x_i },\]

where \(\sigma\) is an ordering permutation of \((x_1,\dots,x_n)\), i.e. \(x_{\sigma(1)} \le \dots \le x_{\sigma(n)}\).
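The two expressions for the Gini index can be compared numerically. The following plain-Python sketch is illustrative only and is not the library's code; the names `gini_pairwise` and `gini_sorted` are hypothetical, and inputs are assumed to have at least two elements and a positive sum.

```python
def gini_pairwise(x):
    """Pairwise-difference definition: quadratic in the number of elements."""
    n = len(x)
    num = sum(abs(x[i] - x[j]) for i in range(n - 1) for j in range(i + 1, n))
    return num / ((n - 1) * sum(x))


def gini_sorted(x):
    """Order-statistics form: one linear pass once the data are sorted."""
    xs = sorted(x)                 # xs[k-1] plays the role of x_sigma(k)
    n = len(xs)
    num = sum((n - 2 * i + 1) * xs[n - i] for i in range(1, n + 1))
    return num / ((n - 1) * sum(xs))


sample = [7, 0, 3, 0, 0]
print(gini_pairwise(sample), gini_sorted(sample))  # 0.85 0.85
```

In the sorted form, the largest element receives weight \(n-1\) and the smallest receives \(-(n-1)\), which is how each pairwise difference is accounted for without enumerating pairs.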

The Gini, Bonferroni, and De Vergottini indices quantify the “inequality” of a numeric sample and can be viewed as normalised measures of data dispersion. For constant vectors (perfect equality), each index equals 0. Vectors in which all elements but one are equal to 0 (perfect inequality) are assigned a score of 1. The indices follow the Pigou-Dalton principle (they are Schur-convex): replacing \(x_i\) with \(x_i - h\) and \(x_j\) with \(x_j + h\), where \(h > 0\) and \(x_i - h \geq x_j + h\) (taking from the “rich” and giving to the “poor”), decreases the inequality.
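A single Pigou-Dalton transfer can be tried out on a small vector. The sketch below uses a plain-Python Gini helper rather than the library function; the names `gini` and `transfer` are hypothetical and exist only for illustration.

```python
def gini(x):
    """Normalised Gini index via the sorted-data formula (illustrative helper)."""
    xs = sorted(x)
    n = len(xs)
    num = sum((n - 2 * i + 1) * xs[n - i] for i in range(1, n + 1))
    return num / ((n - 1) * sum(xs))


def transfer(x, i, j, h):
    """Move h units from x[i] (the richer) to x[j] (the poorer); 0-based indices."""
    if not (h > 0 and x[i] - h >= x[j] + h):
        raise ValueError("not a valid Pigou-Dalton transfer")
    y = list(x)
    y[i] -= h
    y[j] += h
    return y


before = [7, 0, 3, 0, 0]
after = transfer(before, i=0, j=3, h=1)   # gives [6, 0, 3, 1, 0]
print(gini(before), gini(after))          # 0.85 0.75
```

The total stays the same while the index drops, which is the behaviour the principle describes.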

These indices have applications in economics, among other fields. The Genie clustering algorithm uses the Gini index as a measure of the inequality of cluster sizes.
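To make the notion of "inequality of cluster sizes" concrete, the sketch below computes the Gini index of the cluster sizes derived from a label vector. It does not show how Genie uses the index internally; the helper names `gini` and `cluster_size_gini` and the example labels are hypothetical.

```python
from collections import Counter


def gini(x):
    """Normalised Gini index via the sorted-data formula (illustrative helper)."""
    xs = sorted(x)
    n = len(xs)
    num = sum((n - 2 * i + 1) * xs[n - i] for i in range(1, n + 1))
    return num / ((n - 1) * sum(xs))


def cluster_size_gini(labels):
    """Gini index of cluster sizes; assumes at least two distinct labels."""
    sizes = list(Counter(labels).values())
    return gini(sizes)


unbalanced = [0, 0, 0, 0, 0, 0, 1, 1, 2]   # cluster sizes 6, 2, 1
balanced = [0, 0, 0, 1, 1, 1, 2, 2, 2]     # cluster sizes 3, 3, 3
print(round(cluster_size_gini(unbalanced), 2))  # 0.56
print(cluster_size_gini(balanced))              # 0.0
```

A value near 0 indicates clusters of similar size, while a value near 1 indicates that one cluster holds almost all the points.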

References

[1] Gini C., Variabilità e mutabilità, Tipografia di Paolo Cuppini, Bologna, 1912.

Examples

No inequality (perfect equality):

>>> import numpy as np
>>> import genieclust
>>> round(genieclust.inequality.gini_index(np.r_[2, 2,  2, 2, 2]), 2)
0.0

One has it all (total inequality):

>>> round(genieclust.inequality.gini_index(np.r_[0, 0, 10, 0, 0]), 2)
1.0

Give to the poor, take away from the rich:

>>> round(genieclust.inequality.gini_index(np.r_[7, 0,  3, 0, 0]), 2)
0.85

Redistribute even more (Robin Hood style):

>>> round(genieclust.inequality.gini_index(np.r_[6, 0,  3, 1, 0]), 2)
0.75