genieclust.internal¶
Internal functions and classes

class
genieclust.internal.
DisjointSets
¶ Disjoint Sets (UnionFind)
 Parameters
 nssize_t
The cardinality of the set whose partitions are generated.
Notes
Represents a partition of the set \(\{0,1,...,n1\}\) for some \(n\).
Path compression for find() is implemented, but the union() operation is naive (neither it is union by rank nor by size), see https://en.wikipedia.org/wiki/Disjointset_data_structure. This is by design, as some other operations in the current package rely on the assumption that the parent ID of each element is always <= than itself.

find
()¶ Finds the subset ID for a given x
 Parameters
 xssize_t
An integer in {0,…,n1}, representing an element to find.
 Returns
 ssize_t
The ID of the parent of x.

get_k
()¶ Returns the current number of subsets

get_n
()¶ Returns the number of elements in the set being partitioned

to_list
()¶ Gets parent IDs of all the elements
 Returns
 ndarray, shape (n,)
A list
m
such thatm[x]
denotes the (recursive) parent ID of x, for x = 0,1,…,n1.

to_list_normalized
()¶ Gets the normalised elements’ membership information
 Returns
 ndarray, shape (n,)
A list
m
such thatm[x]
denotes the normalised parent ID of x. The resulting values are in {0,1,…,k1}, where k is the current number of subsets in the partition.

to_lists
()¶ Returns a list of lists representing the current partition
 Returns
 list of lists
A list of length k, where k is the current number of sets in the partition. Each list element is a list with values in {0,…,n1}.
Notes
This is a slow operation. Do you really need it?

union
()¶ Merges the sets containing given x and y
 Parameters
 xssize_t
Integer in {0,…,n1}, representing an element of the first set to be merged.
 yssize_t
Integer in {0,…,n1}, representing an element of the second set to be merged.
 Returns
 parentssize_t
The id of the parent of x or y, whichever is smaller.
Notes
Let px be the parent ID of x, and py be the parent ID of y. If px < py, then the new parent ID of py will be set to py. Otherwise, px will have py as its parent.
If x and y are members of the same subset, an exception is thrown.

class
genieclust.internal.
GiniDisjointSets
¶ Augmented disjoint sets (UnionFind) over {0,1,…,n1}
 Parameters
 nssize_t
The cardinality of the set whose partitions are generated.
Notes
The class allows to compute the normalised Gini index of the distribution of subset sizes, i.e., \(G(x_1,\dots,x_k) = \frac{\sum_{i=1}^{n1} \sum_{j=i+1}^n x_ix_j}{ (n1) \sum_{i=1}^n x_i}\), where \(x_i\) is the number of elements in the ith subset in the current partition.
For a use case, see: Gagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlierresistant hierarchical clustering algorithm, Information Sciences 363, 2016, pp. 823. doi:10.1016/j.ins.2016.05.003

find
()¶ Finds the subset ID for a given x
 Parameters
 xssize_t
An integer in {0,…,n1}, representing an element to find.
 Returns
 ssize_t
The ID of the parent of x.

get_count
()¶ Returns the size of the subset containing x
 Parameters
 xssize_t
An integer in {0,…,n1}, representing an element to find.
Notes
Run time: the cost of find(x)

get_counts
()¶ Generates an array of subsets’ sizes
Notes
The resulting vector is ordered nondecreasingly.
Run time is \(O(k)\), where k is the current number of subsets.

get_gini
()¶ Returns the Gini index of the distribution of subsets’ sizes
Notes
Run time is \(O(1)\), as the Gini index is updated during each call to union().

get_k
()¶ Returns the current number of subsets

get_n
()¶ Returns the number of elements in the set being partitioned

get_smallest_count
()¶ Returns the size of the smallest subset
Notes
Run time is O(1).

to_list
()¶ Gets parent IDs of all the elements
 Returns
 ndarray, shape (n,)
A list
m
such thatm[x]
denotes the (recursive) parent ID of x, for x = 0,1,…,n1.

to_list_normalized
()¶ Gets the normalised elements’ membership information
 Returns
 ndarray, shape (n,)
A list
m
such thatm[x]
denotes the normalised parent ID of x. The resulting values are in {0,1,…,k1}, where k is the current number of subsets in the partition.

to_lists
()¶ Returns a list of lists representing the current partition
 Returns
 list of lists
A list of length k, where k is the current number of sets in the partition. Each list element is a list with values in {0,…,n1}.
Notes
This is a slow operation. Do you really need it?

union
()¶ Merges the sets containing given x and y
 Parameters
 xssize_t
Integer in {0,…,n1}, representing an element of the first set to be merged.
 yssize_t
Integer in {0,…,n1}, representing an element of the second set to be merged.
 Returns
 parentssize_t
The id of the parent of x or y, whichever is smaller.
Notes
Let px be the parent ID of x, and py be the parent ID of y. If px < py, then the new parent ID of py will be set to py. Otherwise, px will have py as its parent.
If x and y are members of the same subset, an exception is thrown.
Update time: pessimistically \(O(\sqrt{n})\), as the Gini index must be recomputed.