genieclust.tools

Functions one might find useful, but not necessarily

genieclust.tools.argkmin(x, k)

Finds the position of an order statistic

Parameters:
xndarray

A vector with n elements of type int, float, or double.

kint

An integer between 0 and n - 1, preferably small.

Returns:
int

The index where the (k-1)-th smallest value in x is located.

Notes

It holds argkmin(x, 0) == argmin(x), or, more generally, argkmin(x, k) == np.argsort(x)[k].

Run time is O(nk) and working memory is O(k). An insertion sort-like scheme is used to locate the order statistic. In practice, the function is very fast for small k and randomly ordered or almost sorted (increasingly) data.

Example timings

argkmin(x, k)

np.argsort(x)[k]

(ascending) n= 100000000, k= 1:

0.060s

4.388s

(descending)

0.168s

7.329s

(random)

0.073s

26.673s

(ascending) n= 100000000, k= 5:

0.060s

4.403s

(descending)

0.505s

7.414s

(random)

0.072s

26.447s

(ascending) n= 100000000, k= 100:

0.061s

4.390s

(descending)

8.007s

7.639s

(random)

0.075s

27.299s

Examples

>>> x = np.r_[2, 3, 6, 5, 1, 4]
>>> genieclust.tools.argkmin(x, 0) # index of the smallest value
4
>>> genieclust.tools.argkmin(x, 1) # index of the 2nd smallest value
0
genieclust.tools.cummax(x)

Cumulative maximum

Parameters:
xndarray

A vector with n elements of type int, float, or double.

Returns:
ndarray

A vector of length n whose i-th element is equal to max(x[:i]).

See also

genieclust.tools.cummin

Cumulative minimum

Examples

>>> genieclust.tools.cummax(np.r_[3, 2, 1, 4, 6, 5])
array([3, 3, 3, 4, 6, 6])
genieclust.tools.cummin(x)

Cumulative minimum

Parameters:
xndarray

A vector with n elements of type int, float, or double.

Returns:
ndarray

A vector of length n whose i-th element is equal to min(x[:i]).

See also

genieclust.tools.cummax

Cumulative maximum

Examples

>>> genieclust.tools.cummin(np.r_[3, 4, 2, 1, 5, 6])
array([3, 3, 2, 1, 1, 1])