Changelog
To Do
Check for NA/NaN/Inf in input matrices.
Bring back support for non-numeric data (needs updates in deadwood).
Add support for non-square confusion matrices in normalized_pivoted_accuracy and normalized_clustering_accuracy.
1.3.0 (2026-02-23)
The package was heavily refactored; common MST-related functions and classes as well as functions from the tools and plots modules were moved to the new deadwood package, which is now required.
[BACKWARD INCOMPATIBILITY] Outlier detection based solely on whether a node is a leaf of a minimum spanning tree w.r.t. some mutual reachability distance turned out to be subpar in more detailed experiments, especially for smaller smoothing factors. Note that in the previous versions of the package, this feature was deemed merely experimental. Hence, detect_noise in genie.default and skip_leaves, preprocess, and postprocess elsewhere are no longer available. Instead, use the more universal deadwood package.
[BACKWARD INCOMPATIBILITY] quitefastmst version >= 0.9.1 is now required; the introduced backward-incompatible changes have been addressed. In particular, the definition of mutual reachability distances has changed. Unlike in Campello et al.’s 2013 paper, the core distance is now the distance to the M-th nearest neighbour, not the (M-1)-th one (not including self).
[Python] [BACKWARD INCOMPATIBILITY] The internal module was renamed core.
[BACKWARD INCOMPATIBILITY] Deprecated functions such as mst_from_nn have been removed.
[Python] [BACKWARD INCOMPATIBILITY] compute_full_tree is now always True.
[BUGFIX] #92: Passing a non-square confusion matrix to normalized_pivoted_accuracy and normalized_clustering_accuracy yields an error, as such objects are yet to be supported.
[R] gclust and genie now return the computed MST via the mst object attribute. genie returns an object of the class mstclust. This makes it interoperable with deadwood.
[Python] [BUGFIX] Modifying quitefastmst_params via set_state now invalidates the cached MST.
[Python] [NEW FEATURE] plots.plot_scatter has new arguments: asp, markers, and colours. The module globals mrk and col were renamed accordingly. However, as mentioned above, plots was moved to deadwood.
[Python] [BACKWARD INCOMPATIBILITY] compute_all_cuts in Genie was renamed coarser. If True, labels_ is still a vector representing the requested n_clusters. The coarser-grained labels are now stored in labels_matrix_, whose i-th row represents an (i+1)-partition.
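The change in the core-distance convention noted above can be illustrated with a small, dependency-free sketch (Python 3.8+ for math.dist). It is not the quitefastmst implementation: the names core_distances and mutual_reachability and the legacy flag are hypothetical, and the mutual reachability distance is assumed to take the usual form max(d(i, j), core(i), core(j)).

```python
import math

def core_distances(points, M, legacy=False):
    """Core distance of every point (hypothetical helper, brute force).

    legacy=False: distance to the M-th nearest neighbour, self not counted.
    legacy=True:  distance to the (M-1)-th nearest neighbour, self not counted.
    """
    k = M - 1 if legacy else M
    out = []
    for i, p in enumerate(points):
        # Sorted distances from p to all *other* points.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1] if k >= 1 else 0.0)
    return out

def mutual_reachability(points, M, legacy=False):
    """Assumed form: max(d(i, j), core(i), core(j)) for i != j, and 0 on the diagonal."""
    core = core_distances(points, M, legacy)
    n = len(points)
    return [
        [0.0 if i == j else max(math.dist(points[i], points[j]), core[i], core[j])
         for j in range(n)]
        for i in range(n)
    ]

points = [(0.0,), (1.0,), (3.0,), (7.0,)]
new_core = core_distances(points, M=2)               # [3.0, 2.0, 3.0, 6.0]
old_core = core_distances(points, M=2, legacy=True)  # [1.0, 1.0, 2.0, 4.0]
```

For the same M, the current convention yields core distances at least as large as the former one, so results obtained with a given M under the older convention roughly correspond to M-1 now.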
1.2.0 (2025-07-24)
[Python and R] Using the new implementation of Euclidean and mutual reachability minimum spanning trees (pretty fast in low-dimensional spaces) from the quitefastmst package.
[BACKWARD INCOMPATIBILITY] [Python] Seeking approximate near-neighbours with nmslib is no longer supported directly; unfortunately, the package has not been updated for a while.
[BACKWARD INCOMPATIBILITY] mlpack is not used anymore.
[Python] MSTClusterMixin: A base class for Genie, GIc, and other MST-based clustering algorithms. [later moved to deadwood]
[BACKWARD INCOMPATIBILITY] [Python] Genie and GIc: affinity was renamed metric.
1.1.6 (2024-08-22)
[Python] The package now works with numpy 2.0.
1.1.5 (2023-10-18)
[BACKWARD INCOMPATIBILITY] Inequality measures are no longer referred to as inequity measures.
[BACKWARD INCOMPATIBILITY] Some external cluster validity measures were renamed: adjusted_asymmetric_accuracy → normalized_clustering_accuracy, normalized_accuracy → normalized_pivoted_accuracy.
[BACKWARD INCOMPATIBILITY] [Python] compare_partitions2 has been removed, as compare_partitions and other partition similarity scores now support both pairs of label vectors (x, y) and confusion matrices (x=C, y=None).
[Python and R] New parameter to pair_sets_index: clipped.
In normalizing_permutation and external cluster validity measures, the input matrices can now be of the type double.
[BUGFIX] [Python] #80: Fixed adjustment for nmslib_n_neighbors in small samples.
[BUGFIX] [Python] #82: cluster_validity submodule not imported.
[BUGFIX] Some external cluster validity measures now handle NaNs better and are slightly less prone to round-off errors.
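The dual calling convention mentioned above (a pair of label vectors, or a confusion matrix with y=None) can be sketched as follows. This is only an illustration of the pattern, not the package's code: confusion_matrix and rand_index are hypothetical helpers written for this example, and the plain Rand index stands in for any partition similarity score.

```python
from math import comb

def confusion_matrix(x, y):
    """Cross-tabulate two label vectors of equal length (hypothetical helper)."""
    rows, cols = sorted(set(x)), sorted(set(y))
    C = [[0] * len(cols) for _ in rows]
    for a, b in zip(x, y):
        C[rows.index(a)][cols.index(b)] += 1
    return C

def rand_index(x, y=None):
    """Rand index from label vectors (x, y) or from a confusion matrix (x=C, y=None)."""
    C = x if y is None else confusion_matrix(x, y)
    n = sum(map(sum, C))
    total = comb(n, 2)                                     # all pairs of points
    same_both = sum(comb(c, 2) for row in C for c in row)  # together in both partitions
    same_x = sum(comb(sum(row), 2) for row in C)           # together in the first one
    same_y = sum(comb(sum(col), 2) for col in zip(*C))     # together in the second one
    return (total + 2 * same_both - same_x - same_y) / total

x = [0, 0, 1, 1, 2, 2]
y = [0, 0, 1, 1, 1, 2]
from_labels = rand_index(x, y)
from_matrix = rand_index(confusion_matrix(x, y))
```

Both calls give the same score, since the label-vector path merely builds the confusion matrix first.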
1.1.4 (2023-03-31)
[Python] The GIc algorithm is no longer marked as experimental; its description is provided in https://doi.org/10.1007/s00357-024-09483-1.
1.1.3 (2023-01-17)
[R] mst.default now throws an error if any element in the input matrix is missing/infinite.
[Python] The call to mlpack.emst that stopped working with the new version of mlpack has been fixed.
1.1.2 (2022-09-17)
[Python and R] adjusted_asymmetric_accuracy now accepts confusion matrices with fewer columns than rows. Such “missing” columns are now treated as if they were filled with 0s.
[Python and R] pair_sets_index and normalized_accuracy return the same results for non-symmetric confusion matrices and transposes thereof.
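Treating “missing” columns as zero-filled amounts to padding the confusion matrix on the right. A minimal sketch, assuming a list-of-lists matrix; pad_columns is a hypothetical helper, not a function of the package:

```python
def pad_columns(C):
    """Append all-zero columns until the matrix has as many columns as rows."""
    n_rows = len(C)
    return [list(row) + [0] * max(0, n_rows - len(row)) for row in C]

C = [[5, 0],
     [1, 4],
     [0, 3]]            # 3 rows, 2 columns
padded = pad_columns(C)  # [[5, 0, 0], [1, 4, 0], [0, 3, 0]]
```

Matrices that already have at least as many columns as rows are returned unchanged.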
1.1.1 (2022-09-15)
[Python] #75: nmslib is now optional.
[BUILD TIME] The use of ssize_t was not portable.
1.1.0 (2022-09-05)
[Python and R] New function: adjusted_asymmetric_accuracy.
[Python and R] Implementations of the so-called internal cluster validity measures discussed in DOI: 10.1016/j.ins.2021.10.004; see our (GitHub-only) CVI package for R. In particular, the generalised Dunn indices are based on the code originally authored by Maciej Bartoszuk. Thanks.
Functions added (cluster_validity module): calinski_harabasz_index, dunnowa_index, generalised_dunn_index, negated_ball_hall_index, negated_davies_bouldin_index, negated_wcss_index, silhouette_index, silhouette_w_index, wcnn_index.
These cluster validity measures are discussed in more detail at https://clustering-benchmarks.gagolewski.com/.
[BACKWARD INCOMPATIBILITY] normalized_confusion_matrix now solves the maximal assignment problem instead of applying the somewhat primitive partial pivoting.
[Python and R] New function: normalizing_permutation.
[R] New function: normalized_confusion_matrix.
[Python and R] New parameter to pair_sets_index: simplified.
[Python] New parameters to plots.plot_scatter: axis, title, xlabel, ylabel, xlim, ylim.
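Why solving the maximal assignment problem can beat a greedy rule is easiest to see on a tiny square matrix. The sketch below is an illustration under assumptions, not the package's code: best_permutation finds the optimum by brute force (feasible only for very small matrices), and greedy_permutation is a simple row-by-row rule used as a stand-in for pivoting; the procedure formerly used by the package may have differed.

```python
from itertools import permutations

def diagonal_sum(C, perm):
    """Sum of the diagonal after reordering the columns of C according to perm."""
    return sum(C[i][perm[i]] for i in range(len(C)))

def best_permutation(C):
    """Column order maximising the diagonal sum (brute force, square C only)."""
    return max(permutations(range(len(C))), key=lambda p: diagonal_sum(C, p))

def greedy_permutation(C):
    """Row by row, take the largest entry among the columns not used so far."""
    remaining = list(range(len(C[0])))
    perm = []
    for row in C:
        j = max(remaining, key=lambda c: row[c])
        perm.append(j)
        remaining.remove(j)
    return tuple(perm)

C = [[5, 6],
     [0, 5]]
optimal = best_permutation(C)   # (0, 1): diagonal sum 5 + 5 = 10
greedy = greedy_permutation(C)  # (1, 0): diagonal sum 6 + 0 = 6
normalized = [[row[j] for j in optimal] for row in C]
```

In practice, a polynomial-time solver such as the Hungarian algorithm would replace the brute-force search.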
1.0.1 (2022-08-08)
A paper on the genieclust package is now available: M. Gagolewski, genieclust: Fast and robust hierarchical clustering, SoftwareX 15, 100722, 2021, DOI: 10.1016/j.softx.2021.100722.
[Python] plots.plot_scatter now uses a more accessible default palette (from R 4.0.0).
[Python and R] New function: devergottini_index.
1.0.0 (2021-04-22)
[R] Using mlpack instead of RcppMLPACK (#72). This package is merely suggested, not depended upon.
0.9.8 (2021-01-08)
[Python] Require Python >= 3.7 (implied by numpy).
[Python] Require nmslib.
[R] Use RcppMLPACK directly; remove dependency on emstreeR.
[R] Use tinytest for unit testing instead of testthat.
0.9.4 (2020-07-31)
[BUGFIX] [R] Fixed build errors on Solaris.
0.9.3 (2020-07-25)
[BUGFIX] [Python] Added code coverage CI. Fixed some minor inconsistencies. Automated the bdist build chain.
[R] Updated DESCRIPTION to meet the CRAN policies.
0.9.2 (2020-07-22)
[BUGFIX] [Python] Fix broken build script for OS X with no OpenMP.
0.9.1 (2020-07-18)
The package has been completely rewritten. The core functionality is now implemented in C++ (with OpenMP).
[R] R version is now available.
[EXPERIMENTAL] A preliminary version of clustering with respect to DBSCAN*-like mutual reachability distances is now supported.
The parallelised Jarník-Prim algorithm now supports on-the-fly distance computations. Euclidean minimum spanning tree can be determined with mlpack, which is much faster in low-dimensional spaces.
[EXPERIMENTAL] [Python] The GIc algorithm proposed by Anna Cena in her 2018 PhD thesis was added.
[Python] Approximate version based on nearest neighbour graphs produced by nmslib was added.
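The idea behind on-the-fly distance computations in the Jarník-Prim algorithm is that no n-by-n distance matrix has to be stored; each distance is evaluated when it is needed. The following sequential Python sketch (Python 3.8+ for math.dist) only illustrates that idea; it is not the package's parallelised C++ code, and the name mst_prim is hypothetical.

```python
import math

def mst_prim(points):
    """Jarník-Prim MST of the complete Euclidean graph.

    O(n^2) time, O(n) memory: distances are computed on demand.
    Returns a list of (weight, i, j) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # distance from each vertex to the current tree
    best_from = [-1] * n        # tree vertex realising that distance
    edges = []
    current = 0
    in_tree[0] = True
    for _ in range(n - 1):
        nxt = -1
        for j in range(n):
            if in_tree[j]:
                continue
            d = math.dist(points[current], points[j])  # computed on the fly
            if d < best_dist[j]:
                best_dist[j] = d
                best_from[j] = current
            if nxt < 0 or best_dist[j] < best_dist[nxt]:
                nxt = j
        # The closest outside vertex joins the tree.
        edges.append((best_dist[nxt], best_from[nxt], nxt))
        in_tree[nxt] = True
        current = nxt
    return edges

points = [(0, 0), (1, 0), (1, 1), (5, 5)]
edges = mst_prim(points)
total_weight = sum(w for w, _, _ in edges)
```

The inner loop over j is the part that lends itself to parallelisation, as each iteration touches only its own entries of best_dist and best_from; the selection of the closest vertex then becomes a reduction.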
0.1a2 (2018-05-23)
[Python] Initial PyPI release.