Changelog
To Do
Check for NA/NaN/Inf in the input matrices.
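A check of this kind could look roughly like the following pure-Python sketch. It is not part of the package: the helper names find_non_finite and check_finite are hypothetical, and the input is assumed to be a list of rows of floats.

```python
import math


def find_non_finite(matrix):
    """Return (row, column) positions of all NaN or infinite entries."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if not math.isfinite(value)
    ]


def check_finite(matrix):
    """Raise ValueError if the matrix contains NaN or +/-Inf; otherwise return it."""
    bad = find_non_finite(matrix)
    if bad:
        # Report only the first few offending positions to keep the message short.
        raise ValueError(f"non-finite entries at positions {bad[:5]}")
    return matrix
```

Failing early on such inputs avoids distance computations that silently propagate NaNs.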
1.2.0 (2025-07-24)
[Python and R] Using the new implementation of Euclidean and mutual reachability minimum spanning trees (quite fast in low-dimensional spaces) from the quitefastmst package.
[Python and R] [BACKWARD INCOMPATIBILITY] mlpack is no longer used.
[Python] [BACKWARD INCOMPATIBILITY] Seeking approximate near-neighbours with nmslib is no longer supported directly (the package has not been updated for a while).
[Python] MSTClusterMixin(BaseEstimator, ClusterMixin): A base class for Genie, GIc, and other MST-based clustering algorithms.
[Python] [BACKWARD INCOMPATIBILITY] Genie and GIc: affinity was renamed matrix.
1.1.6 (2024-08-22)
[Python] The package now works with numpy 2.0.
1.1.5 (2023-10-18)
[BACKWARD INCOMPATIBILITY] [Python and R] Inequality measures are no longer referred to as inequity measures.
[BACKWARD INCOMPATIBILITY] [Python and R] Some external cluster validity measures were renamed: adjusted_asymmetric_accuracy -> normalized_clustering_accuracy, normalized_accuracy -> normalized_pivoted_accuracy.
[BACKWARD INCOMPATIBILITY] [Python] compare_partitions2 has been removed, as compare_partitions and other partition similarity scores now support both pairs of label vectors (x, y) and confusion matrices (x=C, y=None).
[Python and R] New parameter to pair_sets_index: clipped.
In normalizing_permutation and external cluster validity measures, the input matrices can now be of the type double.
[BUGFIX] [Python] #80: Fixed adjustment for nmslib_n_neighbors in small samples.
[BUGFIX] [Python] #82: cluster_validity submodule not imported.
[BUGFIX] Some external cluster validity measures now handle NaNs better and are slightly less prone to round-off errors.
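The two input forms mentioned for compare_partitions carry the same information for scores of this kind: a pair of label vectors can always be summarised as a confusion matrix. The sketch below illustrates that step in pure Python. It does not call the package; confusion_matrix is a hypothetical helper, and the labels are assumed to be 0-based integers.

```python
def confusion_matrix(x, y):
    """Count how often label a in x co-occurs with label b in y.

    Both x and y are sequences of 0-based integer labels of equal length.
    Entry C[a][b] is the number of positions i with x[i] == a and y[i] == b.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_rows = max(x) + 1
    n_cols = max(y) + 1
    C = [[0] * n_cols for _ in range(n_rows)]
    for a, b in zip(x, y):
        C[a][b] += 1
    return C


# Example: five points, three reference clusters, two predicted clusters.
x = [0, 0, 1, 1, 2]
y = [0, 0, 1, 0, 1]
C = confusion_matrix(x, y)
```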
1.1.4 (2023-03-31)
[Python] The GIc algorithm is no longer marked as experimental; its description is provided in https://doi.org/10.1007/s00357-024-09483-1.
1.1.3 (2023-01-17)
[R] mst.default now throws an error if any element in the input matrix is missing/infinite.
[Python] The call to mlpack.emst that stopped working with the new version of mlpack has been fixed.
1.1.2 (2022-09-17)
[Python and R] adjusted_asymmetric_accuracy now accepts confusion matrices with fewer columns than rows. Such “missing” columns are now treated as if they were filled with 0s.
[Python and R] pair_sets_index and normalized_accuracy return the same results for non-symmetric confusion matrices and transposes thereof.
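The treatment of “missing” columns can be pictured as zero-padding a tall confusion matrix until it is square. The sketch below is illustrative only. Whether the package pads explicitly or handles the case implicitly is not stated here, and pad_to_square is a hypothetical name.

```python
def pad_to_square(C):
    """Append all-zero columns until the matrix has as many columns as rows.

    Matrices that already have at least as many columns as rows are
    returned unchanged (as a copy).
    """
    n_rows = len(C)
    padded = []
    for row in C:
        missing = max(0, n_rows - len(row))
        padded.append(list(row) + [0] * missing)
    return padded


# A 3-by-2 confusion matrix gains one zero column.
tall = [[2, 0], [1, 1], [0, 1]]
square = pad_to_square(tall)
```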
1.1.1 (2022-09-15)
[Python] #75: nmslib is now optional.
[BUILD TIME] The use of ssize_t was not portable.
1.1.0 (2022-09-05)
[Python and R] New function: adjusted_asymmetric_accuracy.
[Python and R] Implementations of the so-called internal cluster validity measures discussed in DOI: 10.1016/j.ins.2021.10.004; see our (GitHub-only) CVI package for R. In particular, the generalised Dunn indices are based on the code originally authored by Maciej Bartoszuk. Thanks.
Functions added (cluster_validity module): calinski_harabasz_index, dunnowa_index, generalised_dunn_index, negated_ball_hall_index, negated_davies_bouldin_index, negated_wcss_index, silhouette_index, silhouette_w_index, wcnn_index.
These cluster validity measures are discussed in more detail at https://clustering-benchmarks.gagolewski.com/.
[BACKWARD INCOMPATIBILITY] normalized_confusion_matrix now solves the maximal assignment problem instead of applying the somewhat primitive partial pivoting.
[Python and R] New function: normalizing_permutation.
[R] New function: normalized_confusion_matrix.
[Python and R] New parameter to pair_sets_index: simplified.
[Python] New parameters to plots.plot_scatter: axis, title, xlabel, ylabel, xlim, ylim.
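To make the maximal assignment problem mentioned for normalized_confusion_matrix concrete: the goal is a column permutation that maximises the sum of the diagonal. The sketch below solves it by brute force over all permutations, which is only feasible for tiny matrices. It is not the package's solver, and the names best_column_order and reorder_columns are hypothetical.

```python
from itertools import permutations


def best_column_order(C):
    """Return the column permutation p maximising sum(C[i][p[i]]).

    Brute force over all n! permutations; intended for illustration only.
    """
    n = len(C)
    if any(len(row) != n for row in C):
        raise ValueError("C must be a square matrix")
    return max(
        permutations(range(n)),
        key=lambda p: sum(C[i][p[i]] for i in range(n)),
    )


def reorder_columns(C, order):
    """Permute the columns of C so that column order[i] lands in position i."""
    return [[row[j] for j in order] for row in C]


C = [[1, 10, 0],
     [8, 2, 1],
     [0, 1, 9]]
order = best_column_order(C)
normalized = reorder_columns(C, order)
```

For larger matrices, a polynomial-time method such as the Hungarian algorithm would replace the enumeration.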
1.0.1 (2022-08-08)
[GENERAL] A paper on the genieclust package is now available: M. Gagolewski, genieclust: Fast and robust hierarchical clustering, SoftwareX 15, 100722, 2021, DOI: 10.1016/j.softx.2021.100722.
[Python] plots.plot_scatter now uses a more accessible default palette (from R 4.0.0).
[Python and R] New function: devergottini_index.
1.0.0 (2021-04-22)
[R] Using mlpack instead of RcppMLPACK (#72). This package is merely suggested, not a hard dependency.
0.9.8 (2021-01-08)
[Python] Require Python >= 3.7 (implied by numpy).
[Python] Require nmslib.
[R] Use RcppMLPACK directly; remove dependency on emstreeR.
[R] Use tinytest for unit testing instead of testthat.
0.9.4 (2020-07-31)
[BUGFIX] [R] Fixed build errors on Solaris.
0.9.3 (2020-07-25)
[BUGFIX] [Python] Added code coverage CI. Fixed some minor inconsistencies. Automated the bdist build chain.
[R] Updated DESCRIPTION to meet the CRAN policies.
0.9.2 (2020-07-22)
[BUGFIX] [Python] Fix broken build script for OS X with no OpenMP.
0.9.1 (2020-07-18)
[GENERAL] The package has been completely rewritten. The core functionality is now implemented in C++ (with OpenMP).
[GENERAL] Clustering with respect to HDBSCAN*-like mutual reachability distances is now supported.
[GENERAL] The parallelised Jarník-Prim algorithm now supports on-the-fly distance computations. The Euclidean minimum spanning tree can also be determined with mlpack, which is much faster in low-dimensional spaces.
[R] R version is now available.
[Python] [Experimental] The GIc algorithm proposed by Anna Cena in her 2018 PhD thesis was added.
[Python] Approximate version based on nearest neighbour graphs produced by nmslib was added.
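As a rough illustration of what "on-the-fly distance computations" means for the Jarník-Prim algorithm, the sketch below builds a Euclidean minimum spanning tree without ever storing the full pairwise distance matrix. It is a sequential pure-Python sketch under these assumptions, not the package's parallelised C++ code, and prim_mst is a hypothetical name.

```python
import math


def prim_mst(points):
    """Return MST edges (i, j, weight) for Euclidean points via Jarník-Prim.

    Distances are computed when needed, so memory use is O(n)
    rather than O(n^2); the running time remains O(n^2).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # distance from each vertex to the current tree
    best_from = [-1] * n        # tree vertex realising that distance
    edges = []
    current = 0
    in_tree[current] = True
    for _ in range(n - 1):
        nxt = -1
        for j in range(n):
            if in_tree[j]:
                continue
            # Only distances from the most recently added vertex are needed.
            d = math.dist(points[current], points[j])
            if d < best_dist[j]:
                best_dist[j] = d
                best_from[j] = current
            if nxt == -1 or best_dist[j] < best_dist[nxt]:
                nxt = j
        edges.append((best_from[nxt], nxt, best_dist[nxt]))
        in_tree[nxt] = True
        current = nxt
    return edges


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
edges = prim_mst(points)
```

Replacing math.dist with another dissimilarity function would give the tree for that dissimilarity instead.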
0.1a2 (2018-05-23)
[Python] Initial PyPI release.