Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Müller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: Experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108–122. 2013.


Ricardo J. G. B. Campello, Davoud Moulavi, Arthur Zimek, and Jörg Sander. Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Transactions on Knowledge Discovery from Data, 10(1):5:1–5:51, 2015. doi:10.1145/2733381.


Ryan R. Curtin, Marcus Edel, Mikhail Lozhnikov, Yannis Mentekidis, Sumedh Ghaisas, and Shangtong Zhang. mlpack 3: A fast, flexible machine learning library. Journal of Open Source Software, 3(26):726, 2018. doi:10.21105/joss.00726.


Sajib Dasgupta and Vincent Ng. Single data, multiple clusterings. In Proc. NIPS Workshop Clustering: Science or Art? Towards Principled Approaches. 2009.


D. Dua and C. Graff. UCI machine learning repository. 2019.


Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. KDD’96, 226–231. 1996.


E. B. Fowlkes and C. L. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569, 1983.


P. Fränti and S. Sieranoja. K-means properties on six clustering benchmark datasets. Applied Intelligence, 48(12):4743–4759, 2018.


Pasi Fränti, Radu Mariescu-Istodor, and Caiming Zhong. XNN graph. Lecture Notes in Computer Science, 10029:207–217, 2016. doi:10.1007/978-3-319-49055-7_19.


Marek Gagolewski, Maciej Bartoszuk, and Anna Cena. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23, 2016. doi:10.1016/j.ins.2016.05.003.


Marek Gagolewski and others. Benchmark suite for clustering algorithms – version 1. 2020. doi:10.5281/zenodo.3815066.


Daniel Graves and Witold Pedrycz. Kernel-based fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems, 161:522–543, 2010.


Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, 1985. doi:10.1007/BF01908075.


Anil K. Jain and Martin H. C. Law. Data clustering: A user’s dilemma. Lecture Notes in Computer Science, 3776:1–10, 2005.


George Karypis, Eui-Hong Han, and Vipin Kumar. CHAMELEON: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75, 1999. doi:10.1109/2.781637.


Ari Kobren, Nicholas Monath, Akshay Krishnamurthy, and Andrew McCallum. A hierarchical algorithm for extreme clustering. In Proc. 23rd ACM SIGKDD’17, 255–264. 2017. doi:10.1145/3097983.3098079.


Robert F. Ling. A probability theory of cluster analysis. Journal of the American Statistical Association, 68(341):159–164, 1973.


William B. March, Parikshit Ram, and Alexander G. Gray. Fast Euclidean minimum spanning tree: Algorithm, analysis, and applications. In Proc. 16th ACM SIGKDD’10, 603–612. 2010. doi:10.1145/1835804.1835882.


Leland McInnes, John Healy, and Steve Astels. hdbscan: Hierarchical density based clustering. Journal of Open Source Software, 2(11):205, 2017.


Andreas C. Müller, Sebastian Nowozin, and Christoph H. Lampert. Information theoretic clustering using minimum spanning trees. In Proc. German Conference on Pattern Recognition. 2012.


Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53(9):1–18, 2013.


B. Naidan, L. Boytsov, Y. Malkov, and D. Novak. Non-metric space library (NMSLIB) manual, version 2.0. 2019.


Clark F. Olson. Parallel algorithms for hierarchical clustering. Parallel Computing, 21:1313–1325, 1995. doi:10.1016/0167-8191(95)00017-I.


F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.


Mohammad Rezaei and Pasi Fränti. Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8):2173–2186, 2016. doi:10.1109/TKDE.2016.2551240.


A. Ultsch. Clustering with SOM: U*C. In Workshop on Self-Organizing Maps, 75–82. 2005.


Nguyen Xuan Vinh, Julien Epps, and James Bailey. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(95):2837–2854, 2010.