References
How to Cite
When using genieclust in research publications, please cite Gagolewski (2021) [11] and Gagolewski, Bartoszuk, and Cena (2016) [16], as specified in the bibliography below. Thank you.
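For convenience, the two references can be rendered in BibTeX roughly as follows. This is a sketch: the entry keys are illustrative, the full author names are expanded from the initials given in the bibliography below, and all fields should be verified against the publishers' records.

    @article{gagolewski2021genieclust,
      author  = {Gagolewski, Marek},
      title   = {genieclust: Fast and robust hierarchical clustering},
      journal = {SoftwareX},
      year    = {2021},
      volume  = {15},
      pages   = {100722},
      doi     = {10.1016/j.softx.2021.100722},
      url     = {https://genieclust.gagolewski.com/}
    }

    @article{gagolewski2016genie,
      author  = {Gagolewski, Marek and Bartoszuk, Maciej and Cena, Anna},
      title   = {Genie: A new, fast, and outlier-resistant hierarchical
                 clustering algorithm},
      journal = {Information Sciences},
      year    = {2016},
      volume  = {363},
      pages   = {8--23},
      doi     = {10.1016/j.ins.2016.05.003}
    }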
See Also
fastcluster: http://www.danifold.net/fastcluster.html
mlpack: https://www.mlpack.org/
nmslib: https://github.com/nmslib/nmslib/tree/master/python_bindings
scikit-learn: https://scikit-learn.org/stable/modules/clustering.html
Bibliography
Buitinck, L. and others. (2013). API design for machine learning software: Experiences from the scikit-learn project. In: ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122.
Campello, R.J.G.B., Moulavi, D., and Sander, J. (2013). Density-based clustering based on hierarchical density estimates. Lecture Notes in Computer Science, 7819:160–172. DOI: 10.1007/978-3-642-37456-2_14.
Cena, A. (2018). Adaptive hierarchical clustering algorithms based on data aggregation methods. PhD thesis, Systems Research Institute, Polish Academy of Sciences.
Curtin, R.R., Edel, M., Lozhnikov, M., Mentekidis, Y., Ghaisas, S., and Zhang, S. (2018). mlpack 3: A fast, flexible machine learning library. Journal of Open Source Software, 3(26):726. DOI: 10.21105/joss.00726.
Dasgupta, S. and Ng, V. (2009). Single data, multiple clusterings. In: Proc. NIPS Workshop Clustering: Science or Art? Towards Principled Approaches. URL: https://clusteringtheory.org.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. URL: http://archive.ics.uci.edu/ml.
Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. KDD'96, pp. 226–231.
Fowlkes, E.B. and Mallows, C.L. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569.
Fränti, P., Mariescu-Istodor, R., and Zhong, C. (2016). XNN graph. Lecture Notes in Computer Science, 10029:207–217. DOI: 10.1007/978-3-319-49055-7_19.
Fränti, P. and Sieranoja, S. (2018). K-means properties on six clustering benchmark datasets. Applied Intelligence, 48(12):4743–4759. DOI: 10.1007/s10489-018-1238-7.
Gagolewski, M. (2021). genieclust: Fast and robust hierarchical clustering. SoftwareX, 15:100722. URL: https://genieclust.gagolewski.com/, DOI: 10.1016/j.softx.2021.100722.
Gagolewski, M. (2022). A framework for benchmarking clustering algorithms. SoftwareX, 20:101270. URL: https://clustering-benchmarks.gagolewski.com/, DOI: 10.1016/j.softx.2022.101270.
Gagolewski, M. (2024). Deep R Programming. Zenodo, Melbourne. ISBN 978-0-6455719-2-9. URL: https://deepr.gagolewski.com/, DOI: 10.5281/zenodo.7490464.
Gagolewski, M. (2024). Minimalist Data Wrangling with Python. Zenodo, Melbourne. ISBN 978-0-6455719-1-2. URL: https://datawranglingpy.gagolewski.com/, DOI: 10.5281/zenodo.6451068.
Gagolewski, M. (2024). Normalised clustering accuracy: An asymmetric external cluster validity measure. Journal of Classification, in press. URL: https://link.springer.com/content/pdf/10.1007/s00357-024-09482-2.pdf, DOI: 10.1007/s00357-024-09482-2.
Gagolewski, M., Bartoszuk, M., and Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. URL: https://arxiv.org/pdf/2209.05757, DOI: 10.1016/j.ins.2016.05.003.
Gagolewski, M., Bartoszuk, M., and Cena, A. (2021). Are cluster validity measures (in)valid? Information Sciences, 581:620–636. URL: https://arxiv.org/pdf/2208.01261, DOI: 10.1016/j.ins.2021.10.004.
Gagolewski, M., Cena, A., Bartoszuk, M., and Brzozowski, L. (2024). Clustering with minimum spanning trees: How good can it be? Journal of Classification, in press. URL: https://link.springer.com/content/pdf/10.1007/s00357-024-09483-1.pdf, DOI: 10.1007/s00357-024-09483-1.
Graves, D. and Pedrycz, W. (2010). Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems, 161:522–543. DOI: 10.1016/j.fss.2009.10.021.
Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1):193–218. DOI: 10.1007/BF01908075.
Jain, A.K. and Law, M.H.C. (2005). Data clustering: A user's dilemma. Lecture Notes in Computer Science, 3776:1–10.
Karypis, G., Han, E.H., and Kumar, V. (1999). CHAMELEON: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75. DOI: 10.1109/2.781637.
Kobren, A., Monath, N., Krishnamurthy, A., and McCallum, A. (2017). A hierarchical algorithm for extreme clustering. In: Proc. 23rd ACM SIGKDD'17, pp. 255–264. DOI: 10.1145/3097983.3098079.
Ling, R.F. (1973). A probability theory of cluster analysis. Journal of the American Statistical Association, 68(341):159–164.
March, W.B., Ram, P., and Gray, A.G. (2010). Fast Euclidean minimum spanning tree: Algorithm, analysis, and applications. In: Proc. 16th ACM SIGKDD'10, pp. 603–612. DOI: 10.1145/1835804.1835882.
McInnes, L., Healy, J., and Astels, S. (2017). hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, 2(11):205. DOI: 10.21105/joss.00205.
Müller, A.C., Nowozin, S., and Lampert, C.H. (2012). Information theoretic clustering using minimum spanning trees. In: Proc. German Conference on Pattern Recognition. URL: https://github.com/amueller/information-theoretic-mst.
Müllner, D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53(9):1–18. DOI: 10.18637/jss.v053.i09.
Naidan, B., Boytsov, L., Malkov, Y., and Novak, D. (2019). Non-metric space library (NMSLIB) manual, version 2.0. URL: https://github.com/nmslib/nmslib/blob/master/manual/latex/manual.pdf.
Olson, C.F. (1995). Parallel algorithms for hierarchical clustering. Parallel Computing, 21:1313–1325. DOI: 10.1016/0167-8191(95)00017-I.
Pedregosa, F. and others. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85):2825–2830. URL: http://jmlr.org/papers/v12/pedregosa11a.html.
Rezaei, M. and Fränti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8):2173–2186. DOI: 10.1109/TKDE.2016.2551240.
Ultsch, A. (2005). Clustering with SOM: U*C. In: Workshop on Self-Organizing Maps, pp. 75–82.
Vinh, N.X., Epps, J., and Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(95):2837–2854. URL: http://jmlr.org/papers/v11/vinh10a.html.