References
How to Cite
When using genieclust in research publications, please cite Gagolewski (2021), "genieclust: Fast and robust hierarchical clustering", and Gagolewski, Bartoszuk, and Cena (2016), "Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm", as listed in the bibliography below. Thank you.
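For convenience, BibTeX entries for these two papers might look as follows; the entry keys (genieclust2021 and genie2016) are arbitrary placeholders of our own choosing, and all bibliographic data are copied from the bibliography below.

    % NOTE: entry keys are arbitrary; metadata taken from the bibliography below.
    @article{genieclust2021,
        author  = {M. Gagolewski},
        title   = {genieclust: Fast and robust hierarchical clustering},
        journal = {SoftwareX},
        year    = {2021},
        volume  = {15},
        pages   = {100722},
        doi     = {10.1016/j.softx.2021.100722}
    }

    @article{genie2016,
        author  = {M. Gagolewski and M. Bartoszuk and A. Cena},
        title   = {Genie: A new, fast, and outlier-resistant
                   hierarchical clustering algorithm},
        journal = {Information Sciences},
        year    = {2016},
        volume  = {363},
        pages   = {8--23},
        doi     = {10.1016/j.ins.2016.05.003}
    }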
See Also
fastcluster: http://www.danifold.net/fastcluster.html
mlpack: https://www.mlpack.org/
nmslib: https://github.com/nmslib/nmslib/tree/master/python_bindings
scikit-learn: https://scikit-learn.org/stable/modules/clustering.html
Bibliography
Buitinck, L., et al. (2013). API design for machine learning software: Experiences from the scikit-learn project. In: ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122.
Campello, R.J.G.B., Moulavi, D., Sander, J. (2013). Density-based clustering based on hierarchical density estimates. Lecture Notes in Computer Science, 7819:160–172. DOI: 10.1007/978-3-642-37456-2_14.
Cena, A. (2018). Adaptive hierarchical clustering algorithms based on data aggregation methods. PhD thesis, Systems Research Institute, Polish Academy of Sciences.
Curtin, R.R., Edel, M., Lozhnikov, M., Mentekidis, Y., Ghaisas, S., Zhang, S. (2018). mlpack 3: A fast, flexible machine learning library. Journal of Open Source Software, 3(26):726. DOI: 10.21105/joss.00726.
Dasgupta, S., Ng, V. (2009). Single data, multiple clusterings. In: Proc. NIPS Workshop Clustering: Science or Art? Towards Principled Approaches. URL: https://clusteringtheory.org.
Dua, D., Graff, C. (2019). UCI machine learning repository. URL: http://archive.ics.uci.edu/ml.
Ester, M., Kriegel, H.P., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. KDD'96, pp. 226–231.
Fowlkes, E.B., Mallows, C.L. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569.
Fränti, P., Mariescu-Istodor, R., Zhong, C. (2016). XNN graph. Lecture Notes in Computer Science, 10029:207–217. DOI: 10.1007/978-3-319-49055-7_19.
Fränti, P., Sieranoja, S. (2018). K-means properties on six clustering benchmark datasets. Applied Intelligence, 48(12):4743–4759. DOI: 10.1007/s10489-018-1238-7.
Gagolewski, M. (2021). genieclust: Fast and robust hierarchical clustering. SoftwareX, 15:100722. DOI: 10.1016/j.softx.2021.100722.
Gagolewski, M. (2022). A framework for benchmarking clustering algorithms. SoftwareX, 20:101270. URL: https://clustering-benchmarks.gagolewski.com, DOI: 10.1016/j.softx.2022.101270.
Gagolewski, M. (2022). Adjusted asymmetric accuracy: A well-behaving external cluster validity measure. Under review (preprint). URL: https://arxiv.org/pdf/2209.02935.pdf, DOI: 10.48550/arXiv.2209.02935.
Gagolewski, M. (2022). Minimalist Data Wrangling with Python. Zenodo, Melbourne. ISBN 978-0-6455719-1-2. URL: https://datawranglingpy.gagolewski.com/, DOI: 10.5281/zenodo.6451068.
Gagolewski, M. (2023). Deep R Programming. Zenodo, Melbourne. ISBN 978-0-6455719-2-9 (reserved). Early draft. URL: https://deepr.gagolewski.com/, DOI: 10.5281/zenodo.7490464.
Gagolewski, M., Bartoszuk, M., Cena, A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23. DOI: 10.1016/j.ins.2016.05.003.
Gagolewski, M., Bartoszuk, M., Cena, A. (2021). Are cluster validity measures (in)valid? Information Sciences, 581:620–636. DOI: 10.1016/j.ins.2021.10.004.
Gagolewski, M., Cena, A., Bartoszuk, M., Brzozowski, L. (2023). Clustering with minimum spanning trees: How good can it be? Under review (preprint). URL: https://arxiv.org/pdf/2303.05679.pdf, DOI: 10.48550/arXiv.2303.05679.
Gagolewski, M., et al. (2020). Benchmark suite for clustering algorithms – version 1. URL: https://github.com/gagolews/clustering-benchmarks, DOI: 10.5281/zenodo.3815066.
Graves, D., Pedrycz, W. (2010). Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems, 161:522–543. DOI: 10.1016/j.fss.2009.10.021.
Hubert, L., Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1):193–218. DOI: 10.1007/BF01908075.
Jain, A.K., Law, M.H.C. (2005). Data clustering: A user's dilemma. Lecture Notes in Computer Science, 3776:1–10.
Karypis, G., Han, E.H., Kumar, V. (1999). CHAMELEON: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75. DOI: 10.1109/2.781637.
Kobren, A., Monath, N., Krishnamurthy, A., McCallum, A. (2017). A hierarchical algorithm for extreme clustering. In: Proc. 23rd ACM SIGKDD'17, pp. 255–264. DOI: 10.1145/3097983.3098079.
Ling, R.F. (1973). A probability theory of cluster analysis. Journal of the American Statistical Association, 68(341):159–164.
March, W.B., Ram, P., Gray, A.G. (2010). Fast Euclidean minimum spanning tree: Algorithm, analysis, and applications. In: Proc. 16th ACM SIGKDD'10, pp. 603–612. DOI: 10.1145/1835804.1835882.
McInnes, L., Healy, J., Astels, S. (2017). hdbscan: Hierarchical density based clustering. Journal of Open Source Software, 2(11):205. DOI: 10.21105/joss.00205.
Müller, A.C., Nowozin, S., Lampert, C.H. (2012). Information theoretic clustering using minimum spanning trees. In: Proc. German Conference on Pattern Recognition. URL: https://github.com/amueller/information-theoretic-mst.
Müllner, D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53(9):1–18. DOI: 10.18637/jss.v053.i09.
Naidan, B., Boytsov, L., Malkov, Y., Novak, D. (2019). Non-metric space library (NMSLIB) manual, version 2.0. URL: https://github.com/nmslib/nmslib/blob/master/manual/latex/manual.pdf.
Olson, C.F. (1995). Parallel algorithms for hierarchical clustering. Parallel Computing, 21:1313–1325. DOI: 10.1016/0167-8191(95)00017-I.
Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85):2825–2830. URL: http://jmlr.org/papers/v12/pedregosa11a.html.
Rezaei, M., Fränti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8):2173–2186. DOI: 10.1109/TKDE.2016.2551240.
Ultsch, A. (2005). Clustering with SOM: U*C. In: Workshop on Self-Organizing Maps, pp. 75–82.
Vinh, N.X., Epps, J., Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(95):2837–2854. URL: http://jmlr.org/papers/v11/vinh10a.html.