References

How to Cite

When using genieclust in research publications, please cite [Gag21] and [GBC16] as specified below. Thank you.
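For convenience, the two recommended entries can be expressed in BibTeX as follows (the keys and field layout are illustrative; all bibliographic details are copied from the corresponding entries in the bibliography below):

```bibtex
@article{Gag21,
    author  = {M. Gagolewski},
    title   = {genieclust: Fast and robust hierarchical clustering},
    journal = {SoftwareX},
    volume  = {15},
    pages   = {100722},
    year    = {2021},
    doi     = {10.1016/j.softx.2021.100722}
}

@article{GBC16,
    author  = {M. Gagolewski and M. Bartoszuk and A. Cena},
    title   = {Genie: A new, fast, and outlier-resistant
               hierarchical clustering algorithm},
    journal = {Information Sciences},
    volume  = {363},
    pages   = {8--23},
    year    = {2016},
    doi     = {10.1016/j.ins.2016.05.003}
}
```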

Bibliography

B+13

L. Buitinck and others. API design for machine learning software: Experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122. 2013.

CMS13

R.J.G.B. Campello, D. Moulavi, and J. Sander. Density-based clustering based on hierarchical density estimates. Lecture Notes in Computer Science, 7819:160–172, 2013. doi:10.1007/978-3-642-37456-2_14.

Cen18

A. Cena. Adaptive hierarchical clustering algorithms based on data aggregation methods. PhD thesis, Systems Research Institute, Polish Academy of Sciences, 2018.

CEL+18

R.R. Curtin, M. Edel, M. Lozhnikov, Y. Mentekidis, S. Ghaisas, and S. Zhang. Mlpack 3: A fast, flexible machine learning library. Journal of Open Source Software, 3(26):726, 2018. doi:10.21105/joss.00726.

DN09

S. Dasgupta and V. Ng. Single data, multiple clusterings. In Proc. NIPS Workshop Clustering: Science or Art? Towards Principled Approaches. 2009. URL: https://clusteringtheory.org.

DG19

D. Dua and C. Graff. UCI machine learning repository. 2019. URL: http://archive.ics.uci.edu/ml.

EKSX96

M. Ester, H.P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. KDD'96, pages 226–231. 1996.

FM83

E.B. Fowlkes and C.L. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569, 1983.

FMIZ16

P. Fränti, R. Mariescu-Istodor, and C. Zhong. XNN graph. Lecture Notes in Computer Science, 10029:207–217, 2016. doi:10.1007/978-3-319-49055-7_19.

FS18

P. Fränti and S. Sieranoja. K-means properties on six clustering benchmark datasets. Applied Intelligence, 48(12):4743–4759, 2018. doi:10.1007/s10489-018-1238-7.

Gag21

M. Gagolewski. genieclust: Fast and robust hierarchical clustering. SoftwareX, 15:100722, 2021. doi:10.1016/j.softx.2021.100722.

Gag22a

M. Gagolewski. A framework for benchmarking clustering algorithms. 2022. Under review (preprint). URL: https://clustering-benchmarks.gagolewski.com, doi:10.48550/arXiv.2209.09493.

Gag22b

M. Gagolewski. Adjusted asymmetric accuracy: A well-behaving external cluster validity measure. 2022. Under review (preprint). URL: https://arxiv.org/pdf/2209.02935.pdf, doi:10.48550/arXiv.2209.02935.

Gag22c

M. Gagolewski. Minimalist Data Wrangling with Python. Zenodo, Melbourne, 2022. ISBN 978-0-6455719-1-2. URL: https://datawranglingpy.gagolewski.com/, doi:10.5281/zenodo.6451068.

GBC16

M. Gagolewski, M. Bartoszuk, and A. Cena. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363:8–23, 2016. doi:10.1016/j.ins.2016.05.003.

GBC21

M. Gagolewski, M. Bartoszuk, and A. Cena. Are cluster validity measures (in)valid? Information Sciences, 581:620–636, 2021. doi:10.1016/j.ins.2021.10.004.

G+20

M. Gagolewski and others. Benchmark suite for clustering algorithms – version 1. 2020. URL: https://github.com/gagolews/clustering-benchmarks, doi:10.5281/zenodo.3815066.

GP10

D. Graves and W. Pedrycz. Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems, 161:522–543, 2010. doi:10.1016/j.fss.2009.10.021.

HA85

L. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, 1985. doi:10.1007/BF01908075.

JL05

A.K. Jain and M.H.C. Law. Data clustering: A user's dilemma. Lecture Notes in Computer Science, 3776:1–10, 2005.

KHK99

G. Karypis, E.H. Han, and V. Kumar. CHAMELEON: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75, 1999. doi:10.1109/2.781637.

KMKM17

A. Kobren, N. Monath, A. Krishnamurthy, and A. McCallum. A hierarchical algorithm for extreme clustering. In Proc. 23rd ACM SIGKDD'17, pages 255–264. 2017. doi:10.1145/3097983.3098079.

Lin73

R.F. Ling. A probability theory of cluster analysis. Journal of the American Statistical Association, 68(341):159–164, 1973.

MRG10

W.B. March, P. Ram, and A.G. Gray. Fast Euclidean minimum spanning tree: Algorithm, analysis, and applications. In Proc. 16th ACM SIGKDD'10, pages 603–612. 2010. doi:10.1145/1835804.1835882.

MHA17

L. McInnes, J. Healy, and S. Astels. hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, 2(11):205, 2017. doi:10.21105/joss.00205.

MNL12

A.C. Müller, S. Nowozin, and C.H. Lampert. Information theoretic clustering using minimum spanning trees. In Proc. German Conference on Pattern Recognition. 2012. URL: https://github.com/amueller/information-theoretic-mst.

Mul13

D. Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53(9):1–18, 2013. doi:10.18637/jss.v053.i09.

NBMN19

B. Naidan, L. Boytsov, Y. Malkov, and D. Novak. Non-metric space library (NMSLIB) manual, version 2.0. 2019. URL: https://github.com/nmslib/nmslib/blob/master/manual/latex/manual.pdf.

Ols95

C.F. Olson. Parallel algorithms for hierarchical clustering. Parallel Computing, 21:1313–1325, 1995. doi:10.1016/0167-8191(95)00017-I.

P+11

F. Pedregosa and others. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85):2825–2830, 2011. URL: http://jmlr.org/papers/v12/pedregosa11a.html.

RF16

M. Rezaei and P. Fränti. Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8):2173–2186, 2016. doi:10.1109/TKDE.2016.2551240.

Ult05

A. Ultsch. Clustering with SOM: U*C. In Workshop on Self-Organizing Maps, pages 75–82. 2005.

VEB10

N.X. Vinh, J. Epps, and J. Bailey. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(95):2837–2854, 2010. URL: http://jmlr.org/papers/v11/vinh10a.html.

See Also

fastcluster

http://www.danifold.net/fastcluster.html

hdbscan

https://github.com/scikit-learn-contrib/hdbscan

ITM

https://github.com/amueller/information-theoretic-mst

mlpack

https://www.mlpack.org/

nmslib

https://github.com/nmslib/nmslib/tree/master/python_bindings

scikit-learn

https://scikit-learn.org/stable/modules/clustering.html