genieclust.cluster_validity

So-called internal cluster validity indices

The greater the index value, the more valid (whatever that means) the assessed partition. For consistency, the Ball-Hall and Davies-Bouldin indices, as well as the within-cluster sum of squares, are negated so that greater values are always better.

These measures were critically reviewed in (Gagolewski, Bartoszuk, Cena, 2021; https://doi.org/10.1016/j.ins.2021.10.004; preprint). See Section 2 therein for the respective definitions.

For even more details, see the Framework for Benchmarking Clustering Algorithms.

genieclust.cluster_validity.calinski_harabasz_index(X, y)

Computes the value of the Caliński-Harabasz index [3].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
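As a point of reference, the textbook form of the index is the ratio of the between-cluster to the within-cluster sum of squares, each divided by its degrees of freedom (K - 1 and n - K). The following is a minimal plain-Python sketch of that definition, not genieclust's implementation; the function name `calinski_harabasz` and the toy data are illustrative assumptions, and every label in 0, …, K-1 is assumed to occur at least once.

```python
def calinski_harabasz(X, y):
    """Textbook Calinski-Harabasz index; X is a list of tuples, y a list of labels."""
    n, d, K = len(X), len(X[0]), max(y) + 1
    # Overall centroid and per-cluster centroids.
    overall = [sum(x[j] for x in X) / n for j in range(d)]
    members = [[x for x, k in zip(X, y) if k == c] for c in range(K)]
    centroids = [[sum(x[j] for x in pts) / len(pts) for j in range(d)]
                 for pts in members]
    # Between-cluster sum of squares, weighted by cluster size.
    bgss = sum(len(pts) * sum((m[j] - overall[j]) ** 2 for j in range(d))
               for pts, m in zip(members, centroids))
    # Within-cluster sum of squares.
    wgss = sum(sum((x[j] - m[j]) ** 2 for j in range(d))
               for pts, m in zip(members, centroids) for x in pts)
    return (bgss / (K - 1)) / (wgss / (n - K))


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
ch_good = calinski_harabasz(X, [0, 0, 1, 1])  # two compact, well-separated clusters
ch_bad = calinski_harabasz(X, [0, 1, 0, 1])   # each cluster straddles both groups
```

On this toy data the compact partition scores far higher than the straddling one, in line with the "greater is better" convention.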

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

3

Calinski T., Harabasz J., A dendrite method for cluster analysis, Communications in Statistics 3(1), 1974, 1–27, https://doi.org/10.1080/03610927408827101.

genieclust.cluster_validity.dunnowa_index(X, y, M=25, owa_numerator='SMin:5', owa_denominator='Const')

Computes the generalised Dunn indices based on near-neighbours and OWA operators [2].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

M : int

number of nearest neighbours

owa_numerator, owa_denominator : str

specifies the OWA operators to use in the definition of the DuNN index; one of: "Mean", "Min", "Max", "Const", "SMin:D", "SMax:D", where D is an integer defining the degree of smoothness

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
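For readers unfamiliar with OWA (ordered weighted averaging) operators: an OWA operator sorts its inputs and then takes a weighted sum, so the weights attach to ranks rather than to particular inputs. The sketch below illustrates only the generic operator and the weight vectors that reproduce the maximum, the minimum, and the mean. It is an illustration under assumptions, not genieclust's code; the helper name `owa` is hypothetical, and the "Const", "SMin:D", and "SMax:D" variants, as well as the way the index combines numerator and denominator, are not covered here (see [2]).

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


values = [3.0, 1.0, 2.0]
n = len(values)
owa_max = owa(values, [1.0] + [0.0] * (n - 1))  # all weight on the largest value
owa_min = owa(values, [0.0] * (n - 1) + [1.0])  # all weight on the smallest value
owa_mean = owa(values, [1.0 / n] * n)           # equal weights
```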

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

genieclust.cluster_validity.generalised_dunn_index(X, y, lowercase_d=1, uppercase_d=2)

Computes the generalised Dunn indices (by Bezdek and Pal) [3].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

lowercase_d : int

an integer between 1 and 5, denoting \(d_1\), …, \(d_5\) in the definition of the generalised Dunn index (numerator: min, max, and mean pairwise intercluster distance, distance between cluster centroids, weighted point-centroid distance, respectively)

uppercase_d : int

an integer between 1 and 3, denoting \(D_1\), …, \(D_3\) in the definition of the generalised Dunn index (denominator: max and mean pairwise intracluster distance, average point-centroid distance, respectively)

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
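To make the numerator/denominator structure concrete, here is a minimal plain-Python sketch of the classical Dunn index only: the smallest distance between two points from different clusters, divided by the largest distance between two points from the same cluster. Going by the parameter descriptions, this presumably corresponds to lowercase_d=1 with uppercase_d=1, but that mapping is an assumption, as are the function name `dunn_index` and the toy data. The other variants are not shown, and the sketch requires at least one cluster with two or more points.

```python
import math
from itertools import combinations


def dunn_index(X, y):
    """Classical Dunn index: smallest between-cluster gap over largest diameter."""
    pairs = list(combinations(range(len(X)), 2))
    # Numerator: minimum distance between two points from different clusters.
    separation = min(math.dist(X[i], X[j]) for i, j in pairs if y[i] != y[j])
    # Denominator: maximum distance between two points from the same cluster.
    diameter = max(math.dist(X[i], X[j]) for i, j in pairs if y[i] == y[j])
    return separation / diameter


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
dunn_good = dunn_index(X, [0, 0, 1, 1])  # gap of 10 over diameter of 2
dunn_bad = dunn_index(X, [0, 1, 0, 1])   # gap of 2 over diameter of 10
```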

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

3

Bezdek J., Pal N., Some new indexes of cluster validity, IEEE Transactions on Systems, Man, and Cybernetics, Part B 28, 1998, 301-315, https://doi.org/10.1109/3477.678624/.

genieclust.cluster_validity.negated_ball_hall_index(X, y)

Computes the value of the negated Ball-Hall index [3].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

3

Ball G.H., Hall D.J., ISODATA: A novel method of data analysis and pattern classification, Technical report No. AD699616, Stanford Research Institute, 1965.

genieclust.cluster_validity.negated_davies_bouldin_index(X, y)

Computes the value of the negated Davies-Bouldin index [3].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
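The sign convention can be illustrated with the textbook Davies-Bouldin definition: for each cluster, take the largest ratio of summed within-cluster scatters to centroid distance against any other cluster, average these over clusters, and flip the sign. The plain-Python sketch below follows that definition with Euclidean distances; it is an assumption-laden illustration rather than genieclust's implementation, and the function name and toy data are made up for the example. It needs K >= 2 and distinct centroids.

```python
import math


def negated_davies_bouldin(X, y):
    """Negated textbook Davies-Bouldin index (greater is better)."""
    K, d = max(y) + 1, len(X[0])
    members = [[x for x, k in zip(X, y) if k == c] for c in range(K)]
    centroids = [tuple(sum(x[j] for x in pts) / len(pts) for j in range(d))
                 for pts in members]
    # Scatter: mean Euclidean distance from a cluster's points to its centroid.
    scatter = [sum(math.dist(x, m) for x in pts) / len(pts)
               for pts, m in zip(members, centroids)]
    # For each cluster, the worst (largest) ratio against any other cluster.
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(K) if j != i)
             for i in range(K)]
    return -sum(worst) / K


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
db_good = negated_davies_bouldin(X, [0, 0, 1, 1])  # close to zero
db_bad = negated_davies_bouldin(X, [0, 1, 0, 1])   # strongly negative
```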

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

3

Davies D.L., Bouldin D.W., A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1 (2), 1979, 224-227, https://doi.org/10.1109/TPAMI.1979.4766909.

genieclust.cluster_validity.negated_wcss_index(X, y)

Computes the value of the negated within-cluster sum of squares (used as the objective function in the k-means and Ward algorithms).

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
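The quantity itself is simple enough to spell out: sum the squared Euclidean distances between each point and the centroid of its cluster, then negate. A minimal plain-Python sketch follows; the function name `negated_wcss` and the toy data are illustrative assumptions, and this is not genieclust's implementation.

```python
def negated_wcss(X, y):
    """Negated within-cluster sum of squared distances to cluster centroids."""
    K, d = max(y) + 1, len(X[0])
    members = [[x for x, k in zip(X, y) if k == c] for c in range(K)]
    centroids = [[sum(x[j] for x in pts) / len(pts) for j in range(d)]
                 for pts in members]
    # Squared Euclidean distance of every point to its own cluster centroid.
    wcss = sum(sum((x[j] - centroids[k][j]) ** 2 for j in range(d))
               for x, k in zip(X, y))
    return -wcss


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
wcss_good = negated_wcss(X, [0, 0, 1, 1])  # each point is 1 unit from its centroid
wcss_bad = negated_wcss(X, [0, 1, 0, 1])   # each point is 5 units from its centroid
```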

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

genieclust.cluster_validity.silhouette_index(X, y)

Computes the value of the Silhouette index (average silhouette score) [3].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
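For orientation, Rousseeuw's silhouette width of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the index averages these widths over all points. The sketch below is a plain-Python rendering of that textbook definition, not genieclust's code. The function names are hypothetical, assigning a width of 0 to singleton clusters is an assumed convention, and every label in 0, …, K-1 is assumed to occur.

```python
import math


def silhouette_widths(X, y):
    """Per-point silhouette widths s_i = (b_i - a_i) / max(a_i, b_i)."""
    K = max(y) + 1
    widths = []
    for i, xi in enumerate(X):
        # Distances from point i to the members of each cluster (excluding i).
        dists = [[math.dist(xi, xj) for j, xj in enumerate(X)
                  if y[j] == c and j != i] for c in range(K)]
        own = dists[y[i]]
        if not own:  # singleton cluster: width 0 by (assumed) convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(dc) / len(dc) for c, dc in enumerate(dists) if c != y[i])
        widths.append((b - a) / max(a, b))
    return widths


def silhouette(X, y):
    """Average silhouette width over all points."""
    w = silhouette_widths(X, y)
    return sum(w) / len(w)


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
sil_good = silhouette(X, [0, 0, 1, 1])  # roughly 0.80
sil_bad = silhouette(X, [0, 1, 0, 1])   # negative: points sit closer to the other cluster
```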

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

3

Rousseeuw P.J., Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis, Journal of Computational and Applied Mathematics 20, 1987, 53-65, https://doi.org/10.1016/0377-0427(87)90125-7.

genieclust.cluster_validity.silhouette_w_index(X, y)

Computes the value of the Silhouette W index (mean of the cluster average silhouette widths) [3].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
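The difference from the plain average silhouette score shows up when clusters have unequal sizes: here the widths are first averaged within each cluster, and the cluster averages are then averaged, so every cluster carries the same weight. The following plain-Python sketch shows both aggregations side by side under the textbook definition of the silhouette width. It is not genieclust's implementation; the function names, the width of 0 for singleton clusters, and the toy data are assumptions made for the example.

```python
import math


def silhouette_widths(X, y):
    """Per-point silhouette widths s_i = (b_i - a_i) / max(a_i, b_i)."""
    K = max(y) + 1
    widths = []
    for i, xi in enumerate(X):
        # Distances from point i to the members of each cluster (excluding i).
        dists = [[math.dist(xi, xj) for j, xj in enumerate(X)
                  if y[j] == c and j != i] for c in range(K)]
        own = dists[y[i]]
        if not own:  # singleton cluster: width 0 by (assumed) convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(dc) / len(dc) for c, dc in enumerate(dists) if c != y[i])
        widths.append((b - a) / max(a, b))
    return widths


def silhouette_w(X, y):
    """Mean of the per-cluster average silhouette widths."""
    K = max(y) + 1
    w = silhouette_widths(X, y)
    cluster_means = [sum(s for s, k in zip(w, y) if k == c) / y.count(c)
                     for c in range(K)]
    return sum(cluster_means) / K


X = [(0.0,), (1.0,), (2.0,), (10.0,), (12.0,)]
y = [0, 0, 0, 1, 1]  # clusters of sizes 3 and 2
w = silhouette_widths(X, y)
sil_w = silhouette_w(X, y)     # every cluster weighs the same
sil_plain = sum(w) / len(w)    # every point weighs the same
```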

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).

3

Rousseeuw P.J., Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis, Journal of Computational and Applied Mathematics 20, 1987, 53-65, https://doi.org/10.1016/0377-0427(87)90125-7.

genieclust.cluster_validity.wcnn_index(X, y, M=25)

Computes the within-cluster near-neighbours index [2].

See [1] and [2] for the definition and discussion.

Parameters
X : c_contiguous ndarray, shape (n, d)

n data points in a feature space of dimensionality d

y : array_like

A vector of “small” integers representing a partition of the n input points; y[i] is the cluster ID of the i-th point, where 0 <= y[i] < K and K is the number of clusters.

M : int

number of nearest neighbours

Returns
index : float

Computed index value. The greater the index value, the more valid (whatever that means) the assessed partition.
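One plausible reading of the index, sketched here under explicit assumptions, is the proportion of each point's M nearest neighbours that carry the same cluster label as the point itself, averaged over all points. The plain-Python code below implements that reading by brute force. It is not taken from genieclust: the function name `wcnn`, the tie-breaking by input order, and the absence of any special handling for clusters with M or fewer points are all assumptions, so see [2] for the exact definition.

```python
import math


def wcnn(X, y, M):
    """Fraction of each point's M nearest neighbours sharing its cluster label."""
    n = len(X)
    hits = 0
    for i in range(n):
        # Other points ordered by increasing distance from point i
        # (ties keep input order because sorted() is stable).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        hits += sum(1 for j in others[:M] if y[j] == y[i])
    return hits / (n * M)


X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
wcnn_good = wcnn(X, [0, 0, 0, 1, 1, 1], M=2)  # all neighbours are in the same cluster
wcnn_bad = wcnn(X, [0, 1, 0, 1, 0, 1], M=2)   # labels alternate along the line
```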

See also

genieclust.cluster_validity.calinski_harabasz_index

The Caliński-Harabasz index

genieclust.cluster_validity.dunnowa_index

Generalised Dunn indices based on near-neighbours and OWA operators (by Gagolewski)

genieclust.cluster_validity.generalised_dunn_index

Generalised Dunn indices (by Bezdek and Pal)

genieclust.cluster_validity.negated_ball_hall_index

The Ball-Hall index (negated)

genieclust.cluster_validity.negated_davies_bouldin_index

The Davies-Bouldin index (negated)

genieclust.cluster_validity.negated_wcss_index

Within-cluster sum of squares (used as the objective function in the k-means and Ward algorithm) (negated)

genieclust.cluster_validity.silhouette_index

The Silhouette index (average silhouette score)

genieclust.cluster_validity.silhouette_w_index

The Silhouette W index (mean of the cluster average silhouette widths)

genieclust.cluster_validity.wcnn_index

The within-cluster near-neighbours index

References

1

Gagolewski M., A Framework for Benchmarking Clustering Algorithms, https://clustering-benchmarks.gagolewski.com

2

Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620–636, 2021, https://doi.org/10.1016/j.ins.2021.10.004 (preprint).