K-means algorithm with k = 2 that gives equal cluster sizes

Time: 2017-05-15 04:31:48

Tags: algorithm machine-learning cluster-analysis k-means spherical-kmeans

I am using a modified version of Lloyd's algorithm to get equal cluster sizes from k-means with k = 2. Here is the pseudocode:

- Randomly choose 2 points as initialization for the 2 clusters (denoted as c1, c2)
- Repeat below steps until convergence
    - Sort all points xi according to ascending values of ||xi-c1|| - ||xi-c2||, i.e. differences in distances to the first and the second cluster
    - Put the top 50% of points in cluster 1, the others in cluster 2
    - Recalculate centroids as average of the allocated points (as usual in Lloyd's)
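
The pseudocode above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the asker's actual code: the function name `balanced_two_means`, the `n_iter` and `seed` parameters, the stopping rule (stop when the assignment no longer changes), and the handling of an odd number of points (cluster 1 gets the smaller half) are all assumptions.

```python
import numpy as np

def balanced_two_means(X, n_iter=100, seed=0):
    """Hypothetical sketch of the equal-size variant of Lloyd's algorithm for k = 2."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)

    # Randomly choose 2 distinct points as the initial centroids c1, c2.
    c1, c2 = X[rng.choice(n, size=2, replace=False)]

    half = n // 2  # assumption: with odd n, cluster 1 gets the smaller half
    labels = None
    for _ in range(n_iter):
        # Sort key from the pseudocode: ||xi - c1|| - ||xi - c2||.
        diff = np.linalg.norm(X - c1, axis=1) - np.linalg.norm(X - c2, axis=1)
        order = np.argsort(diff, kind="stable")

        # The first half of the sorted points goes to cluster 1 (label 0),
        # the rest to cluster 2 (label 1).
        new_labels = np.ones(n, dtype=int)
        new_labels[order[:half]] = 0

        # Assumed convergence test: the assignment did not change.
        if labels is not None and np.array_equal(labels, new_labels):
            break
        labels = new_labels

        # Recalculate centroids as the mean of the allocated points.
        c1 = X[labels == 0].mean(axis=0)
        c2 = X[labels == 1].mean(axis=0)

    return labels, np.stack([c1, c2])
```

Calling `labels, centers = balanced_two_means(X)` on an `(n, d)` array returns a 0/1 label per point and the two centroids. One design note: the sketch sorts by the difference of plain distances, as in the pseudocode. If the difference of squared distances were used as the sort key instead, the assignment step would exactly minimize the sum-of-squares objective under the equal-size constraint for fixed centroids; with unsquared distances that is not guaranteed in general.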

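The algorithm above works well for me empirically:

  1. It gives balanced clusters.
  2. It always decreases the objective.

Has such an algorithm been proposed or analyzed in the literature before? Could I get some references?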

1 answer:

Answer 0 (score: 2)

A more general version for more than 2 clusters is explained here:

https://elki-project.github.io/tutorial/same-size_k_means

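I have seen k-means with various size constraints in the literature several times, but I don't have any references at hand. I am not convinced by the approach: IMHO, forcing clusters to have the same size contradicts the k-means idea of finding the best least-squares approximation, because it means deliberately choosing a worse approximation.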
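
The "worse approximation" point can be written out as a short inequality. The notation below is introduced only for illustration and is not taken from the linked tutorial: $a$ is an assignment of the $n$ points to clusters, $\mu_j$ is the mean of the points assigned to cluster $j$, $\mathcal{A}$ is the set of all assignments, and $\mathcal{A}_{\text{bal}}$ is the subset with equal cluster sizes.

```latex
\mathrm{SSE}(a) = \sum_{i=1}^{n} \lVert x_i - \mu_{a(i)} \rVert^2,
\qquad
\min_{a \in \mathcal{A}_{\text{bal}}} \mathrm{SSE}(a)
\;\ge\;
\min_{a \in \mathcal{A}} \mathrm{SSE}(a)
\quad \text{because } \mathcal{A}_{\text{bal}} \subseteq \mathcal{A}.
```

Equality holds only when some unconstrained optimum already happens to be balanced; otherwise the size constraint costs some least-squares fit.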