Question

I don't understand what changing nstart does in the algorithm.

If centers = 8, the function will cluster the data into 8 groups. But what does nstart change?

Here is the explanation from the documentation:

centers:    
Either the number of clusters or a set of initial cluster centers. If the first, a random set of rows in x are chosen as the initial centers.

nstart:
If centers is a number, how many random sets should be chosen?

Answer 1

From the Details section of the documentation:

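The algorithm of Hartigan and Wong (1979) is used by default. Note that some authors use k-means to refer to a specific algorithm rather than the general method: most commonly the algorithm given by MacQueen (1967) but sometimes that given by Lloyd (1957) and Forgy (1965). The Hartigan-Wong algorithm generally does a better job than either of those, but trying several random starts (nstart > 1) is often recommended. In rare cases, when some of the points (rows of x) are extremely close, the algorithm may not converge in the "Quick-Transfer" stage, signalling a warning (and returning ifault = 4). Slight rounding of the data may be advisable in that case.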

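nstart is the number of random starts. I can't explain the statistical details, but in their example code the authors of this function chose 25 random starts: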

## random starts do help here with too many clusters
## (and are often recommended anyway!):
(cl <- kmeans(x, 5, nstart = 25))
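
To see why several starts help, here is a minimal sketch that compares repeated single-start fits with one multi-start fit. The simulated data (two Gaussian blobs, similar in spirit to the ?kmeans example), the seed, and the variable names single and multi are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Assumed toy data: two Gaussian blobs in 2D, 50 points each
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# 20 separate fits, each from a single random start:
# the objective (tot.withinss) can differ from run to run
single <- replicate(20, kmeans(x, 5, nstart = 1)$tot.withinss)
range(single)

# One call with 25 random starts: only the best of the 25 is returned
multi <- kmeans(x, 5, nstart = 25)
multi$tot.withinss
```

If the spread in range(single) is noticeable, single-start results depend on the luck of the initial centers, which is the situation where a larger nstart is worth the extra computation.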

Answer 2

Unfortunately, ?kmeans does not fully explain this (in either the stats or the amap package). However, you can get an idea by looking at the kmeans code.

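If more than one random start is used (nstart greater than 1), the algorithm returns the partition with the smallest total within-cluster sum of squares (the output stores this total as tot.withinss).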
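
The selection rule can be imitated by hand. The following is only a rough sketch of the idea, not the actual kmeans source: the toy data, the choice of 10 runs and 3 clusters, and the names fits, tot and best are assumptions.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Assumed toy data: 100 points in 2D
x <- matrix(rnorm(200), ncol = 2)

# Run 10 independent fits, each with a single random start
fits <- lapply(1:10, function(i) kmeans(x, centers = 3, nstart = 1))

# Collect the total within-cluster sum of squares of every fit
tot <- vapply(fits, function(f) f$tot.withinss, numeric(1))

# Keep the partition with the smallest total, as nstart > 1 does internally
best <- fits[[which.min(tot)]]
best$tot.withinss
```

In practice kmeans(x, 3, nstart = 10) is the simpler way to get this behaviour; the loop only makes the "keep the best start" step visible.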
