Question

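I want to run the buckshot algorithm in R, which combines HAC (hierarchical agglomerative clustering) with k-means clustering. To do that, I want to choose many k-means centers, for example three seeds for a single cluster. Here is my code: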

Iris data: k-means

iristr <- read.csv("iristr.CSV", header = TRUE)
str(iristr)
iristr.m <- as.matrix(iristr[,1:4])
km <- kmeans(iristr.m, centers = 3)
km
table(km$cluster,iristr$Species)
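The call above uses `centers = 3`, but `centers` may also be a matrix: each row is one starting center, and the number of rows sets k. A minimal sketch, assuming the built-in iris data in place of iristr.CSV; rows 1, 51 and 101 are a hypothetical pick, one from each 50-row species block.

```r
# Built-in iris stands in for the question's iristr.CSV file (assumption)
x <- as.matrix(iris[, 1:4])

# Hypothetical starting rows: one from each 50-row species block
start <- x[c(1, 51, 101), ]

# A matrix 'centers' fixes k = nrow(start) and skips random initialisation
km_fixed <- kmeans(x, centers = start)

table(km_fixed$cluster, iris$Species)
```

Because the starting centers are given explicitly, repeated runs give the same clustering.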

Iris data: buckshot

irists <- read.csv("irists.csv", header = TRUE)
str(irists)
irists.m <- as.matrix(irists[,1:4])
dm <- dist(irists.m, method = "euclidean")
hc <- hclust(dm, method = "complete")
plot(hc)
clusterCut <- cutree(hc,3)
clusterCut
i1 <- iristr.m[c(1,4,12),] # several seeds (centers) for one cluster
i1 
i2 <- iristr.m[c(2,5,8),] # several seeds (centers) for one cluster
i2
i3 <- iristr.m[c(3,6,7,9,10,11),] # several seeds (centers) for one cluster
i3
buckshot <- kmeans(iristr.m, centers=i1,i2,i3) # only the "i1" centers are used
buckshot
table(buckshot$cluster,iristr$Species)
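The `kmeans(iristr.m, centers=i1,i2,i3)` call uses only i1 because kmeans() matches its arguments in the order x, centers, iter.max, nstart, so i2 and i3 are passed positionally as iter.max and nstart instead of being added as centers. Each k-means cluster starts from exactly one center, so one way to use several seeds per cluster is to collapse each group of seeds to its centroid, which is also what buckshot does with the HAC clusters. This sketch assumes the built-in iris data instead of the two CSV files, and a random sample of about sqrt(k * n) rows for the HAC step (the size commonly quoted for buckshot); treat both as assumptions.

```r
# Buckshot sketch: HAC on a small sample, then k-means seeded with the
# HAC cluster centroids. Built-in iris replaces the question's CSV files.
set.seed(42)                        # the sample is random; fix it for reproducibility
x <- as.matrix(iris[, 1:4])
k <- 3
n <- nrow(x)

# 1. Sample about sqrt(k * n) rows for the hierarchical step (assumed size)
idx  <- sample(n, ceiling(sqrt(k * n)))
samp <- x[idx, ]

# 2. Complete-linkage HAC on the sample, cut into k groups
hc     <- hclust(dist(samp, method = "euclidean"), method = "complete")
groups <- cutree(hc, k)

# 3. Collapse each group of seeds into one centroid (k x 4 matrix)
seeds <- do.call(rbind, lapply(split(as.data.frame(samp), groups), colMeans))

# 4. Pass all k centroids through the single 'centers' argument
buckshot <- kmeans(x, centers = seeds)
table(buckshot$cluster, iris$Species)
```

With the question's hand-picked seeds, the same idea would be `centers = rbind(colMeans(i1), colMeans(i2), colMeans(i3))`.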

Answer 1

Here is an example of applying the k-means clustering algorithm to the iris data.

Using the built-in iris data, assign feature columns 1-4 to the variable x and the class labels to the variable y.

x = iris[,-5]
y = iris$Species

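In k-means, the initial cluster centers are chosen at random. Since we know the data contain 3 species, the number of clusters can be set to 3. Because the starting assignment is random, nstart can be set to 10: kmeans() then tries 10 different random initial center assignments and keeps the one with the lowest within-cluster sum of squares (WCSS), i.e. the sum of squared distances from each point to its cluster center. You can give the nstart parameter a higher value to make kmeans() try more random initial center assignments.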

kc <- kmeans(x, centers = 3, nstart = 10)
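To make the WCSS definition concrete: it is the sum, over all points, of the squared Euclidean distance to the assigned cluster center, and kmeans() reports it as tot.withinss. A small sketch recomputing it by hand; set.seed() is added only so the random starts are reproducible.

```r
x <- iris[, -5]
set.seed(1)                              # nstart draws random starting centers
kc <- kmeans(x, centers = 3, nstart = 10)

# Center assigned to each observation (150 x 4 matrix)
assigned <- kc$centers[kc$cluster, ]

# WCSS: squared distance from every point to its own center, summed
wcss <- sum((as.matrix(x) - assigned)^2)

wcss
kc$tot.withinss                          # should agree with wcss
```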

To see the errors, compare the clustering result with the species (class) labels in the iris data.

table(y,kc$cluster)
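Cluster numbers are arbitrary labels, so to turn this table into an error count one common approach is to label each cluster with its majority species and count the flowers that disagree. This majority-vote mapping is an assumption of the sketch, not something kmeans() does for you.

```r
x <- iris[, -5]
y <- iris$Species
set.seed(1)
kc <- kmeans(x, centers = 3, nstart = 10)

tab <- table(y, kc$cluster)              # rows: species, columns: clusters

# Majority species of each cluster (column-wise maximum)
majority <- rownames(tab)[apply(tab, 2, which.max)]
names(majority) <- colnames(tab)

# Predicted species for every flower, then count disagreements
pred   <- majority[as.character(kc$cluster)]
errors <- sum(pred != as.character(y))
errors
errors / length(y)                       # misclassification rate
```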

Finally, display the result by plotting sepal length on the x-axis and sepal width on the y-axis (other pairs of features can be chosen).

plot(x[c("Sepal.Length", "Sepal.Width")], col = kc$cluster)
points(kc$centers[,c("Sepal.Length", "Sepal.Width")], col = 1:3, pch=23, cex=3)

Answer 2

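This shows how to use k-means with the animation package. I'm not sure it will help you, but it does provide a solution for anyone who searches for this topic with the animation package in mind.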

#data setup
M = matrix(c(2, 2, 4, 1, 8, 3, 7, 2, 5, 9, 4, 2, 3, 1, 2, 6), ncol = 2, byrow = TRUE)
rownames(M) = c('A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8')
colnames(M) = c('x', 'y')
#Matrix for initial centers: rows A2, A5, A8 of M (column names x, y are kept)
A = M[c('A2', 'A5', 'A8'), ]


library(animation)
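oopt = ani.options(interval = 5)  # 5 seconds between frames; oopt keeps the previous settings
#pass A for centers argument
kmeans.ani(M, centers = A)
ani.options(oopt)  # restore the previous animation settings
help(kmeans.ani)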
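The same starting centers can also be passed to stats::kmeans() directly when no animation is needed. A sketch on the same 8 points; taking the start rows by name guarantees they are exactly A2, A5 and A8. algorithm = "Lloyd" is chosen here as the plain assign-then-update loop; whether it matches the animation step for step is an assumption.

```r
# The 8 points A1..A8 from the example above
M <- matrix(c(2, 2, 4, 1, 8, 3, 7, 2, 5, 9, 4, 2, 3, 1, 2, 6),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("A", 1:8), c("x", "y")))

# Initial centers taken by row name: A2, A5, A8
start <- M[c("A2", "A5", "A8"), ]

# Lloyd's algorithm: alternate nearest-center assignment and mean update
km <- kmeans(M, centers = start, algorithm = "Lloyd")
km$cluster
km$centers
```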

[Figure: first plot of kmeans.ani, showing the initial centers]

How to choose many initial centers for k-means clustering in R
