Question

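I want to apply the kmeans function to a dataset.

I run it several times, increasing the number of centers each time. For each run, I store the total within-cluster sum of squares in a vector, then plot it against the number of clusters, like this: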

# Dummy data
cluster1_x <- rnorm(1000, mean = 3.5, sd = .75)
cluster1_y <- rnorm(1000, mean = 4, sd = 1.13)
cluster1 <- cbind(cluster1_x, cluster1_y)

cluster2_x <- rnorm(1000, mean = 5.2, sd = .75)
cluster2_y <- rnorm(1000, mean = .9, sd = .64)
cluster2 <- cbind(cluster2_x, cluster2_y)

cluster3_x <- rnorm(1000, mean = .68, sd = .86)
cluster3_y <- rnorm(1000, mean = 0.8, sd = 1)
cluster3 <- cbind(cluster3_x, cluster3_y)

df <- rbind(cluster1, cluster2, cluster3)

# To see the dummy clusters
# plot(df, pch = 20) 

# Applying kmeans

# Vector that will be filled with the variance in the clusters
tot.within.sum.square <- rep(NA, 20)

for (nb_center in 1:20){
  tps_start <- Sys.time()
  set.seed(13)
  res.kmeans <- kmeans(df, centers=nb_center, iter.max = 30)
  tot.within.sum.square[nb_center] <- res.kmeans$tot.withinss
  tps_exec <- Sys.time() - tps_start
  print(paste0("Iteration ", nb_center, " : ", tps_exec))
}

plot(1:20, tot.within.sum.square, type = 'b', pch=20)

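I want to repeat this process 4 times, each time with a different algorithm. The algorithm argument has 4 possible values: "Hartigan-Wong", "Lloyd", "Forgy", and "MacQueen", so I want to end up with 4 vectors of length 20, one per algorithm. Each element of a given vector is the value of res.kmeans$tot.withinss; for example, the 4th element is the total within-cluster sum of squares for kmeans with 4 centers. I could copy and paste the code above, but I am looking for a more elegant way to get the result.

I can get close to what I want with this: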

sapply(algos, function(x) {
  sapply(nb_centers, function(y) kmeans(df, centers = y, algorithm = x))
})

But I can't manage to store the tot.withinss from each run of each algorithm in a variable.

Any help would be greatly appreciated!

Answer 1

As mentioned in @Parfait's comment,

sizeof(a) > 10 ? 10 : sizeof(a)

will do the trick!
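As a minimal sketch of what the question asks for (not necessarily what @Parfait suggested), the nested sapply from the question can return only `$tot.withinss` from each fit, which simplifies to a 20 x 4 matrix with one column per algorithm. The names `algos`, `nb_centers` and `tot_withinss` are assumptions, since the question never defines them, and `iter.max = 100` is raised from the question's 30 only to reduce non-convergence warnings.

```r
# Dummy data: three Gaussian blobs, same parameters as in the question
set.seed(13)
n <- 1000
df <- rbind(
  cbind(rnorm(n, mean = 3.5,  sd = 0.75), rnorm(n, mean = 4,   sd = 1.13)),
  cbind(rnorm(n, mean = 5.2,  sd = 0.75), rnorm(n, mean = 0.9, sd = 0.64)),
  cbind(rnorm(n, mean = 0.68, sd = 0.86), rnorm(n, mean = 0.8, sd = 1))
)

# Assumed definitions: the question uses these names without defining them
algos <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")
nb_centers <- 1:20

# Outer sapply loops over algorithms, inner sapply over numbers of centers.
# Returning a single number per fit lets sapply simplify to a numeric matrix.
tot_withinss <- sapply(algos, function(algo) {
  sapply(nb_centers, function(k) {
    set.seed(13)  # same starting centers for every algorithm, as in the question
    kmeans(df, centers = k, iter.max = 100, algorithm = algo)$tot.withinss
  })
})

# Rows are numbers of centers, columns are named after the algorithms
dim(tot_withinss)
```

Here `tot_withinss[4, "Lloyd"]` is the total within-cluster sum of squares for 4 centers with the Lloyd algorithm, and `tot_withinss[, "MacQueen"]` is the length-20 vector for MacQueen. Something like `matplot(nb_centers, tot_withinss, type = "b", pch = 20)` should draw one elbow curve per algorithm. Note that the kmeans documentation describes "Lloyd" and "Forgy" as alternative names for the same algorithm, so those two columns are expected to match.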

R - How to apply a function with all possible values for a specified argument?
