Question

I am trying to work out why these two functions from the factoextra package produce different results even though their arguments seem similar (e.g. kmeans, gap_stat, k.max, B).

library(cluster)
library(cluster.datasets)
library(tidyverse)
library(factoextra)

# load data and scale it
data("all.mammals.milk.1956")
mammals <- all.mammals.milk.1956 %>% select(-name)
mammals_scaled <- scale(mammals)

The first method uses cluster::clusGap() and factoextra::fviz_gap_stat():

gap_stat <- clusGap(mammals_scaled, FUN = kmeans, K.max = 24, B = 50)
fviz_gap_stat(gap_stat) + theme_minimal() + ggtitle("fviz_gap_stat: Gap Statistic")

The second method uses factoextra::fviz_nbclust():

fviz_nbclust(mammals_scaled, kmeans, method = "gap_stat", k.max = 24, nboot = 50) + theme_minimal() + ggtitle("fviz_nbClust_gap_stat: Gap Statistic")

I thought the cause might be the nstart option in clusGap(), but when I read the source code of fviz_nbclust() using jimhester/lookup, I could not find the problem.

Answer 1

The difference is right at the beginning of the fviz_nbclust function. On line 6, the random seed is set:
set.seed(123)

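Because the kmeans algorithm uses random starting centers, results can vary between repeated runs. For example, I ran your data with two different random seeds and got slightly different results.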

set.seed(123)  
gap_stat <- cluster::clusGap(mammals_scaled, FUN = kmeans, K.max = 24, B = 50)   
fviz_gap_stat(gap_stat) + theme_minimal() + ggtitle("fviz_gap_stat: Gap Statistic")

[Figure: gap statistic plot with seed 123]

set.seed(42)  
gap_stat <- cluster::clusGap(mammals_scaled, FUN = kmeans, K.max = 24, B = 50)
fviz_gap_stat(gap_stat) + theme_minimal() + ggtitle("fviz_gap_stat: Gap Statistic")

[Figure: gap statistic plot with seed 42]
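To see the seed dependence in isolation, here is a minimal sketch using only base R. It is an illustration under assumptions, not the code from the question: it uses the built-in iris data instead of the mammals data, and run_kmeans is a hypothetical helper name.

```r
# Toy data (assumption: built-in iris, not the mammals data set)
x <- scale(as.matrix(iris[, 1:4]))

# Hypothetical helper: fix the seed, run kmeans once, and return the
# total within-cluster sum of squares
run_kmeans <- function(seed, centers = 5) {
  set.seed(seed)
  kmeans(x, centers = centers, nstart = 1, iter.max = 50)$tot.withinss
}

same_a <- run_kmeans(123)
same_b <- run_kmeans(123)
other  <- run_kmeans(42)

# The same seed gives the same starting centers, hence the same result
identical(same_a, same_b)

# A different seed can lead to a different local optimum
c(seed_123 = same_a, seed_42 = other)
```

Whether the two seeds actually disagree depends on the data and the number of clusters, so the last line is something to inspect rather than a guaranteed difference.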

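I am not entirely sure why the results for seed 123 are not identical, but I assume it is because in my code set.seed() is executed directly above the clusGap call, whereas in fviz_nbclust other code is evaluated between the two.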
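The sketch below illustrates that guess in general terms. It does not inspect fviz_nbclust itself; the runif(1) call is only a stand-in (an assumption) for any code that might draw random numbers between set.seed() and the clustering call, and iris again replaces the mammals data.

```r
# Toy data (assumption: built-in iris, not the mammals data set)
x <- scale(as.matrix(iris[, 1:4]))

# Case 1: the seed is set immediately before the clustering call
set.seed(123)
state_at_seed <- .Random.seed
direct <- kmeans(x, centers = 5, iter.max = 50)$tot.withinss

# Case 2: same seed, but one random draw happens before clustering
set.seed(123)
invisible(runif(1))  # stand-in for intermediate code that uses the RNG
state_after_draw <- .Random.seed
shifted <- kmeans(x, centers = 5, iter.max = 50)$tot.withinss

# The generator state has advanced, so kmeans no longer starts from
# the state that set.seed(123) produced
identical(state_at_seed, state_after_draw)  # FALSE

# The clustering results may or may not differ as a consequence
c(direct = direct, shifted = shifted)
```

If intermediate code never touches the random number generator, the state stays the same and this explanation would not apply.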

factoextra::fviz_gap_stat() vs. factoextra::fviz_nbclust(df, method = "gap_stat")
