I am looking at the `clusGap` function in the 'cluster' library to extract the number of clusters for K-means clustering.
Here is the code:
library(cluster)

# Compute Gap statistic (http://web.stanford.edu/~hastie/Papers/gap.pdf)
computeGapStatistic <- function(data) {
  # Cluster the data passed in; `<<-` also stores the result in the global `gap`
  gap <<- clusGap(data, FUN = kmeans, K.max = 8, B = 3)
  if (ENABLE_PLOTS) {  # ENABLE_PLOTS is a global flag defined elsewhere
    plot(gap, main = "Gap statistic for the Nursing shift data")
  }
  print(gap)
  return(gap)
}
Printing 'gap' gives the following output:
> print(gap)
Clustering Gap statistic ["clusGap"].
B=3 simulated reference sets, k = 1..8
--> Number of clusters (method 'firstSEmax', SE.factor=1): 2
         logW   E.logW         gap      SE.sim
[1,] 8.702334 9.238385  0.53605067 0.007945542
[2,] 7.940133 8.544323  0.60418996 0.003790244
[3,] 7.772673 8.139836  0.36716303 0.005755805
[4,] 7.325798 7.849233  0.52343473 0.002732731
[5,] 7.233667 7.629954  0.39628748 0.003496058
[6,] 7.020220 7.439709  0.41948820 0.006451708
[7,] 6.707678 7.285907  0.57822872 0.002810682
[8,] 7.166932 7.150724 -0.01620749 0.004274151
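This is what the plot looks like:
Question:
How do I extract the number of clusters from the 'gap' variable? 'gap' appears to be a list. From the output above, it seems that 2 clusters were found.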
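For anyone wanting to check what 'gap' actually contains, a minimal sketch follows. The data set `x` is a hypothetical synthetic stand-in (two well-separated blobs), not the nursing shift data from the question, and `K.max = 4`, `B = 10` are arbitrary small values chosen to keep it quick.

```r
library(cluster)

set.seed(42)  # make the synthetic data reproducible
# Hypothetical stand-in for the question's data: two separated 2-D blobs
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

gap <- clusGap(x, FUNcluster = kmeans, K.max = 4, B = 10, verbose = FALSE)

class(gap)         # "clusGap"
is.list(gap)       # TRUE: the object is a list underneath
colnames(gap$Tab)  # "logW" "E.logW" "gap" "SE.sim"
```

The `Tab` component is the matrix shown in the printed output, with one row per candidate k.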
Answer 0 (score: 0)
I figured it out myself. This is what I used: with(gap, maxSE(Tab[, "gap"], Tab[, "SE.sim"]))
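As a quick check that `maxSE` reproduces the number reported by `print(gap)`, here is a small sketch that feeds it the `gap` and `SE.sim` columns copied by hand from the table in the question. The vector names `gap_vals` and `se_vals` are only illustrative. `maxSE` defaults to `method = "firstSEmax"` with `SE.factor = 1`, which is the rule named in the printed output.

```r
library(cluster)

# gap and SE.sim columns copied from the printed table in the question
gap_vals <- c(0.53605067, 0.60418996, 0.36716303, 0.52343473,
              0.39628748, 0.41948820, 0.57822872, -0.01620749)
se_vals  <- c(0.007945542, 0.003790244, 0.005755805, 0.002732731,
              0.003496058, 0.006451708, 0.002810682, 0.004274151)

# Same rule that print(gap) reports: "firstSEmax" with SE.factor = 1
k_default <- maxSE(gap_vals, se_vals, method = "firstSEmax", SE.factor = 1)
k_default  # 2, matching the printed "Number of clusters"

# The plain global maximum of the gap column, for comparison
k_global <- maxSE(gap_vals, se_vals, method = "globalmax")
k_global   # 2 as well, since the largest gap value is at k = 2
```

On a real `clusGap` result the same call works with `gap$Tab[, "gap"]` and `gap$Tab[, "SE.sim"]`, which is what the `with(gap, ...)` one-liner does.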