Question

I am running kmeans clustering on a single feature that is measured repeatedly over time for each unit, like this:

Unit   Rate_month1    Rate_month2     Rate_month3     etc
1         0.0003         0.001           0.01
2         0              0               0.001
3         0.0001         0.0001          0.002

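The rates are all very low, but I would like to see whether I can tell the units apart. I have read that scaling the features is good practice. In this case every column is the same feature, so they are all on the same scale, but the mean and sd do differ somewhat from month to month, as you can see: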

colMeans(data_clusters[,2:37], na.rm=TRUE)
  Rate_month1    Rate_month2    Rate_month3    etc
  3.827039e-05   3.957701e-04   1.806816e-02

apply(data_clusters[,2:37], 2, sd, na.rm=TRUE)
  Rate_month1    Rate_month2    Rate_month3    etc
  0.0003606985   0.0019612296   0.0285996361

I am scaling them like this:

data_clusters[,2:37] <- lapply(data_clusters[,2:37], function(x) c(scale(x)))
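
As a sanity check on the scaling step, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical stand-ins for `data_clusters`, not my real data. It applies the same `lapply`/`scale` call and checks that every scaled column ends up with mean 0 and sd 1:

```r
# Hypothetical toy data standing in for data_clusters (values are invented)
toy <- data.frame(
  Unit        = 1:5,
  Rate_month1 = c(0.0003, 0,     0.0001, 0.0002, 0),
  Rate_month2 = c(0.001,  0,     0.0001, 0.0005, 0.002),
  Rate_month3 = c(0.01,   0.001, 0.002,  0.02,   0.005)
)

# Same scaling call as above, applied to the three rate columns;
# c() drops the matrix attributes that scale() returns
scaled <- toy
scaled[, 2:4] <- lapply(toy[, 2:4], function(x) c(scale(x)))

# Per-column mean and sd after scaling
col_means <- colMeans(scaled[, 2:4], na.rm = TRUE)
col_sds   <- apply(scaled[, 2:4], 2, sd, na.rm = TRUE)

print(round(col_means, 10))
print(round(col_sds, 10))
```

So after this step each individual column should be centered at 0, at least up to floating-point error.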

But if I then run the analysis as follows:

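library(factoextra)  # provides fviz_cluster()

# k-means with 2 clusters and 25 random starts on the scaled monthly rates
clusters <- kmeans(data_clusters[,2:37], 2, nstart=25)
str(clusters)
# to visualise
fviz_cluster(clusters, data=data_clusters[,2:37])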

I get the following output:

My question is: do I need to scale at all? If so, why is the output not centered at 0? I mean that Dim2 runs from -30 to 10 on the y axis, while Dim1 runs from 0 to 30 on the x axis. Any ideas?

kmeans clustering of 1 feature over time

0 answers: