Question

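I have 4 clusters that I want to visualize with ggplot. I tried plotting them with ggplot, but I don't know how to make the result look like the image below. My result is just a scatter plot, and the points are not grouped by their cluster centroids.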

top50combos_freq: has two columns, freq and freq1.



top50combos_freq.ckmeans: holds the result of kmeans with 4 clusters (it is called top50combos_freq.ckmeans1 in the code below).

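# Base-graphics version: one colour per cluster, centroids drawn as triangles (pch = 2)
plot(top50combos_freq[top50combos_freq.ckmeans1$cluster == 1, ],
     col = "red",
     xlim = range(top50combos_freq[, 1]),   # same as c(min(...), max(...))
     ylim = range(top50combos_freq[, 2])
     )
points(top50combos_freq[top50combos_freq.ckmeans1$cluster == 2, ],
       col = "blue")
points(top50combos_freq[top50combos_freq.ckmeans1$cluster == 3, ],
       col = "seagreen")
# The kmeans fit has 4 clusters, so cluster 4 needs its own points() call too
points(top50combos_freq[top50combos_freq.ckmeans1$cluster == 4, ],
       col = "orange")
points(top50combos_freq.ckmeans1$centers, pch = 2, col = "green")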

Any help making this plot with ggplot would be appreciated. Thanks.

Answer 1

One way to do this is to create 2 data frames:

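one for the actual data points, with a factor variable that specifies the cluster,
and another containing only the centroids (one row per cluster).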
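As a minimal sketch, assuming the clustering comes straight from kmeans(), the two data frames can be built from the fitted object: km$cluster gives each point's cluster and km$centers is a matrix with one row per centroid. The iris columns Sepal.Length and Sepal.Width stand in for freq and freq1, and the seed and centers = 4 are illustrative choices, not taken from the question.

```r
# Sketch: build the points and centroids data frames from a kmeans() fit.
# Assumption: iris columns stand in for the question's freq / freq1 columns.
set.seed(42)                                  # kmeans uses random starting centers
xy <- iris[, c("Sepal.Length", "Sepal.Width")]
km <- kmeans(xy, centers = 4, nstart = 25)

# 1) Actual data points plus a factor column holding each point's cluster
pointsDf <- data.frame(xy, cluster = factor(km$cluster))

# 2) Centroids: km$centers has one row per cluster and keeps the column names of xy
centroidsDf <- data.frame(km$centers,
                          cluster = factor(seq_len(nrow(km$centers))))

head(pointsDf)
centroidsDf
```

Because both data frames use the same column names, the same aes() mapping works for either of them.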

Then plot the first data frame as usual, and add another geom whose data argument is set to the centroid data frame.

An example using the iris data:

library(ggplot2)
# Data frame with actual data points
plotDf <- iris 
# Data frame with centroids, one entry per centroid
centroidsDf <- data.frame(
  Sepal.Length = tapply(iris$Sepal.Length, iris$Species, mean),
  Sepal.Width = tapply(iris$Sepal.Width, iris$Species, mean)
)

# First plot data, colouring by cluster (in this case Species variable)
ggplot(
  data = plotDf,
  aes(x = Sepal.Length, y = Sepal.Width, col = Species)
) +
  geom_point() +
  # Then add centroids 
  geom_point(
    data = centroidsDf,                     # separate data.frame
    aes(x = Sepal.Length, y = Sepal.Width),
    col = "green",                          # notice "col" and "shape" are
    shape = 2)                              # outside aes()
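A variant closer to the question, assuming the clusters come from kmeans() rather than from a known grouping such as Species: here the centroid data frame also carries a cluster column, so the centroid layer inherits x, y and colour from the main aes() and each centroid is drawn in its own cluster's colour. The iris columns, the seed and centers = 4 are illustrative assumptions.

```r
library(ggplot2)

# Assumption: iris columns stand in for the question's freq / freq1 columns
set.seed(42)
xy <- iris[, c("Sepal.Length", "Sepal.Width")]
km <- kmeans(xy, centers = 4, nstart = 25)

pointsDf    <- data.frame(xy, cluster = factor(km$cluster))
centroidsDf <- data.frame(km$centers,
                          cluster = factor(seq_len(nrow(km$centers))))

p <- ggplot(pointsDf, aes(x = Sepal.Length, y = Sepal.Width, colour = cluster)) +
  geom_point() +
  # Centroids come from a separate data frame; because it also has a
  # 'cluster' column, each centroid gets the same colour as its points
  geom_point(data = centroidsDf, shape = 17, size = 4) +
  labs(colour = "Cluster")

print(p)
```

If the centroids should instead share one fixed colour, as in the answer's code, set col outside aes() in the second geom_point(), or pass inherit.aes = FALSE and map x and y explicitly.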

How to plot kmeans clusters with ggplot