Formula for the area of clusters in kmeans

Time: 2019-09-19 12:56:37

Tags: r machine-learning k-means

When using k-means clustering to generate K clusters, how can we calculate the area of each cluster? Is there a formula for it?

I have tried gArea() from the rgeos package, but I get the error:

unable to find an inherited method for function 'is.projected' for signature '"kmeans"'

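The cluster analysis works perfectly; I just need a way to find the area of each cluster. Either a formula based on withinss, totss and betweenss, or help with the code, would be appreciated. So far, the plotting part of my code is: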

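###################### Clustering Script
library(factoextra)  # fviz_cluster(); also attaches ggplot2, which provides theme_minimal()
library(rgeos)       # gArea()

# df: data frame whose columns 2:3 hold the point coordinates; k: number of clusters
clusters <- kmeans(df[2:3], k)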

# Save the cluster number in the dataset as column 'clusterId'
df$clusterId <- as.factor(clusters$cluster)
# Palette of 15 colours (enough for k <= 15)
m_color <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#A09999", "#B99F00",
             "#E6E4E9", "#777E73", "#D1A142", "#33AAB2", "#99CC00")

fviz_cluster(clusters, data = df[2:3], 
             ellipse.type = "norm",
             ellipse.level = 0.99,
             palette = m_color,
             geom = "point",
             axes = c(0,0), 
             show.clust.cent = TRUE,
             ggtheme = theme_minimal()
             )

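clusters$totss      # total sum of squares
clusters$size       # number of points in each cluster
clusters$centers    # cluster centroids
clusters$withinss   # within-cluster sum of squares, one value per cluster
clusters$betweenss  # between-cluster sum of squares
gArea(clusters, byid = FALSE)  # fails: gArea() expects an sp Spatial* object, not a kmeans object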
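If "area" is meant as the area of the ellipses drawn by the fviz_cluster() call above, it can be computed directly from each cluster's covariance matrix S. The sketch below assumes (not confirmed here) that ellipse.type = "norm" is rendered with ggplot2::stat_ellipse(type = "norm"), which scales the covariance ellipse by a squared radius of 2 * qf(level, 2, n - 1); the resulting area is pi * r^2 * sqrt(det(S)). The helper name ellipse_area_norm() and the synthetic two-group data are hypothetical. If the plot was drawn on standardized data (see the stand argument of fviz_cluster()), pass the same scaled coordinates to the helper.

```r
# Hypothetical helper: area of the normal-theory ellipse drawn for one cluster,
# ASSUMING the plot uses ggplot2::stat_ellipse(type = "norm", level = level).
ellipse_area_norm <- function(z, level = 0.99) {
  z <- as.matrix(z)
  n <- nrow(z)
  if (n < 3) return(NA_real_)              # covariance is undefined or singular
  S  <- stats::cov(z)                      # unbiased covariance, as cov.wt() gives
  r2 <- 2 * stats::qf(level, 2, n - 1)     # squared radius of the ellipse
  pi * r2 * sqrt(det(S))                   # area of the scaled covariance ellipse
}

# Synthetic example data: two Gaussian blobs
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- kmeans(x, centers = 2, nstart = 10)

# One ellipse area per cluster, at the same level used in the plot
areas <- sapply(split(as.data.frame(x), cl$cluster), ellipse_area_norm, level = 0.99)
print(areas)
```

At level = 0.99 these ellipses are meant to cover almost all of a normally distributed cluster, so their areas will usually be larger than the convex-hull areas.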

1 answer:

Answer 0 (score: 3)

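Using the data from example(kmeans), we can take the convex hull of each cluster's points with chull() and then compute its area with polyarea() from the geometry package. Two cruder alternatives, the bounding box and the largest ellipse inside it, are shown as well.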

library(geometry)  # polyarea()

set.seed(123)
example(kmeans)  # creates input x and kmeans output cl

# area of convex hull of points in the cluster
area <- function(z) {
  xy <- z[chull(z), ]          # hull vertices, in order around the hull
  polyarea(xy[, 1], xy[, 2])   # area of that polygon
}
sapply(split(as.data.frame(x), cl$cluster), area)
##         1         2         3         4         5 
## 0.3758644 0.4127252 0.2722848 0.2090896 0.3283888 

# area of box bounding all points in the cluster
area.box <- function(z) diff(range(z[, 1])) * diff(range(z[, 2]))
sapply(split(as.data.frame(x), cl$cluster), area.box)
##         1         2         3         4         5 
## 0.6570733 0.7924508 0.4263473 0.3307718 0.5639517 

# area of largest ellipse in the bounding box
area.ellipse <- function(z) pi * diff(range(z[, 1])) * diff(range(z[, 2])) / 4
sapply(split(as.data.frame(x), cl$cluster), area.ellipse)
##         1         2         3         4         5 
## 0.5160641 0.6223894 0.3348524 0.2597876 0.4429267
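If the geometry package is not available, the convex-hull area can be computed in base R with the shoelace formula, A = |sum over i of (x_i * y_(i+1) - x_(i+1) * y_i)| / 2, taken over the hull vertices in order with the last vertex wrapping around to the first. A minimal sketch: shoelace_area() and hull_area() are hypothetical helper names, and the data are generated directly rather than through example(kmeans), so the numbers will not match the output above. On the same hull it should agree with polyarea().

```r
# Shoelace (Gauss) formula for a simple polygon whose vertices are given in order.
shoelace_area <- function(px, py) {
  nxt <- c(seq_along(px)[-1], 1)              # index of the next vertex, wrapping around
  abs(sum(px * py[nxt] - px[nxt] * py)) / 2
}

# Convex-hull area of one cluster using only base R (chull() is in grDevices).
hull_area <- function(z) {
  z <- as.matrix(z)
  if (nrow(z) < 3) return(0)                  # fewer than 3 points enclose no area
  h <- grDevices::chull(z)                    # hull vertex indices, in order
  shoelace_area(z[h, 1], z[h, 2])
}

# Synthetic example data, clustered into 5 groups
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- kmeans(x, centers = 5, nstart = 25)

hull_areas <- sapply(split(as.data.frame(x), cl$cluster), hull_area)
print(hull_areas)
```

The convex hull ignores any concavity in a cluster's shape, so it is an upper bound on the region the points actually occupy; which notion of "cluster area" fits depends on the application.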