Assessing if points exhibit spatial clustering (using R)

Asked: 2018-04-20 21:21:50

Tags: r cluster-analysis distribution spatial hierarchical-clustering

This is both a theoretical and a practical question. I have a data frame, df, containing the x, y, z coordinates of a list of points. These points are dots on a 3D surface generated by image segmentation. The question I am trying to answer is whether these points are randomly distributed on this surface or whether they exhibit some clustering. I'm testing this in R.

The first method I am using is k-means. I ask the computer to determine the best number of groups (if any) that these data can be made to fit into, using the code below. It tests 30 different indices (various methods) and outputs the best number of clusters:

library("NbClust")
nb <- NbClust(df, distance = "euclidean", min.nc = 2,
          max.nc = 10, method = "kmeans")
library("factoextra")
fviz_nbclust(nb)

This code comes from http://www.sthda.com/english/articles/29-cluster-validation-essentials/96-determining-the-optimal-number-of-clusters-3-must-know-methods/

I get a certain number of clusters, which I guess is an indication of clustering in the first place. However, I would like to calculate a metric from this result. Any suggestions on how to do that?

In addition, I am also checking for clustering via a histogram of the pairwise distances:

df_mat <- as.matrix(df)   # numeric matrix of the x, y, z coordinates
dist_df <- dist(df_mat)   # pairwise Euclidean distances
hist(dist_df)

I would expect multiple peaks if there is clustering, and perhaps a single peak for a more or less random distribution.

Another approach I am trying is hierarchical clustering:

 my_hclustdf <- hclust(dist_df)
 plot(my_hclustdf)

However, the output, a dendrogram, does not tell me much by itself.

Any suggestions would be greatly appreciated. Many thanks.

1 Answer:

Answer 0 (score: 0)

  

"randomly distributed on this surface or if they exhibit some clustering"

The problem is that this is too vague. What is 'randomly distributed', and what is 'some clustering'?

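There are tools for testing this kind of situation. For example, the Hopkins statistic can be used to test whether the distribution is uniformly random. But the absence of a uniform random distribution does not imply that clusters exist; it only means the data is not uniform. Similar problems apply to k-means: just because some heuristic tells you to use k = 3 does not prove that there are three clusters. A heuristic may suggest this even on uniform random data. If you tell k-means to find k clusters, it will find k "clusters", even in uniform random data.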
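As a rough illustration of both points, here is a minimal base R sketch. `hopkins_stat` is a hypothetical helper written for this example, not a function from any package. It uses the convention where values near 0.5 suggest uniform randomness and values near 1 suggest a strong departure from it (some implementations report the complement). It draws its reference points uniformly from the bounding box of the data, which is an assumption that does not hold for points lying on a surface; for those, the reference points would have to be sampled on the surface itself.

```r
# Hypothetical helper: Hopkins statistic for the rows of x, using m probe points.
hopkins_stat <- function(x, m = max(1, floor(nrow(x) / 10))) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  stopifnot(m < n)

  # Distance from point p to its nearest neighbour among the rows of pts
  nn_dist <- function(p, pts) sqrt(min(rowSums(sweep(pts, 2, p)^2)))

  # w: nearest-neighbour distances for m real points (excluding the point itself)
  idx <- sample(n, m)
  w <- vapply(idx, function(i) nn_dist(x[i, ], x[-i, , drop = FALSE]), numeric(1))

  # u: nearest-neighbour distances from m artificial points, drawn uniformly
  # from the bounding box of the data (assumption, see the note above)
  lo <- apply(x, 2, min)
  hi <- apply(x, 2, max)
  u_pts <- matrix(runif(m * d, min = rep(lo, each = m), max = rep(hi, each = m)),
                  nrow = m)
  u <- vapply(seq_len(m), function(i) nn_dist(u_pts[i, ], x), numeric(1))

  sum(u^d) / (sum(u^d) + sum(w^d))
}

set.seed(1)
# Simulated data, for illustration only
unif <- matrix(runif(600), ncol = 3)                     # no structure
clus <- rbind(matrix(rnorm(300, 0, 0.05), ncol = 3),     # two tight blobs
              matrix(rnorm(300, 1, 0.05), ncol = 3))

h_unif <- hopkins_stat(unif, m = 20)
h_clus <- hopkins_stat(clus, m = 20)

# k-means returns as many groups as it is asked for, structure or not
km <- kmeans(unif, centers = 3, nstart = 10)
km$size
```

On the uniform sample, `kmeans` still reports three non-empty groups, which is why a "best k" on its own is not evidence of clustering.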

What you probably want is to find multiple, separate modes of density.
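One way to look for such modes, sketched here under assumptions: treat points that can be chained together through gaps no larger than `eps` as one dense group, and ignore groups with fewer than `min_size` members. `dense_groups`, `eps` and `min_size` are hypothetical names chosen for illustration, and both thresholds have to be picked for the data at hand. Cutting a single-linkage dendrogram at height `eps` yields exactly these chains, so base R is enough:

```r
# Hypothetical helper: label dense groups of points, 0 = sparse / unassigned.
dense_groups <- function(x, eps, min_size = 5) {
  hc <- hclust(dist(as.matrix(x)), method = "single")
  grp <- cutree(hc, h = eps)            # chains of points linked by gaps <= eps
  sizes <- table(grp)
  keep <- as.integer(names(sizes)[sizes >= min_size])
  # Renumber the kept groups 1, 2, ... and mark everything else as 0
  ifelse(grp %in% keep, match(grp, keep), 0L)
}

set.seed(2)
# Simulated data, for illustration only: two tight blobs plus sparse background
pts <- rbind(matrix(rnorm(150, 0, 0.05), ncol = 3),
             matrix(rnorm(150, 1, 0.05), ncol = 3),
             matrix(runif(30, -1, 2), ncol = 3))

labels <- dense_groups(pts, eps = 0.15, min_size = 10)
table(labels)
```

If every point ends up with label 0, or all points fall into a single group, that suggests there are no separate modes at the chosen scale; it is worth repeating this for a few values of `eps`.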