Clustering with lat/lon data in R

Time: 2018-02-13 17:33:21

Tags: r google-maps

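I am very new to R. I am currently running a cluster analysis on latitude and longitude data and then plotting the results in Google Maps. My dataset is very small: only 20 points. As far as I know, k-means is the algorithm to use, and for the distance calculation I want to use the Haversine distance (https://www.slideshare.net/AnbarasanS2/clusteranalysis-58192369). I also tried density-based clustering, but it gave very poor results, so I would like to stay with k-means. My dataset and code are below: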

1   27.9745 79.0028
2   29.4716 77.7642
3   30.9688 76.5256
4   29.4716 77.7642
5   29.4716 77.7642
6   29.4716 77.7642
7   29.4716 77.7642
8   25.5648 83.4477
9   26.2946 79.041
10  22.5293 77.178
11  26.2946 79.041
12  30.7896 76.4973
13  26.2946 79.041
14  28.1856 72.2447
15  28.1856 72.2447
16  28.1856 72.2447
17  28.1856 72.2447
18  28.1856 72.2447
19  28.1856 72.2447
20  28.1856 72.2447

The code is:

geodata = read.csv('test.csv')

#K-means clustering
#Compute the distance matrix using Geosphere package.
geo.dist <- function(df) {
  require(geosphere)
  d <- function(i,z) {
    dist <-rep(0,nrow(z))
    dist[i:nrow(z)] <-
      distHaversine(z[i:nrow(z),1:2],z[i,1:2])
    return(dist)
  }
  dm <- do.call(cbind,lapply(1:nrow(df), d, df))
  return(as.dist(df))
}

distance.matrix <-geo.dist(geodata[,c(2,3)])

#Determine the no.of clusters
wssplot.distancematrix <- function(data, nc = 15, seed = 1234) {
  wss <-rep(0,15)
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc,wss,
       type = "b")
}

wssplot.distancematrix(distance.matrix)

But I got this error:

  

Error in dimnames(df) <- if (is.null(labels)) list(seq_len(size), seq_len(size)) else list(labels, :
  length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In df[row(df) > col(df)] <- x :

How can I create k-means clusters and plot the values in Google Maps?

Thanks in advance.

Regards, Nikita

1 Answer:

Answer 0: (score: 0)

There are two errors in the code. They are marked with comments below:

geo.dist <- function(df) {
  require(geosphere)
  d <- function(i,z) {
    dist <-rep(0,nrow(z))
    dist[i:nrow(z)] <-
      distHaversine(z[i:nrow(z),1:2],z[i,1:2])
    return(dist)
  }
  dm <- do.call(cbind,lapply(1:nrow(df), d, df))
  return(as.dist(dm)) # fix 1: return dm (the computed matrix), not df (the input)
}

distance.matrix <-geo.dist(geodata[,c(2,3)])

#Determine the no.of clusters
wssplot.distancematrix <- function(data, nc = 8, seed = 1234) {
  wss <-rep(0,nc) # fix 2: nc = 15 is too high (too many cluster centers), so use nc here
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc,wss,
       type = "b")
}

wssplot.distancematrix(distance.matrix)
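
To see why `nc = 15` is too high for this data: the 20 rows in the question contain only 8 distinct coordinate pairs, and `kmeans()` stops with an error when asked for more centers than there are distinct rows. Here is a minimal base-R sketch of that check. The coordinates are copied from the question's table, and the `lat`/`lon` column names are my own labels, not taken from the original CSV.

```r
# Coordinates copied from the question's table (assumed order: latitude, longitude).
lat <- c(27.9745, 29.4716, 30.9688, rep(29.4716, 4), 25.5648, 26.2946,
         22.5293, 26.2946, 30.7896, 26.2946, rep(28.1856, 7))
lon <- c(79.0028, 77.7642, 76.5256, rep(77.7642, 4), 83.4477, 79.041,
         77.178, 79.041, 76.4973, 79.041, rep(72.2447, 7))
pts <- data.frame(lat = lat, lon = lon)

# Number of distinct points: the upper limit for the number of centers.
n_distinct <- nrow(unique(pts))
print(n_distinct)

# Asking for one more center than there are distinct points raises an error;
# tryCatch() captures the message instead of stopping the script.
too_many <- tryCatch(
  kmeans(pts, centers = n_distinct + 1),
  error = function(e) conditionMessage(e)
)
print(too_many)
```

The same limit applies when the distance matrix is passed to `kmeans()`, because identical points produce identical rows in that matrix.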

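
Two caveats worth checking. First, geosphere documents its point arguments as longitude first, then latitude, so if the CSV columns are latitude then longitude (as the question's table suggests), the call may need `geodata[, c(3, 2)]` rather than `geodata[, c(2, 3)]`. Second, `kmeans()` does not cluster on a distance matrix as such; it treats each row of the matrix as a feature vector. As a rough alternative, and not the method the question asked for, here is a base-R sketch that computes Haversine distances directly and feeds them to hierarchical clustering with `hclust()`. The helper name `haversine_m`, the column labels, and the choice of `k = 3` are assumptions for illustration only.

```r
# Haversine great-circle distance in metres (hypothetical helper).
# r = 6378137 is the earth radius that geosphere uses by default.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Coordinates copied from the question's table (assumed order: latitude, longitude).
lat <- c(27.9745, 29.4716, 30.9688, rep(29.4716, 4), 25.5648, 26.2946,
         22.5293, 26.2946, 30.7896, 26.2946, rep(28.1856, 7))
lon <- c(79.0028, 77.7642, 76.5256, rep(77.7642, 4), 83.4477, 79.041,
         77.178, 79.041, 76.4973, 79.041, rep(72.2447, 7))

# Full pairwise distance matrix; outer() passes vectors of row indices.
n <- length(lat)
dm <- outer(seq_len(n), seq_len(n),
            function(i, j) haversine_m(lat[i], lon[i], lat[j], lon[j]))

# Hierarchical clustering accepts a dist object directly.
hc <- hclust(as.dist(dm), method = "average")
clusters <- cutree(hc, k = 3)  # k = 3 is an arbitrary choice for this sketch
print(table(clusters))
```

With only 8 distinct locations, plotting `hc` as a dendrogram may be a more informative way to choose the number of clusters than a within-sum-of-squares curve.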