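I am very new to R. I am currently running a cluster analysis on latitude and longitude data and then plotting the results on Google Maps. My data set is very small, though: only 20 points. As far as I understand, I want to use the k-means algorithm, and for the distance calculation I want to use the Haversine distance (https://www.slideshare.net/AnbarasanS2/clusteranalysis-58192369). I also tried density-based clustering, but it gave very poor results, so I would like to stick with k-means. My data set and code are below: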
1 27.9745 79.0028
2 29.4716 77.7642
3 30.9688 76.5256
4 29.4716 77.7642
5 29.4716 77.7642
6 29.4716 77.7642
7 29.4716 77.7642
8 25.5648 83.4477
9 26.2946 79.041
10 22.5293 77.178
11 26.2946 79.041
12 30.7896 76.4973
13 26.2946 79.041
14 28.1856 72.2447
15 28.1856 72.2447
16 28.1856 72.2447
17 28.1856 72.2447
18 28.1856 72.2447
19 28.1856 72.2447
20 28.1856 72.2447
The code is:
geodata = read.csv('test.csv')
# K-means clustering
# Compute the distance matrix using the geosphere package.
geo.dist <- function(df) {
  require(geosphere)
  d <- function(i, z) {
    dist <- rep(0, nrow(z))
    dist[i:nrow(z)] <- distHaversine(z[i:nrow(z), 1:2], z[i, 1:2])
    return(dist)
  }
  dm <- do.call(cbind, lapply(1:nrow(df), d, df))
  return(as.dist(df))
}
distance.matrix <- geo.dist(geodata[, c(2, 3)])
# Determine the number of clusters
wssplot.distancematrix <- function(data, nc = 15, seed = 1234) {
  wss <- rep(0, 15)
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b")
}
wssplot.distancematrix(distance.matrix)
But I got this error:
Error in dimnames(df) <- if (is.null(labels)) list(seq_len(size), seq_len(size)) else list(labels,  :
  length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In df[row(df) > col(df)] <- x :
How can I create the k-means clusters and plot the values on Google Maps?
Thanks in advance.
Regards, Nikita
Answer 0 (score: 0)
There are two errors in the code. See the comments below:
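library(geosphere)  # provides distHaversine()

geodata <- read.csv('test.csv')

geo.dist <- function(df) {
  # Column i holds the distances from point i to points i..n (lower
  # triangle); entries above the diagonal stay 0, which as.dist() ignores.
  d <- function(i, z) {
    dist <- rep(0, nrow(z))
    dist[i:nrow(z)] <- distHaversine(z[i:nrow(z), 1:2], z[i, 1:2])
    return(dist)
  }
  dm <- do.call(cbind, lapply(1:nrow(df), d, df))
  return(as.dist(dm))  # Error 1: return dm (the distance matrix), not df
}

# Note: distHaversine() expects columns in (longitude, latitude) order.
# If columns 2 and 3 of test.csv are latitude and longitude, use c(3, 2).
distance.matrix <- geo.dist(geodata[, c(2, 3)])

# Determine the number of clusters
wssplot.distancematrix <- function(data, nc = 8, seed = 1234) {
  # Error 2: nc = 15 is too high. The data has only 8 distinct points,
  # and kmeans() stops when asked for more centers than distinct points.
  wss <- rep(0, nc)  # wss[1] is left at 0 because the loop starts at 2
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b")
}
wssplot.distancematrix(distance.matrix)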
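
To check both fixes without reading test.csv, here is a minimal base-R sketch. It makes several assumptions that are not in the original code: the coordinates are typed in from the question with the first value column taken as latitude and the second as longitude, `haversine_m` is a hypothetical helper written here (not a geosphere function) using the same default radius as `distHaversine()`, and `centers = 3` is an arbitrary choice for illustration. Note that `kmeans()` converts a distance object to a full matrix and clusters its rows as feature vectors, so this is k-means on distance profiles, not k-means with a Haversine metric.

```r
# Coordinates typed in from the question (assumed order: latitude, longitude).
pts <- data.frame(
  lat = c(27.9745, 29.4716, 30.9688, 29.4716, 29.4716, 29.4716, 29.4716,
          25.5648, 26.2946, 22.5293, 26.2946, 30.7896, 26.2946,
          28.1856, 28.1856, 28.1856, 28.1856, 28.1856, 28.1856, 28.1856),
  lon = c(79.0028, 77.7642, 76.5256, 77.7642, 77.7642, 77.7642, 77.7642,
          83.4477, 79.041, 77.178, 79.041, 76.4973, 79.041,
          72.2447, 72.2447, 72.2447, 72.2447, 72.2447, 72.2447, 72.2447)
)

# Hypothetical helper (not part of geosphere): great-circle distance in
# metres between points given in degrees, using the Haversine formula.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Full symmetric n x n distance matrix, built with outer() over row indices.
idx <- seq_len(nrow(pts))
dm <- outer(idx, idx, function(i, j) {
  haversine_m(pts$lat[i], pts$lon[i], pts$lat[j], pts$lon[j])
})

# Upper bound for the number of centers: the count of distinct locations.
n_distinct <- nrow(unique(pts))

# Cluster the rows of the distance matrix, as the corrected code does.
set.seed(1234)
fit <- kmeans(dm, centers = 3, nstart = 10)
table(fit$cluster)
```

`n_distinct` gives the largest value of `nc` that `kmeans()` can accept for this data, which is why the elbow plot has to stop well below 15.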