Question

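G'day All,

I have two sets of point data: one with 24 locations and another with ~16,000. I want to calculate the distance from each of the 24 points to each of the 16,000 points. This is easy to do with pointDistance() from the R raster package, but I can't tell which pairs are being compared. I'd like to create a data.frame that holds the location names for each comparison, so that I can combine it with the distances.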

a <- data.frame('lon' = c(1,5,55,31), 'lat' = c(3,7,20,22), 'loc' = c('a', 'b', 'c', 'd'))
b <- data.frame('lon' = c(4,2,8,65), 'lat' = c(50,-90,20,32), 'loc' = c('e', 'f', 'g', 'h'))   
dist <- function(x, y){
 for( i in 1:length(a$lon)){
    my_vector <- vector(mode = "numeric", length = 0)
    d <- pointDistance(cbind(x[i,'lon'], x[i,'lat']), cbind(y$lon, y$lat), lonlat=TRUE)
    my_vector <- c(my_vector, d)
    }
    my_vector
 }

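I'm slow in the mornings and can't figure out why my function above doesn't output every combination. It isn't appending the d from each iteration to my_vector.

At the same time, I'd like to include the pairwise combination of locations that produced each distance measurement, for example: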

  loc1   loc2   dist
     a      e     10
     a      f     16
     a      g     12
     a      h     19
     b      e     15
     b      f     17
     b      g     14
     b      h     13
     c      e     11
   etc    etc    etc

Sorry to bother you; any help would be greatly appreciated.

Thanks in advance.

Adam

Answer 1

Each iteration overwrites the result with a zero-length vector. Declare your variable outside the loop:

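library(raster)  # provides pointDistance()

dist <- function(x, y) {
  # Initialise the accumulator once, before the loop
  my_vector <- vector(mode = "numeric", length = 0)
  for (i in seq_len(nrow(x))) {  # loop over the rows of x, not the global a
    d <- pointDistance(cbind(x[i, 'lon'], x[i, 'lat']),
                       cbind(y$lon, y$lat), lonlat = TRUE)
    my_vector <- rbind(my_vector, d)  # one row of distances per point in x
  }
  my_vector
}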
> dist(a,b)
     [,1]     [,2]    [,3]    [,4]
d 5239684 10352713 2039490 7401111
d 4787647 10797991 1482960 6785859
d 5571477 12245144 4899364 1666956
d 3909092 12467783 2398381 3534050

I think rbind is a better "glue" than c() in this case, since it keeps one row of distances per point in x.
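To get from this matrix to the loc1/loc2/dist layout asked for in the question, one option is to label the rows and columns and then flatten row by row. The following is only a sketch: `matrix_to_pairs` is a hypothetical helper name, and the matrix is typed in from the output above rather than recomputed. In practice the labels could be set with `rownames(m) <- x$loc` and `colnames(m) <- y$loc`.

```r
# Distance matrix copied from the dist(a, b) output above (metres);
# rows follow a$loc, columns follow b$loc.
m <- matrix(c(5239684, 10352713, 2039490, 7401111,
              4787647, 10797991, 1482960, 6785859,
              5571477, 12245144, 4899364, 1666956,
              3909092, 12467783, 2398381, 3534050),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("a", "b", "c", "d"), c("e", "f", "g", "h")))

# Hypothetical helper: flatten a labelled matrix row by row.
matrix_to_pairs <- function(m) {
  data.frame(
    loc1 = rep(rownames(m), each = ncol(m)),   # a a a a b b b b ...
    loc2 = rep(colnames(m), times = nrow(m)),  # e f g h e f g h ...
    dist = as.vector(t(m)),                    # transpose so rows are read in order
    stringsAsFactors = FALSE
  )
}

pairs <- matrix_to_pairs(m)
head(pairs, 5)
```

The transpose matters because as.vector() reads a matrix column by column; without it the distances would not line up with the labels.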

Answer 2

With plyr, assuming the values of "loc" are unique...

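library(plyr)    # ddply() and mutate()
library(raster)  # pointDistance()
# Distance between the point in a named p1 and the point in b named p2
fdist <- function(p1, p2) { pointDistance(cbind(a$lon[a$loc == p1], a$lat[a$loc == p1]),
                                          cbind(b$lon[b$loc == p2], b$lat[b$loc == p2]), lonlat = TRUE) }
d <- ddply(expand.grid(a = a$loc, b = b$loc), .(a, b), mutate, dist = fdist(a, b))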

Which produces:

   a b      dist
1  a e  47.09565
2  a f  93.00538
3  a g  18.38478
4  a h  70.26379
5  b e  43.01163
6  b f  97.04638
7  b g  13.34166
8  b h  65.00000
9  c e  59.16925
   etc...

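The values differ because this output uses pointDistance's flat Pythagorean distance (lonlat=FALSE), which is in the units of the coordinates. With lonlat=TRUE, as in the code above, the result is a great-circle distance in metres.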
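The flat distances in that table can be checked, and the labelled pairs built, without plyr by indexing both data frames with every row combination. This is a minimal base-R sketch, not the method used in the answer; the names `idx` and `flat` are illustrative only.

```r
a <- data.frame(lon = c(1, 5, 55, 31), lat = c(3, 7, 20, 22),
                loc = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
b <- data.frame(lon = c(4, 2, 8, 65), lat = c(50, -90, 20, 32),
                loc = c("e", "f", "g", "h"), stringsAsFactors = FALSE)

# All index pairs; expand.grid varies its first argument fastest,
# so j (rows of b) cycles within each i (row of a).
idx <- expand.grid(j = seq_len(nrow(b)), i = seq_len(nrow(a)))

flat <- data.frame(
  loc1 = a$loc[idx$i],
  loc2 = b$loc[idx$j],
  # Planar (Pythagorean) distance in coordinate units (degrees), not metres
  dist = sqrt((a$lon[idx$i] - b$lon[idx$j])^2 +
              (a$lat[idx$i] - b$lat[idx$j])^2),
  stringsAsFactors = FALSE
)
head(flat, 4)
```

For great-circle distances in metres, the sqrt() expression could be swapped for pointDistance(cbind(a$lon[idx$i], a$lat[idx$i]), cbind(b$lon[idx$j], b$lat[idx$j]), lonlat = TRUE), which compares matching rows when both inputs have the same number of points. Because everything is vectorised, this should also scale to the 24 by ~16,000 case more comfortably than calling a function once per pair.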

Keeping identifiers when calculating pointDistance between two data sets?
