Compute a distance matrix (non-Euclidean) without using a for loop

Posted: 2018-10-30 10:28:46

Tags: r distance

I don't want to use a for loop, because it is slow on large data sets.

 A    B
 1    2
-1    4
 9    5
......

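I want to compute the distance between every pair of rows and store the results in a matrix (just like dist does, but not with one of the standard distance methods). The distance is one I define myself and it differs from case to case, so I can't use the base dist function, which only offers a fixed set of metrics.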

2 answers:

Answer 0 (score: 1)

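The outer function offers a fairly simple way to build a custom distance matrix. Because your data is not one-dimensional, you will probably need to pass row indices to the distance function rather than the data itself. Here is a solution: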

df <- data.frame(A=c(1,-1,9), B=c(2,4,5)) ##Your data

f <- function(i,j){  ##Some distance function
  (df$A[i]-df$A[j])^4 + (df$B[i]-df$B[j])^4   
}

outer(seq_along(df$A),seq_along(df$A), f)
#     [,1]  [,2]  [,3]
#[1,]    0    32  4177
#[2,]   32     0 10001
#[3,] 4177 10001     0
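As a quick sanity check of this approach: this particular f happens to be the fourth power of the Minkowski distance with p = 4, which base R's dist() does support, so the two results can be compared. The sketch also uses as.dist() to turn the full matrix from outer() into a "dist" object like the one dist() returns. The variable names D_custom, D_mink4 and d_custom are illustrative only.

```r
df <- data.frame(A = c(1, -1, 9), B = c(2, 4, 5))
f <- function(i, j) (df$A[i] - df$A[j])^4 + (df$B[i] - df$B[j])^4

idx <- seq_len(nrow(df))
D_custom <- outer(idx, idx, f)

# Minkowski distance with p = 4, raised to the 4th power, should reproduce f
D_mink4 <- as.matrix(dist(df, method = "minkowski", p = 4))^4
all.equal(D_custom, D_mink4, check.attributes = FALSE)  # TRUE

# Keep only the lower triangle, in the same "dist" format that dist() returns
d_custom <- as.dist(D_custom)
```

Here the check is only possible because f coincides with a built-in metric; for a genuinely custom distance, as.dist() is still useful whenever downstream code (e.g. hclust) expects a "dist" object.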

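Note that f must be vectorized, i.e. it must work when i and j are vectors of length greater than 1: outer calls f once with two index vectors that together cover every (i, j) pair, then reshapes the result into a matrix.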
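To illustrate the vectorization requirement, here is a sketch with a hypothetical Manhattan-style distance g written with sum(). Because sum() collapses its whole input to one number, g only works for scalar indices, so passing it straight to outer() fails. Wrapping it in Vectorize() (which uses mapply() internally) makes it work, at the cost of one call to g per pair.

```r
X <- as.matrix(data.frame(A = c(1, -1, 9), B = c(2, 4, 5)))

# Not vectorized: with index vectors, sum() returns a single number,
# so outer(idx, idx, g) stops with a dims/length mismatch error.
g <- function(i, j) sum(abs(X[i, ] - X[j, ]))

idx <- seq_len(nrow(X))
# Vectorize() calls g once for each (i, j) pair and returns a vector of length n^2
D_manhattan <- outer(idx, idx, Vectorize(g))
D_manhattan
#      [,1] [,2] [,3]
# [1,]    0    4   11
# [2,]    4    0   11
# [3,]   11   11    0
```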

Answer 1 (score: 0)

I solved the problem and am posting my answer here in case anyone needs it in the future:

# this creates a data set of longitudes and latitudes of some places
temp = structure(list(lon = c(105.948347, 105.956001, 105.9358872, 105.930676, 
105.9300467, 105.933841, 105.958083, 105.947358, 105.9487254, 
105.9471336, 105.948002, 105.9558502, 105.95117, 105.952783, 
105.950688, 105.9441403, 105.944914, 105.9429264, 105.9388434, 
105.938816), lat = c(26.236853, 26.249777, 26.240596, 26.240516, 
26.2438934, 26.245372, 26.242305, 26.244994, 26.2469876, 26.2469411, 
26.2369, 26.2497956, 26.249936, 26.250501, 26.250288, 26.2488675, 
26.250295, 26.2485741, 26.2379629, 26.246864)), .Names = c("lon", 
"lat"), row.names = c(NA, -20L), class = "data.frame")

Final solution (distHaversine comes from the geosphere package):

library(geosphere)  # provides distHaversine()

caldistMatrix = function(G, f = distHaversine){
  # G and f are captured in the closure, so calelementdist only needs i and j
  calelementdist = function(i, j){
    f(p1 = c(G[i, 1], G[i, 2]), p2 = c(G[j, 1], G[j, 2]))
  }
  idx = seq_len(nrow(G))
  # Vectorize() lets outer() pass index vectors; it calls calelementdist once per pair
  D = outer(idx, idx, Vectorize(calelementdist))
  return(D)
}
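Note that Vectorize() is implemented with mapply(), so calelementdist is still called once per pair of rows. A fully vectorized alternative is to write the distance so that it accepts whole coordinate vectors, letting outer() call it only once. The sketch below is a hand-written haversine, not geosphere's own code; the helper names haversine_vec and haversine_matrix are hypothetical, and the default radius r = 6378137 m is assumed to match the default of distHaversine. If geosphere is available, its distm() function also builds such a matrix directly.

```r
# Hypothetical helper: haversine distance in metres, vectorized over its inputs.
# lon1, lat1, lon2, lat2 are numeric vectors in degrees, all of the same length.
haversine_vec <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against sqrt(a) slightly exceeding 1 through rounding
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: full distance matrix for a data frame with lon in
# column 1 and lat in column 2 (the same layout as temp above)
haversine_matrix <- function(G) {
  idx <- seq_len(nrow(G))
  # outer() calls the function once, with two index vectors of length n^2
  outer(idx, idx, function(i, j) {
    haversine_vec(G[i, 1], G[i, 2], G[j, 1], G[j, 2])
  })
}

pts <- data.frame(lon = c(0, 1, 0), lat = c(0, 0, 1))
haversine_matrix(pts)
```

With the data above, haversine_matrix(temp) should give a 20 x 20 matrix comparable to caldistMatrix(temp), up to rounding, under the assumed radius.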

Loop solution (slow):

caldistMatrix_test = function(G, f = distHaversine){
  n = nrow(G)
  D = matrix(0, nrow = n, ncol = n)  # preallocate an n x n matrix of zeros
  for(i in seq_len(n)){
    for(j in seq_len(n)){
      D[i, j] = f(p1 = G[i, ], p2 = G[j, ])
    }
  }
  return(D)
}

Comparison on the 20-point temp data set:

 system.time(caldistMatrix_test(temp))
 user  system elapsed 
 0.36    0.00    0.40 
 system.time(caldistMatrix(temp))
 user  system elapsed 
 0.14    0.00    0.14
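Because the distances in this thread are symmetric and zero on the diagonal, one could also evaluate only the n(n-1)/2 pairs with i < j, roughly halving the number of distance evaluations. A minimal sketch, assuming f is vectorized as in answer 0, f(i, i) = 0, f(i, j) = f(j, i), and n >= 2 (combn() errors when n < 2); pairwise_matrix is a hypothetical helper name:

```r
df <- data.frame(A = c(1, -1, 9), B = c(2, 4, 5))
f <- function(i, j) (df$A[i] - df$A[j])^4 + (df$B[i] - df$B[j])^4

# Hypothetical helper: build a symmetric n x n matrix from the pairs with i < j
pairwise_matrix <- function(n, f) {
  D <- matrix(0, nrow = n, ncol = n)   # diagonal stays 0
  p <- combn(n, 2)                     # 2 x choose(n, 2) matrix of pairs with i < j
  d <- f(p[1, ], p[2, ])               # one vectorized call for all pairs
  D[cbind(p[1, ], p[2, ])] <- d        # fill the upper triangle
  D[cbind(p[2, ], p[1, ])] <- d        # mirror into the lower triangle
  D
}

pairwise_matrix(nrow(df), f)
```

Indexing with a two-column matrix (cbind(row, col)) places each value explicitly, so there is no need to rely on the element order of lower.tri() or upper.tri().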