I need to avoid for loops, because they are slow on large data sets. Sample data:
A B
1 2
-1 4
9 5
......
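I want to compute the distance between every pair of rows and fill the results into a matrix (as dist does), but the distance is one I define myself and it changes from case to case, so I cannot use the base dist function, which only offers a fixed set of metrics.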
Answer 0 (score: 1)
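The outer function provides a fairly simple way to build a custom distance matrix, although because your data is not one-dimensional you will probably need to pass row indices rather than the data itself. A solution looks like this: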
df <- data.frame(A = c(1, -1, 9), B = c(2, 4, 5)) ## Your data
f <- function(i, j) { ## Some distance function
  (df$A[i] - df$A[j])^4 + (df$B[i] - df$B[j])^4
}
outer(seq_along(df$A), seq_along(df$A), f)
#      [,1]  [,2]  [,3]
# [1,]    0    32  4177
# [2,]   32     0 10001
# [3,] 4177 10001     0
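Note that the function f must be vectorized, i.e. it must work when i and j have length greater than 1, because outer calls f once with the full vectors of index pairs rather than once per pair.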
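If a distance function only handles one pair of indices at a time, wrapping it in Vectorize() is one way to make it usable with outer. The following is a minimal sketch, assuming a hypothetical scalar_dist (a Chebyshev-style distance chosen purely for illustration, not taken from the answer) applied to the same df:

```r
df <- data.frame(A = c(1, -1, 9), B = c(2, 4, 5))

# Hypothetical scalar-only distance: the `if` test is only valid for
# length-1 i and j, so this function cannot be passed to outer() directly.
scalar_dist <- function(i, j) {
  if (i == j) return(0)
  max(abs(df$A[i] - df$A[j]), abs(df$B[i] - df$B[j]))
}

# Vectorize() wraps the function in mapply(), so it is called once per (i, j) pair.
idx <- seq_len(nrow(df))
D <- outer(idx, idx, Vectorize(scalar_dist))
D
```

This is convenient, but it still performs one R-level function call per pair, so a function written with vectorized arithmetic (like f above) will usually be faster on large inputs.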
Answer 1 (score: 0)
I solved the problem and am posting the answer here in case someone needs it in the future:
library(geosphere) # provides distHaversine
# Create a data set of longitudes and latitudes for some places
temp = structure(list(lon = c(105.948347, 105.956001, 105.9358872, 105.930676,
105.9300467, 105.933841, 105.958083, 105.947358, 105.9487254,
105.9471336, 105.948002, 105.9558502, 105.95117, 105.952783,
105.950688, 105.9441403, 105.944914, 105.9429264, 105.9388434,
105.938816), lat = c(26.236853, 26.249777, 26.240596, 26.240516,
26.2438934, 26.245372, 26.242305, 26.244994, 26.2469876, 26.2469411,
26.2369, 26.2497956, 26.249936, 26.250501, 26.250288, 26.2488675,
26.250295, 26.2485741, 26.2379629, 26.246864)), .Names = c("lon",
"lat"), row.names = c(NA, -20L), class = "data.frame")
caldistMatrix = function(G, f = distHaversine){
  # calelementdist reads G and f from the enclosing function (closure),
  # so they do not need to be passed as arguments
  calelementdist = function(i, j){
    result = f(p1 = c(G[i,1], G[i,2]), p2 = c(G[j,1], G[j,2]))
    return(result)
  }
  # Vectorize() lets outer() call calelementdist once per (i, j) pair
  idx = seq_len(nrow(G))
  D = outer(idx, idx, Vectorize(calelementdist))
  return(D)
}
# Loop-based baseline, used only for the timing comparison
caldistMatrix_test = function(G, f = distHaversine){
  D = outer(1:nrow(G), 1:nrow(G), Vectorize(function(x, y) return(0)))
  # D = matrix(,nrow = nrow(G),ncol = nrow(G))
  for(i in 1:nrow(G)){
    for(j in 1:nrow(G)){
      D[i,j] = f(p1 = G[i,], p2 = G[j,])
    }
  }
  return(D)
}
system.time(caldistMatrix_test(temp))
user system elapsed
0.36 0.00 0.40
system.time(caldistMatrix(temp))
user system elapsed
0.14 0.00 0.14
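Vectorize() is a wrapper around mapply(), so the outer-based version still calls the distance function once per pair of rows. If the distance itself can be written with vectorized arithmetic, outer can evaluate all pairs in a single call. The sketch below assumes a hand-written haversine function in base R (the names haversine and dist_matrix are hypothetical, and the radius of 6378137 m is assumed to match the default used by distHaversine); it is an illustration on a tiny made-up data set, not a drop-in replacement for geosphere.

```r
# Hand-written haversine distance in metres; works elementwise on vectors.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Build the full matrix with one vectorized call; G needs columns lon and lat.
dist_matrix <- function(G) {
  idx <- seq_len(nrow(G))
  outer(idx, idx, function(i, j) {
    haversine(G$lon[i], G$lat[i], G$lon[j], G$lat[j])
  })
}

# Made-up points: each pair is a quarter of a great circle apart.
pts <- data.frame(lon = c(0, 0, 90), lat = c(0, 90, 0))
D <- dist_matrix(pts)
D
```

The same dist_matrix(temp) call could then be timed with system.time() against the two functions above; no timing is claimed here.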