How to calculate Euclidean distance within groups in R

Date: 2017-08-10 17:26:29

Tags: r euclidean-distance

Suppose I have a data frame like this:

ID  GroupID  X         Y
1   a        772.7778  226.5
1   a        806.5645  35.3871
1   a        925.5714  300.9286
1   b        708.0909  165.5455
1   b        630.8235  167.4118
2   a        555.3333  151.875
2   a        732.8947  462.3158

This is the result I want:

ID GroupID X        Y        Distance
1   a      772.7778 226.5    NA
1   a      806.5645 35.3871  dist between((772.7778,226.5),(806.5645,35.3871))
1   a      925.5714 300.9286 dist between((925.5714,300.9286),(806.5645,35.3871))
1   b      708.0909 165.5455 NA
1   b      630.8235 167.4118 dist between((708.0909,165.5455),(630.8235,167.4118))
2   a      555.3333 151.875  NA
2   a      732.8947 462.3158 dist between((732.8947,462.3158),(555.3333,151.875))

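Basically, I want the distance between each row and the previous row within the same ID and GroupID. The NA means that in each subgroup (for example, ID = 1; GroupID = a) the first row has no previous point, so its distance is NA. Can anyone help me? Thanks!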

2 answers:

Answer 0 (score: 2)

I have never used dist before, but here is a for loop that may work for you:

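# Assumes the rows of df are already ordered so that each (ID, GroupID) subgroup is contiguous
df$Distance <- NA_real_  # the first row of every subgroup stays NA
for (i in seq_len(nrow(df))[-1]) {
  # Compare both ID and GroupID so that e.g. (1, a) followed by (2, a) starts a new subgroup
  if (df$ID[i] == df$ID[i - 1] && df$GroupID[i] == df$GroupID[i - 1]) {
    df$Distance[i] <- sqrt((df$X[i] - df$X[i - 1])^2 + (df$Y[i] - df$Y[i - 1])^2)
  }
}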

> df
  ID GroupID        X        Y  Distance
1  1       a 772.7778 226.5000        NA
2  1       a 806.5645  35.3871 194.07648
3  1       a 925.5714 300.9286 290.98957
4  1       b 708.0909 165.5455        NA
5  1       b 630.8235 167.4118  77.28994
6  2       a 555.3333 151.8750        NA
7  2       a 732.8947 462.3158 357.63325
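
The loop can also be written without iterating row by row. The following is a minimal sketch in base R, assuming (as the loop does) that the rows of each (ID, GroupID) subgroup are contiguous and already in the desired order. The names step, key and new_group are illustrative and do not come from the original answer.

```r
# Sample data from the question
df <- data.frame(
  ID      = c(1, 1, 1, 1, 1, 2, 2),
  GroupID = c("a", "a", "a", "b", "b", "a", "a"),
  X = c(772.7778, 806.5645, 925.5714, 708.0909, 630.8235, 555.3333, 732.8947),
  Y = c(226.5, 35.3871, 300.9286, 165.5455, 167.4118, 151.875, 462.3158)
)

# Distance from every row to the row directly above it (row 1 has no predecessor)
step <- c(NA, sqrt(diff(df$X)^2 + diff(df$Y)^2))

# TRUE where a row starts a new (ID, GroupID) subgroup
key <- paste(df$ID, df$GroupID)
new_group <- c(TRUE, key[-1] != key[-length(key)])

# Distances that cross a subgroup boundary are not wanted
step[new_group] <- NA
df$Distance <- step
```

This relies on diff() computing differences between adjacent rows, so it gives wrong results if the subgroups are interleaved; sorting by ID and GroupID first would be one way to guard against that.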

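Answer 1 (score: 1)

Why not try something like this:

Split the data by the combination of ID and GroupID, apply the distance function to each piece, and then unsplit the result?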

grp <- paste(dat$ID, dat$GroupID)
splitted <- split(dat[, c("X", "Y")], grp)

distances <- lapply(splitted, function(x) {
  n <- nrow(x)
  if (n < 2) return(NA_real_)  # a single-row group has no previous point
  d <- as.matrix(dist(x))      # full n x n matrix of pairwise distances
  # d[i, i + 1] is the distance between consecutive rows i and i + 1
  c(NA, d[cbind(seq_len(n - 1), seq_len(n - 1) + 1)])
})

dat$distances <- unsplit(distances, grp)

dat
  ID GroupID        X        Y distances
1  1       a 772.7778 226.5000        NA
2  1       a 806.5645  35.3871 194.07648
3  1       a 925.5714 300.9286 290.98957
4  1       b 708.0909 165.5455        NA
5  1       b 630.8235 167.4118  77.28994
6  2       a 555.3333 151.8750        NA
7  2       a 732.8947 462.3158 357.63325

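Side note: dist computes every pairwise distance within a group, not just the consecutive ones, so it becomes slow once a group has more than about 10k rows.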
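
Since only the distance to the previous row is needed, the pairwise matrix can be skipped entirely. Below is a rough sketch of the same split/unsplit idea using diff() inside each group instead of dist; the names grp and steps are illustrative, and this is one possible variant rather than the answer's own method.

```r
# Sample data from the question
dat <- data.frame(
  ID      = c(1, 1, 1, 1, 1, 2, 2),
  GroupID = c("a", "a", "a", "b", "b", "a", "a"),
  X = c(772.7778, 806.5645, 925.5714, 708.0909, 630.8235, 555.3333, 732.8947),
  Y = c(226.5, 35.3871, 300.9286, 165.5455, 167.4118, 151.875, 462.3158)
)

grp <- paste(dat$ID, dat$GroupID)

# For each group: NA for the first row, then the distance to the previous row.
# diff() returns a zero-length vector for a single-row group, so that case yields just NA.
steps <- lapply(split(dat[, c("X", "Y")], grp), function(g) {
  c(NA, sqrt(diff(g$X)^2 + diff(g$Y)^2))
})

dat$distances <- unsplit(steps, grp)
```

The work per group grows linearly with the number of rows here, and because the grouping is done by split, the subgroups do not need to be contiguous in dat.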