Suppose I have a data frame like this:
ID GroupID X Y
1 a 772.7778 226.5
1 a 806.5645 35.3871
1 a 925.5714 300.9286
1 b 708.0909 165.5455
1 b 630.8235 167.4118
2 a 555.3333 151.875
2 a 732.8947 462.3158
This is the result I want:
ID GroupID X Y Distance
1 a 772.7778 226.5 NA
1 a 806.5645 35.3871 dist between((772.7778,226.5),(806.5645,35.3871))
1 a 925.5714 300.9286 dist between((925.5714,300.9286),(806.5645,35.3871))
1 b 708.0909 165.5455 NA
1 b 630.8235 167.4118 dist between((708.0909,165.5455),(630.8235,167.4118))
2 a 555.3333 151.875 NA
2 a 732.8947 462.3158 dist between((732.8947,462.3158),(555.3333,151.875))
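Basically, I want the distance between consecutive rows within each combination of ID and GroupID. The NA means that within each subgroup (for example, ID = 1; GroupID = a) the first row has no previous point, so its distance is NA. Can anyone help me? Thanks!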
Answer 0: (score: 2)
I've never used dist before, but here is a for loop that might work for you:
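# Assumes the rows of each (ID, GroupID) subgroup are contiguous in df
df$Distance <- NA_real_
for (i in seq_len(nrow(df))[-1]) {
  # Measure the distance only when the previous row is in the same ID and GroupID
  if (df$ID[i] == df$ID[i-1] && df$GroupID[i] == df$GroupID[i-1]) {
    df$Distance[i] <- sqrt((df$X[i] - df$X[i-1])^2 + (df$Y[i] - df$Y[i-1])^2)
  }
}
df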
ID GroupID X Y Distance
1 1 a 772.7778 226.5000 NA
2 1 a 806.5645 35.3871 194.07648
3 1 a 925.5714 300.9286 290.98957
4 1 b 708.0909 165.5455 NA
5 1 b 630.8235 167.4118 77.28994
6 2 a 555.3333 151.8750 NA
7 2 a 732.8947 462.3158 357.63325
Answer 1: (score: 1)
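Why not try something like this: split the data by the combination of ID and GroupID, apply a distance function to each piece, and then unsplit the result?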
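grp <- paste(dat$ID, dat$GroupID)  # one key per (ID, GroupID) subgroup
splitted <- split(dat[, c("X", "Y")], grp)
distances <- lapply(splitted, function(x) {
  if (nrow(x) > 2) {
    # Dropping column 1 puts the consecutive-row distances on the diagonal
    c(NA, diag(as.matrix(dist(x))[, -1]))
  } else if (nrow(x) == 2) {
    # diag() would build a matrix from a length-2 vector, so index dist() directly
    c(NA, dist(x)[1])
  } else {
    NA  # a single-row subgroup has no previous point
  }
})
dat$distances <- unsplit(distances, grp)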
dat
ID GroupID X Y distances
1 1 a 772.7778 226.5000 NA
2 1 a 806.5645 35.3871 194.07648
3 1 a 925.5714 300.9286 290.98957
4 1 b 708.0909 165.5455 NA
5 1 b 630.8235 167.4118 77.28994
6 2 a 555.3333 151.8750 NA
7 2 a 732.8947 462.3158 357.63325
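To see why only part of the dist output is needed, here is a small sketch on the first subgroup from the question (ID = 1, GroupID = a). The names pts, m and consecutive are made up for this illustration. dist returns every pairwise distance, and only the entries just off the diagonal are distances between consecutive rows.

```r
# First subgroup from the question (ID = 1, GroupID = a)
pts <- data.frame(X = c(772.7778, 806.5645, 925.5714),
                  Y = c(226.5, 35.3871, 300.9286))

# Full 3 x 3 symmetric matrix of all pairwise Euclidean distances
m <- as.matrix(dist(pts))
m

# m[1, 2] and m[2, 3] are the distances between consecutive rows;
# m[1, 3] (row 1 to row 3) is computed too but is not wanted here
consecutive <- c(m[1, 2], m[2, 3])
consecutive
```

The two values in consecutive should match rows 2 and 3 of the output above.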
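Side note: dist computes every pairwise distance within a group, so it gets slow once a group has more than about 10k rows.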
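If large groups are a concern, one possible way around dist is to compute only the differences between consecutive rows. The following is a minimal sketch, not part of either answer: it rebuilds the sample data from the question, and lag_diff is a hypothetical helper name introduced here. It relies on base R's ave to apply the helper within each (ID, GroupID) subgroup.

```r
# Sample data copied from the question
dat <- data.frame(
  ID      = c(1, 1, 1, 1, 1, 2, 2),
  GroupID = c("a", "a", "a", "b", "b", "a", "a"),
  X = c(772.7778, 806.5645, 925.5714, 708.0909, 630.8235, 555.3333, 732.8947),
  Y = c(226.5, 35.3871, 300.9286, 165.5455, 167.4118, 151.875, 462.3158)
)

# Hypothetical helper: difference from the previous element, NA for the first.
# Empty input is returned unchanged so unused group combinations are harmless.
lag_diff <- function(v) if (length(v) > 0) c(NA, diff(v)) else v

# ave() applies lag_diff within each (ID, GroupID) subgroup and keeps row order
dx <- ave(dat$X, dat$ID, dat$GroupID, FUN = lag_diff)
dy <- ave(dat$Y, dat$ID, dat$GroupID, FUN = lag_diff)

# Euclidean distance to the previous row of the same subgroup
dat$Distance <- sqrt(dx^2 + dy^2)
dat
```

This does work proportional to the number of rows rather than to the square of the group size, and it should not require the subgroups to be contiguous, since ave handles the grouping itself.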