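I want to compute, for each row, the minimum distance between that row and every row before it. My data frame has several groups, and each group has multiple dates, each with a longitude and latitude. I compute distances with a Haversine function and need to apply it as described above. The data frame looks like this: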
grp date long lat rowid
1 1 1995-07-01 11 12 1
2 1 1995-07-05 3 0 2
3 1 1995-07-09 13 4 3
4 1 1995-07-13 4 25 4
5 2 1995-03-07 12 6 1
6 2 1995-03-10 3 27 2
7 2 1995-03-13 34 8 3
8 2 1995-03-16 25 9 4
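For reference, the Haversine distance mentioned above can be sketched in base R as follows. This is only an illustrative sketch: `haversine_m` is a hypothetical helper, not the `haversinedistance.fnct` used further down (which is not shown), and the default radius of 6378137 m is an assumption chosen to match what, as far as I know, `geosphere::distHaversine` uses by default.

```r
# Hypothetical helper (not the asker's haversinedistance.fnct, which is not shown).
# Inputs are in decimal degrees; the result is in meters.
haversine_m <- function(long1, lat1, long2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  # pmin guards against rounding pushing a slightly above 1
  2 * r * asin(sqrt(pmin(a, 1)))
}

# Distance between the first two rows of grp 1
haversine_m(11, 12, 3, 0)
```

The function is vectorized, so the coordinates of one reference row can be compared against vectors of earlier rows in a single call.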
My current attempt uses purrrlyr::by_row, but it is too slow. In practice, each group has thousands of dates and locations. Here is part of my current attempt:
calc_min_distance <- function(df, grp.name, row){
  df %>%
    # keep the rows of this group up to and including the reference row
    filter(grp == grp.name) %>%
    filter(row_number() <= row) %>%
    mutate(
      last.lat = last(lat),
      last.long = last(long),
      rowid = 1:n()
    ) %>%
    group_by(rowid) %>%
    # distance from the reference (last) row to each row
    purrrlyr::by_row(
      ~haversinedistance.fnct(.$last.long, .$last.lat, .$long, .$lat),
      .collate = 'rows',
      .to = 'min.distance'
    ) %>%
    # drop the reference row itself, then take the minimum
    filter(row_number() < n()) %>%
    summarise(min = min(min.distance)) %>%
    .$min
}
df_dist <-
  df %>%
  group_by(grp) %>%
  mutate(rowid = 1:n()) %>%
  group_by(grp, rowid) %>%
  purrrlyr::by_row(
    ~calc_min_distance(df, .$grp, .$rowid),
    .collate = 'rows',
    .to = 'min.distance'
  ) %>%
  ungroup() %>%
  select(-rowid)
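To keep the example simple, suppose the distance is defined as (lat + long) of the reference row minus (lat + long) of an earlier row, taken pairwise against every row before the reference row. My expected output for grp 1 is then: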
grp date long lat rowid min.distance
1 1 1995-07-01 11 12 1 0
2 1 1995-07-05 3 0 2 -20
3 1 1995-07-09 13 4 3 -6
4 1 1995-07-13 4 25 4 6
How can I quickly compute the minimum distance between the current rowid and all previous rowids?
Answer 0 (score: 0)
Here is how I would do it. You need to compute the distances for all pairs within a group anyway, so we'll use geosphere::distm for this task. I suggest stepping through the function line by line to see what it does; I think it will make sense.
Applied to the sample data from the question:
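library(dplyr)
library(geosphere)
find_min_dist_above = function(long, lat, fun = distHaversine) {
  # pairwise distance matrix (in meters) between all points in the group
  d = distm(x = cbind(long, lat), fun = fun)
  # keep only the upper triangle: d[i, j] with i < j, i.e. earlier rows
  d[lower.tri(d, diag = TRUE)] = NA
  # the first row has no earlier rows, so define its distance as 0
  d[1, 1] = 0
  # column-wise minimum = distance to the nearest earlier row
  return(apply(d, MARGIN = 2, min, na.rm = TRUE))
}
df %>% group_by(grp) %>%
  mutate(min.distance = find_min_dist_above(long, lat))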
# # A tibble: 8 x 6
# # Groups: grp [2]
# grp date long lat rowid min.distance
# <int> <fct> <int> <int> <int> <dbl>
# 1 1 1995-07-01 11 12 1 0
# 2 1 1995-07-05 3 0 2 1601842.
# 3 1 1995-07-09 13 4 3 917395.
# 4 1 1995-07-13 4 25 4 1623922.
# 5 2 1995-03-07 12 6 1 0
# 6 2 1995-03-10 3 27 2 2524759.
# 7 2 1995-03-13 34 8 3 2440596.
# 8 2 1995-03-16 25 9 4 997069.
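To see what each step of the function above does without needing geosphere, the same masking idea can be tried on the toy distance from the question, (lat + long) of the reference row minus (lat + long) of an earlier row. This is only a sketch: `outer` with a signed difference stands in for `distm`, and the variable names (`s`, `d`, `min_dist`) are made up for illustration.

```r
# Toy data: grp 1 from the question
long <- c(11, 3, 13, 4)
lat  <- c(12, 0, 4, 25)
s <- long + lat

# d[i, j] = s[j] - s[i]: toy "distance" of row j relative to row i
d <- outer(s, s, function(si, sj) sj - si)

# Mask the diagonal and everything below it, so column j only keeps
# the entries d[i, j] with i < j, i.e. comparisons with earlier rows
d[lower.tri(d, diag = TRUE)] <- NA

# The first row has no earlier rows; define its result as 0
d[1, 1] <- 0

# Column-wise minimum over the remaining (earlier-row) entries
min_dist <- apply(d, MARGIN = 2, min, na.rm = TRUE)
min_dist
# 0 -20 -6 6
```

These values match the expected min.distance column given for grp 1 in the question. With real coordinates, the only change is that the matrix comes from `distm`, which is symmetric and non-negative.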