Calculate the minimum distance between each row and all previous rows in R

Time: 2019-03-02 21:56:54

Tags: r

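For each row, I want to calculate the minimum distance between that row and every row before it. My data frame has several groups, and each group has multiple dates, each with a longitude and a latitude. I compute distances with a Haversine function and need to apply it as described above. The data frame looks like this: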

  grp    date    long lat rowid
1   1 1995-07-01   11  12     1
2   1 1995-07-05    3   0     2
3   1 1995-07-09   13   4     3
4   1 1995-07-13    4  25     4
5   2 1995-03-07   12   6     1
6   2 1995-03-10    3  27     2
7   2 1995-03-13   34   8     3
8   2 1995-03-16   25   9     4

My current attempt uses purrrlyr::by_row, but it is too slow. In practice, each group has thousands of dates and locations. Here is part of my current attempt:

calc_min_distance <- function(df, grp.name, row){
  df %>% 
    filter(
      grp_name==grp.name
    ) %>% 
    filter(
      row_number() <= row
    ) %>% 
    mutate(
      last.lat = last(lat),
      last.long = last(long),
      rowid = 1:n()
    ) %>% 
    group_by(rowid) %>% 
    purrrlyr::by_row(
      ~haversinedistance.fnct(.$last.long, .$last.lat, .$long, .$lat),
      .collate='rows',
      .to = 'min.distance'
    ) %>% 
    filter(
      row_number() < n()
    ) %>% 
    summarise(
      min = min(min.distance)
    ) %>% 
    .$min
}

df_dist <-
  df %>% 
  group_by(grp_name) %>% 
  mutate(rowid = 1:n()) %>% 
  group_by(grp_name, rowid) %>% 
  purrrlyr::by_row(
    ~calc_min_distance(df, .$grp_name,.$rowid),
    .collate='rows',
    .to = 'min.distance'
  ) %>% 
  ungroup %>% 
  select(-rowid)

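For illustration, assume the distance is defined as (lat + long) of the reference row minus (lat + long) of each row that precedes it. My expected output for grp 1 is then: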

  grp       date long lat rowid min.distance
1   1 1995-07-01   11  12     1            0
2   1 1995-07-05    3   0     2          -20
3   1 1995-07-09   13   4     3           -6
4   1 1995-07-13    4  25     4            6
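
As a quick check of the toy definition above, here is a minimal base-R sketch that reproduces the expected values for grp 1. The helper name `toy_min_distance` is hypothetical and not part of the question; it only illustrates the "current row versus all previous rows" pattern, not the Haversine calculation.

```r
# Hypothetical helper: for each row i, take (long + lat) of row i minus
# (long + lat) of every earlier row, and keep the smallest difference.
# The first row has no earlier rows, so it is set to 0 as in the expected output.
toy_min_distance <- function(long, lat) {
  s <- long + lat
  vapply(seq_along(s), function(i) {
    if (i == 1) return(0)
    min(s[i] - s[seq_len(i - 1)])
  }, numeric(1))
}

# grp 1 from the sample data
grp1 <- data.frame(long = c(11, 3, 13, 4), lat = c(12, 0, 4, 25))
grp1$min.distance <- toy_min_distance(grp1$long, grp1$lat)
print(grp1)
```

This loop is still quadratic in the number of rows per group, so it is meant as a readable reference for the definition rather than a fast solution.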

How can I quickly compute the minimum distance between the current rowid and all previous rowids?

1 answer:

Answer 0: (score: 0)

This is how I would do it. You need to compute the distances between all pairs within a group anyway, so we will use geosphere::distm for this. I suggest stepping through the function line by line to see what it does; I think it will make sense.

Using the sample data from the question (distHaversine returns meters by default):

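library(geosphere)
library(dplyr)  # for %>%, group_by() and mutate()

# Minimum distance from each point to any earlier point in the same vector
find_min_dist_above = function(long, lat, fun = distHaversine) {
  # Full pairwise distance matrix between all (long, lat) points
  d = distm(x = cbind(long, lat), fun = fun)
  # Keep only entries with row < column, i.e. earlier row vs. current row
  d[lower.tri(d, diag = TRUE)] = NA
  # The first row has no earlier rows; set it to 0 so min() has a value
  d[1, 1] = 0
  return(apply(d, MARGIN = 2, min, na.rm = TRUE))
}

df %>% group_by(grp) %>%
  mutate(min.distance = find_min_dist_above(long, lat))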
# # A tibble: 8 x 6
# # Groups:   grp [2]
#     grp date        long   lat rowid min.distance
#   <int> <fct>      <int> <int> <int>        <dbl>
# 1     1 1995-07-01    11    12     1           0 
# 2     1 1995-07-05     3     0     2     1601842.
# 3     1 1995-07-09    13     4     3      917395.
# 4     1 1995-07-13     4    25     4     1623922.
# 5     2 1995-03-07    12     6     1           0 
# 6     2 1995-03-10     3    27     2     2524759.
# 7     2 1995-03-13    34     8     3     2440596.
# 8     2 1995-03-16    25     9     4      997069.
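
To see why the masking step gives the distance to the nearest previous row, here is a small sketch on a toy matrix that does not need geosphere. The 1-D positions and the absolute-difference "distance" are assumptions made purely for illustration; only the masking logic mirrors the function above.

```r
# Toy 1-D positions; the pairwise "distance" here is just |x_i - x_j|,
# an assumption made for illustration instead of the haversine distance.
x <- c(10, 2, 7, 3)
d <- abs(outer(x, x, "-"))   # symmetric 4 x 4 matrix, zeros on the diagonal

# Keep only entries with row index < column index: column j then holds the
# distances from point j to the points that come before it.
d[lower.tri(d, diag = TRUE)] <- NA
d[1, 1] <- 0                 # the first point has no predecessor

# Column-wise minimum, ignoring the masked entries
min_prev <- apply(d, MARGIN = 2, min, na.rm = TRUE)
print(min_prev)              # 0 8 3 1
```

Because the full matrix is built once per group, memory grows with the square of the group size; for groups of a few thousand rows this is usually manageable, but it is worth keeping in mind for much larger groups.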