Question

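Suppose the following dataset (called Active) records the dates and times at which individuals entered and left a room over several days. Each individual is identified by a unique ID, Start is the time they entered the room, and End is the time they left.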

ID  Date      Start       End  
1 20-5-2021   13:00:00     13:03:00
1 20-5-2021   13:05:00     13:09:45
1 21-5-2021   12:01:00     13:00:00
2 20-5-2021   13:01:00     13:09:50
2 21-5-2021   13:15:05     14:00:00
2 21-5-2021   15:01:00     15:10:00

Now we have another dataset (called Detections) that records the times at which each individual (ID) was detected in the room:

ID   Date       Time 
1  20-5-2021    9:02:32
1  20-5-2021   11:02:32
1  20-5-2021   13:02:31
1  20-5-2021   13:08:00
1  20-5-2021   13:08:30
2  20-5-2021   12:07:09
2  20-5-2021   12:30:10
2  20-5-2021   13:07:09
2  21-5-2021   13:50:07
2  21-5-2021   13:51:56

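Notice that in some cases an ID is detected outside the Start and End times in Active. We want to keep only the rows in Detections where the ID was seen within any of the Start and End boundaries specified for that ID in Active. What is the best way to format the Date and Time data, and how might we apply such a filter in R?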

The final result would return the rows in Detections where an ID was seen within the Start and End boundaries specified in Active, as follows:

1  20-5-2021   13:02:31
1  20-5-2021   13:08:00
1  20-5-2021   13:08:30
2  20-5-2021   13:07:09
2  21-5-2021   13:50:07
2  21-5-2021   13:51:56

Answer 1

Combine the date and time columns into datetimes, then use the fuzzyjoin package to join on the time range:

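library(dplyr)
library(lubridate)
library(tidyr)

# Build full datetimes for the interval bounds in Active.
active_dt <- Active %>%
  mutate(Start = dmy_hms(paste(Date, Start)),
         End = dmy_hms(paste(Date, End))) %>%
  select(-Date)

# Build a single Datetime column in Detections.
detections_dt <- Detections %>%
  unite(Datetime, Date, Time, sep = ' ') %>%
  mutate(Datetime = dmy_hms(Datetime))

# Keep pairs with the same ID where Start <= Datetime <= End.
fuzzyjoin::fuzzy_inner_join(
  active_dt, detections_dt,
  by = c('ID', 'Start' = 'Datetime', 'End' = 'Datetime'),
  match_fun = c(`==`, `<=`, `>=`))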

#  ID.x               Start                 End ID.y            Datetime
#1    1 2021-05-20 13:00:00 2021-05-20 13:03:00    1 2021-05-20 13:02:31
#2    1 2021-05-20 13:05:00 2021-05-20 13:09:45    1 2021-05-20 13:08:00
#3    1 2021-05-20 13:05:00 2021-05-20 13:09:45    1 2021-05-20 13:08:30
#4    2 2021-05-20 13:01:00 2021-05-20 13:09:50    2 2021-05-20 13:07:09
#5    2 2021-05-21 13:15:05 2021-05-21 14:00:00    2 2021-05-21 13:50:07
#6    2 2021-05-21 13:15:05 2021-05-21 14:00:00    2 2021-05-21 13:51:56
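
The join output above carries the Active columns along with each detection, whereas the question asks only for the matching rows of Detections. As a rough sketch of one way to get that shape without extra packages, the base R version below rebuilds the example data from the question, pairs every detection with every Active interval of the same ID, and keeps the detections that fall inside at least one interval. The helper `to_datetime`, the columns `Start_dt`, `End_dt`, `Datetime` and `row`, and the choice of `tz = "UTC"` are assumptions made for illustration and are not part of the original answer.

```r
# Example data copied from the question.
Active <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Date  = c("20-5-2021", "20-5-2021", "21-5-2021",
            "20-5-2021", "21-5-2021", "21-5-2021"),
  Start = c("13:00:00", "13:05:00", "12:01:00",
            "13:01:00", "13:15:05", "15:01:00"),
  End   = c("13:03:00", "13:09:45", "13:00:00",
            "13:09:50", "14:00:00", "15:10:00"),
  stringsAsFactors = FALSE
)

Detections <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Date = c(rep("20-5-2021", 8), rep("21-5-2021", 2)),
  Time = c("9:02:32", "11:02:32", "13:02:31", "13:08:00", "13:08:30",
           "12:07:09", "12:30:10", "13:07:09", "13:50:07", "13:51:56"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse day-month-year dates plus a clock time.
# The time zone is an assumption; use whichever zone the data were recorded in.
to_datetime <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
}

Active$Start_dt     <- to_datetime(Active$Date, Active$Start)
Active$End_dt       <- to_datetime(Active$Date, Active$End)
Detections$Datetime <- to_datetime(Detections$Date, Detections$Time)
Detections$row      <- seq_len(nrow(Detections))  # remember the original row

# Pair every detection with every Active interval that has the same ID.
pairs <- merge(Detections, Active[c("ID", "Start_dt", "End_dt")], by = "ID")

# Keep detections inside at least one interval (bounds inclusive,
# matching the `<=` / `>=` comparison used in the fuzzy join).
inside <- pairs$Datetime >= pairs$Start_dt & pairs$Datetime <= pairs$End_dt
keep   <- sort(unique(pairs$row[inside]))

result <- Detections[keep, c("ID", "Date", "Time")]
print(result, row.names = FALSE)
```

Because `merge` builds every detection/interval pair per ID before filtering, this is only practical for modest data sizes; for larger tables the fuzzy join above or a dedicated interval join is likely the better fit.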

How to filter dates and times in one dataset based on values in another dataset
