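How do I look up a time within a time interval and return the value of another column?

Asked: 2017-04-23 07:10:14

Tags: r

I have two large datasets. One, named Shifts, contains the start and end times of each individual's (ID) rotations within a shift. A small example of the data structure: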

> head(Shifts, 15)
     ID     Shift    Rotation     Start                 End
1       A      S1        1 2017-04-23 00:05:58 2017-04-23 00:24:40
2       A      S2        2 2017-04-23 00:00:00 2017-04-23 00:10:08
3       A      S2        3 2017-04-23 00:15:13 2017-04-23 00:27:32
4       A      S3        4 2017-04-23 00:00:00 2017-04-23 00:20:43
5       A      S3        5 2017-04-23 00:27:49 2017-04-23 00:33:28
6       A      S4        6 2017-04-23 00:04:26 2017-04-23 00:31:37
7       B      S1        1 2017-04-23 00:00:00 2017-04-23 00:11:56
8       B      S1        2 2017-04-23 00:13:42 2017-04-23 00:29:10
9       B      S2        3 2017-04-23 00:03:38 2017-04-23 00:24:28
10      B      S3        4 2017-04-23 00:00:00 2017-04-23 00:27:36
11      B      S3        5 2017-04-23 00:31:08 2017-04-23 00:33:28
12      B      S4        6 2017-04-23 00:00:01 2017-04-23 00:14:26
13      B      S4        7 2017-04-23 00:18:32 2017-04-23 00:31:37
14      C      S1        1 2017-04-23 00:00:00 2017-04-23 00:29:10
15      C      S2        2 2017-04-23 00:00:00 2017-04-23 00:19:28

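The other dataset (Activities) contains the timestamped work activities (Symbol) that each individual (ID) completed in each shift. A small example of this dataset: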

  > head(Activity, 10)
   ID    Symbol Shift         Time
1  B     TE      S1 2017-04-23 00:00:22
2  B     TI      S1 2017-04-23 00:00:24
3  C     TE      S1 2017-04-23 00:01:08
4  A     TE      S1 2017-04-23 00:06:08
5  B     TE      S1 2017-04-23 00:01:25
6  B      P      S1 2017-04-23 00:01:33
7  C      P      S1 2017-04-23 00:01:36
8  C      T      S1 2017-04-23 00:01:36
9  A      T      S1 2017-04-23 00:07:45
10 A      T      S1 2017-04-23 00:08:25

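For each ID in each shift, I now want to find which interval from Shifts$Start to Shifts$End contains Activities$Time, and then return the corresponding Shifts$Rotation value. My expected output is: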

  > head(Activity, 10)
   ID    Symbol Shift         Time       Rotation
1  B     TE      S1 2017-04-23 00:00:22       1 
2  B     TI      S1 2017-04-23 00:00:24       1
3  C     TE      S1 2017-04-23 00:01:08       1 
4  A     TE      S1 2017-04-23 00:06:08       1 
5  B     TE      S1 2017-04-23 00:01:25       1
6  B      P      S1 2017-04-23 00:01:33       1
7  C      P      S1 2017-04-23 00:01:36       1
8  C      T      S1 2017-04-23 00:01:36       1
9  A      T      S1 2017-04-23 00:07:45       1
10 A      T      S1 2017-04-23 00:08:25       1

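Since both datasets are very large, with many IDs, shifts, and rotations, is there a fast way to do this lookup and return the column as shown above?

Thanks.

1 Answer:

Answer 0: (Score: 1)

How about the following: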

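library(tidyverse)
library(lubridate)

# Pair each activity with every rotation for the same ID and Shift,
# then keep only the rotation whose interval contains the activity time.
# Both bounds are strict, so a Time exactly equal to Start or End is dropped.
Activity <- inner_join(Shifts, Activities, by = c("ID", "Shift")) %>%
  filter(Start < Time, Time < End) %>%
  select(ID, Symbol, Shift, Time, Rotation)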
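
Because the question stresses that both tables are very large, a non-equi join may be worth trying as an alternative: the join above first builds every activity/rotation pair per ID and Shift and only then filters. The following is a minimal sketch using data.table, which is a different technique from the one in this answer. It assumes Start, End, and Time are already POSIXct. The sample rows are copied from the question, except the last Activities row (B at 00:14:00), which is made up here to exercise a second rotation. Note that the bounds are inclusive in this sketch, unlike the strict comparison above.

```r
library(data.table)

# Sample rows in the shape of the question's data (S1 only).
Shifts <- data.table(
  ID       = c("A", "B", "B", "C"),
  Shift    = "S1",
  Rotation = c(1L, 1L, 2L, 1L),
  Start    = as.POSIXct(c("2017-04-23 00:05:58", "2017-04-23 00:00:00",
                          "2017-04-23 00:13:42", "2017-04-23 00:00:00"),
                        tz = "UTC"),
  End      = as.POSIXct(c("2017-04-23 00:24:40", "2017-04-23 00:11:56",
                          "2017-04-23 00:29:10", "2017-04-23 00:29:10"),
                        tz = "UTC")
)

# The last row is hypothetical (not in the question); it falls in B's rotation 2.
Activities <- data.table(
  ID     = c("B", "C", "A", "B"),
  Symbol = c("TE", "TE", "TE", "T"),
  Shift  = "S1",
  Time   = as.POSIXct(c("2017-04-23 00:00:22", "2017-04-23 00:01:08",
                        "2017-04-23 00:06:08", "2017-04-23 00:14:00"),
                      tz = "UTC")
)

# Non-equi update join: for each rotation in Shifts, find the activities with
# the same ID and Shift whose Time lies in [Start, End], and write that
# rotation into a new Rotation column of Activities (by reference).
Activities[Shifts,
           on = .(ID, Shift, Time >= Start, Time <= End),
           Rotation := i.Rotation]

print(Activities)
```

Activities that fall outside every rotation keep `NA` in Rotation rather than being dropped, which differs from the `inner_join` version. If rotations for the same ID and Shift can overlap, an activity may match more than one of them, and which Rotation ends up in the column then depends on row order in Shifts.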