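I have two large datasets. One, named Shifts, contains the start and end times of individuals (ID) as they rotate through shifts. A small example of the data structure: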
> head(Shifts, 15)
ID Shift Rotation Start End
1 A S1 1 2017-04-23 00:05:58 2017-04-23 00:24:40
2 A S2 2 2017-04-23 00:00:00 2017-04-23 00:10:08
3 A S2 3 2017-04-23 00:15:13 2017-04-23 00:27:32
4 A S3 4 2017-04-23 00:00:00 2017-04-23 00:20:43
5 A S3 5 2017-04-23 00:27:49 2017-04-23 00:33:28
6 A S4 6 2017-04-23 00:04:26 2017-04-23 00:31:37
7 B S1 1 2017-04-23 00:00:00 2017-04-23 00:11:56
8 B S1 2 2017-04-23 00:13:42 2017-04-23 00:29:10
9 B S2 3 2017-04-23 00:03:38 2017-04-23 00:24:28
10 B S3 4 2017-04-23 00:00:00 2017-04-23 00:27:36
11 B S3 5 2017-04-23 00:31:08 2017-04-23 00:33:28
12 B S4 6 2017-04-23 00:00:01 2017-04-23 00:14:26
13 B S4 7 2017-04-23 00:18:32 2017-04-23 00:31:37
14 C S1 1 2017-04-23 00:00:00 2017-04-23 00:29:10
15 C S2 2 2017-04-23 00:00:00 2017-04-23 00:19:28
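The other dataset (Activities) contains the time-stamped work activities (Symbol) that each individual (ID) completes during each shift. A small example of this dataset: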
> head(Activities, 10)
ID Symbol Shift Time
1 B TE S1 2017-04-23 00:00:22
2 B TI S1 2017-04-23 00:00:24
3 C TE S1 2017-04-23 00:01:08
4 A TE S1 2017-04-23 00:06:08
5 B TE S1 2017-04-23 00:01:25
6 B P S1 2017-04-23 00:01:33
7 C P S1 2017-04-23 00:01:36
8 C T S1 2017-04-23 00:01:36
9 A T S1 2017-04-23 00:07:45
10 A T S1 2017-04-23 00:08:25
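For each ID within each shift, I now want to find which interval between Shifts$Start and Shifts$End each Activities$Time falls into, and then return the corresponding Shifts$Rotation as a new column. My expected output is: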
> head(Activity, 10)
ID Symbol Shift Time Rotation
1 B TE S1 2017-04-23 00:00:22 1
2 B TI S1 2017-04-23 00:00:24 1
3 C TE S1 2017-04-23 00:01:08 1
4 A TE S1 2017-04-23 00:06:08 1
5 B TE S1 2017-04-23 00:01:25 1
6 B P S1 2017-04-23 00:01:33 1
7 C P S1 2017-04-23 00:01:36 1
8 C T S1 2017-04-23 00:01:36 1
9 A T S1 2017-04-23 00:07:45 1
10 A T S1 2017-04-23 00:08:25 1
Since both datasets are very large, with many IDs, shifts, and rotations, is there a fast way to look up and return this column as shown above?
Thanks.
Answer 0 (score: 1)
How about the following:
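library(tidyverse)
library(lubridate)

# Pair each activity with every rotation of the same ID and Shift,
# then keep only the pairs whose Time falls inside [Start, End].
# The bounds are inclusive so that an activity logged exactly at
# Start or End is not dropped.
Activity <- inner_join(Shifts, Activities, by = c("ID", "Shift")) %>%
  filter(Start <= Time, Time <= End) %>%
  select(ID, Symbol, Shift, Time, Rotation)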
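
Since the question stresses that both tables are large, note that the join above builds every activity/rotation pair within the same ID and Shift before filtering. One possible alternative, which is not part of the original answer, is a non-equi update join in data.table. The following is only a minimal sketch: it assumes Time, Start and End are POSIXct, the helper ts() exists only for this example, and the sample rows are taken from the question's tables except for one hypothetical activity added to show a second rotation.

```r
library(data.table)

# Helper for this sketch only: parse timestamps in a fixed time zone.
ts <- function(x) as.POSIXct(x, tz = "UTC")

# A few S1 rows from the question's Shifts table.
Shifts <- data.table(
  ID       = c("A", "B", "B", "C"),
  Shift    = "S1",
  Rotation = c(1L, 1L, 2L, 1L),
  Start    = ts(c("2017-04-23 00:05:58", "2017-04-23 00:00:00",
                  "2017-04-23 00:13:42", "2017-04-23 00:00:00")),
  End      = ts(c("2017-04-23 00:24:40", "2017-04-23 00:11:56",
                  "2017-04-23 00:29:10", "2017-04-23 00:29:10"))
)

# Three activities from the question plus one HYPOTHETICAL row
# (B at 00:15:00) that falls into B's second rotation.
Activities <- data.table(
  ID     = c("B", "C", "A", "B"),
  Symbol = c("TE", "P", "TE", "T"),
  Shift  = "S1",
  Time   = ts(c("2017-04-23 00:00:22", "2017-04-23 00:01:36",
                "2017-04-23 00:06:08", "2017-04-23 00:15:00"))
)

# Non-equi update join: for each activity, find the Shifts row with the
# same ID and Shift whose [Start, End] contains Time, and copy its
# Rotation into Activities by reference (no intermediate joined table).
Activities[Shifts,
           on = .(ID, Shift, Time >= Start, Time <= End),
           Rotation := i.Rotation]

print(Activities)
```

Unlike the inner_join version, an activity that falls outside every rotation interval is kept here, with NA in Rotation, rather than being dropped.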