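I have a data frame in the following format, and I am trying to find the time difference between each 'ASSIGNED' event and the last 'CREATED' event that came before it.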
**AccountID** **TIME** **EVENT**
1 2016-11-08T01:54:15.000Z CREATED
1 2016-11-09T01:54:15.000Z ASSIGNED
1 2016-11-10T01:54:15.000Z CREATED
1 2016-11-11T01:54:15.000Z CALLED
1 2016-11-12T01:54:15.000Z ASSIGNED
1 2016-11-12T01:54:15.000Z SLEEP
My current code is below. The part I am stuck on is selecting the CREATED event that precedes each ASSIGNED event.
test <- timetable.filter %>%
  group_by(AccountID) %>%
  mutate(timeToAssign = ifelse(EVENT == 'ASSIGNED',
                               interval(ymd_hms(TIME), max(ymd_hms(TIME[EVENT == 'CREATED']))) %/% hours(1),
                               NA))
The output I am looking for is:
**AccountID** **TIME** **EVENT** **timeToAssign**
1 2016-11-08T01:54:15.000Z CREATED NA
1 2016-11-09T01:54:15.000Z ASSIGNED 12
1 2016-11-10T01:54:15.000Z CREATED NA
1 2016-11-11T01:54:15.000Z CALLED NA
1 2016-11-12T01:54:15.000Z ASSIGNED 24
1 2016-11-12T01:54:15.000Z SLEEP NA
Answer 0 (score: 5)
With dplyr and tidyr:
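library(dplyr); library(tidyr); library(anytime)
df %>%
  group_by(AccountID) %>%
  # mark each CREATED row with its within-group row number; parse TIME as a date-time
  mutate(CREATED_INDEX = if_else(EVENT == 'CREATED', row_number(), NA_integer_),
         TIME = anytime(TIME)) %>%
  # carry the most recent CREATED index down to the rows that follow it
  fill(CREATED_INDEX) %>%
  # for ASSIGNED rows, compute the hours elapsed since that CREATED row
  mutate(TimeToAssign = if_else(EVENT == 'ASSIGNED',
                                as.numeric(TIME - TIME[CREATED_INDEX], units = 'hours'),
                                NA_real_)) %>%
  select(-CREATED_INDEX)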
# A tibble: 6 x 4
# Groups: AccountID [1]
# AccountID TIME EVENT TimeToAssign
# <int> <dttm> <fctr> <dbl>
#1 1 2016-11-08 01:54:15 CREATED NA
#2 1 2016-11-09 01:54:15 ASSIGNED 24
#3 1 2016-11-10 01:54:15 CREATED NA
#4 1 2016-11-11 01:54:15 CALLED NA
#5 1 2016-11-12 01:54:15 ASSIGNED 48
#6 1 2016-11-12 01:54:15 SLEEP NA
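
For comparison, here is a minimal sketch of a variation on the same idea that carries the CREATED timestamp itself forward instead of a row index. It assumes the sample data from the question rebuilt by hand as a tibble, and it parses TIME with lubridate's ymd_hms (which the question already uses) rather than anytime. The helper column name last_created is only illustrative.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data rebuilt from the question (an assumption about the column types)
df <- tibble(
  AccountID = 1L,
  TIME = c("2016-11-08T01:54:15.000Z", "2016-11-09T01:54:15.000Z",
           "2016-11-10T01:54:15.000Z", "2016-11-11T01:54:15.000Z",
           "2016-11-12T01:54:15.000Z", "2016-11-12T01:54:15.000Z"),
  EVENT = c("CREATED", "ASSIGNED", "CREATED", "CALLED", "ASSIGNED", "SLEEP")
)

result <- df %>%
  mutate(TIME = ymd_hms(TIME)) %>%
  group_by(AccountID) %>%
  # keep the timestamp on CREATED rows only, NA everywhere else
  mutate(last_created = replace(TIME, EVENT != "CREATED", NA)) %>%
  # carry the most recent CREATED timestamp down within each account
  fill(last_created) %>%
  # hours between each ASSIGNED row and the CREATED row before it
  mutate(TimeToAssign = if_else(EVENT == "ASSIGNED",
                                as.numeric(difftime(TIME, last_created, units = "hours")),
                                NA_real_)) %>%
  ungroup() %>%
  select(-last_created)

print(result)
```

On the sample data this should give 24 and 48 hours for the two ASSIGNED rows, the same as the output above. An ASSIGNED row with no earlier CREATED row in its account stays NA, because there is nothing for fill to carry down.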