Remove unnecessary observations and calculate the time difference

Time: 2018-11-30 09:10:27

Tags: r data.table tidyverse reshape2

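I have a data frame in which each id, when audited, either passes or fails at a specific time. How can I calculate, for each id, the total time taken to go from a FAILED status to a PASSED status? If an id has the sequence FAILED, FAILED, PASSED, I need to add up both the time from the first failure to the pass and the time from the second failure to the pass.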

time <- c("08-10-2018 08:36", "12-10-2018 07:53", "23-10-2018 23:09", "30-10-2018 18:24","07-11-2018 18:13","10-11-2018 05:47","19-11-2018 21:26","26-11-2018 14:04","16-10-2018 03:19","07-11-2018 19:00","09-11-2018 23:25","20-11-2018 19:24", "22-11-2018 01:12","28-11-2018 03:46","04-10-2018 15:05","15-10-2018 15:32","20-10-2018 06:21","26-10-2018 04:51","02-11-2018 00:28","09-11-2018 22:43","15-11-2018 22:39","21-11-2018 04:10","26-11-2018 13:29")
id <-c("A1","A1","A1","A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A2","A3","A3","A3","A3","A3","A3","A3","A3","A3")
status <- c("FAILED","PASSED","FAILED","PASSED","FAILED","PASSED","PASSED","PASSED","PASSED","FAILED","PASSED","FAILED","PASSED","PASSED","PASSED","FAILED","PASSED","PASSED","PASSED","FAILED","PASSED","PASSED","FAILED")

df <- data.frame(id, time, status)

Required format:

ids <- c("A1","A2", "A3")
diff_time <- c(13.25, 3.46, 10.61)
df2 <- data.frame(ids,diff_time)   

Thanks in advance.

1 answer:

Answer 0: (score: 1)

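If I understand correctly, the OP wants to measure the time difference between each FAILED event and the next subsequent PASSED event (for each id). Finally, the measured differences need to be summed for each id.

This can be solved with a backward rolling join, which is available in the data.table package.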

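First, we have to separate the FAILED and PASSED events. Then, a right join is used to find the subsequent PASSED event for each FAILED event. The two subsets are joined on id and time, with time rolled backward (NOCB = next observation carried backward).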
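To make the rolling behaviour easier to see, here is a minimal sketch on made-up toy data (the id "X" and the numeric times are hypothetical, not taken from the question). It also covers the FAILED, FAILED, PASSED case: both failures are matched to the same later pass, and a failure with no later pass gets NA.

```r
library(data.table)

# Hypothetical toy data: one id, plain numbers instead of timestamps
passed <- data.table(id = "X", time = c(5, 9), timep = c(5, 9))
failed <- data.table(id = "X", time = c(1, 3, 6, 10), timef = c(1, 3, 6, 10))

# For each row of `failed`, take the `passed` row with the same id whose
# time is equal to or next after the failure time (roll = -Inf is NOCB)
res <- passed[failed, on = .(id, time), roll = -Inf]
print(res)
# timep is 5, 5, 9, NA: failures at 1 and 3 both roll to the pass at 5,
# the failure at 6 rolls to the pass at 9, the failure at 10 has no match
```

Because the result keeps one row per row of `failed`, the unmatched failure stays in the result with NA, which is why `na.rm = TRUE` is needed when summing.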

library(data.table)
# coerce df to data.table, coerce time to POSIXct
setDT(df)[, time := lubridate::dmy_hm(time)]
# create subset PASSED
dfp <- df[status == "PASSED"][, timep := time]
# create subset FAILED
dff <- df[status == "FAILED"][, timef := time]
# backward rolling join
dfp[dff, on = .(id, time),  roll = -Inf]
   id                time status               timep i.status               timef
1: A1 2018-10-08 08:36:00 PASSED 2018-10-12 07:53:00   FAILED 2018-10-08 08:36:00
2: A1 2018-10-23 23:09:00 PASSED 2018-10-30 18:24:00   FAILED 2018-10-23 23:09:00
3: A1 2018-11-07 18:13:00 PASSED 2018-11-10 05:47:00   FAILED 2018-11-07 18:13:00
4: A2 2018-11-07 19:00:00 PASSED 2018-11-09 23:25:00   FAILED 2018-11-07 19:00:00
5: A2 2018-11-20 19:24:00 PASSED 2018-11-22 01:12:00   FAILED 2018-11-20 19:24:00
6: A3 2018-10-15 15:32:00 PASSED 2018-10-20 06:21:00   FAILED 2018-10-15 15:32:00
7: A3 2018-11-09 22:43:00 PASSED 2018-11-15 22:39:00   FAILED 2018-11-09 22:43:00
8: A3 2018-11-26 13:29:00   <NA>                <NA>   FAILED 2018-11-26 13:29:00
# rolling join and aggregate by id
dfp[dff, on = .(id, time),  roll = -Inf][, .(diff_time = sum(timep - timef, na.rm = TRUE)), by = id]
   id      diff_time
1: A1 13.254167 days
2: A2  3.425694 days
3: A3 10.614583 days
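Subtracting two POSIXct vectors lets R pick the difftime unit automatically, so a group whose gaps are all shorter than a day could come back in hours. If a plain numeric column in days is wanted, as in the required format, one possible variant is to fix the unit with `difftime()`. The sketch below uses only the A2 rows from the question, parsed with base R instead of lubridate and with UTC as an assumed time zone, to keep it short.

```r
library(data.table)

# A2 rows copied from the question's sample data
dt <- data.table(
  id = "A2",
  time = as.POSIXct(
    c("16-10-2018 03:19", "07-11-2018 19:00", "09-11-2018 23:25",
      "20-11-2018 19:24", "22-11-2018 01:12", "28-11-2018 03:46"),
    format = "%d-%m-%Y %H:%M", tz = "UTC"),
  status = c("PASSED", "FAILED", "PASSED", "FAILED", "PASSED", "PASSED")
)

dfp <- dt[status == "PASSED"][, timep := time]
dff <- dt[status == "FAILED"][, timef := time]

# Same rolling join as above, but with the unit fixed to days
# and the result converted to a plain number
res <- dfp[dff, on = .(id, time), roll = -Inf][
  , .(diff_time = sum(as.numeric(difftime(timep, timef, units = "days")),
                      na.rm = TRUE)),
  by = id]
print(res)
```

For A2 this should reproduce the 3.425694 days shown above, now as an ordinary numeric value.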