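I have a data frame in which each id is audited and can pass or fail at a given time. How can I compute, for each id, the total time taken to go from a FAILED status to a PASSED status? If an id has the sequence FAILED, FAILED, PASSED, I need to add up both the time from the first FAILED to the PASSED and the time from the second FAILED to the PASSED.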
time <- c("08-10-2018 08:36", "12-10-2018 07:53", "23-10-2018 23:09", "30-10-2018 18:24","07-11-2018 18:13","10-11-2018 05:47","19-11-2018 21:26","26-11-2018 14:04","16-10-2018 03:19","07-11-2018 19:00","09-11-2018 23:25","20-11-2018 19:24", "22-11-2018 01:12","28-11-2018 03:46","04-10-2018 15:05","15-10-2018 15:32","20-10-2018 06:21","26-10-2018 04:51","02-11-2018 00:28","09-11-2018 22:43","15-11-2018 22:39","21-11-2018 04:10","26-11-2018 13:29")
id <-c("A1","A1","A1","A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A2","A3","A3","A3","A3","A3","A3","A3","A3","A3")
status <- c("FAILED","PASSED","FAILED","PASSED","FAILED","PASSED","PASSED","PASSED","PASSED","FAILED","PASSED","FAILED","PASSED","PASSED","PASSED","FAILED","PASSED","PASSED","PASSED","FAILED","PASSED","PASSED","FAILED")
df <- data.frame(id, time, status)
Required format:
ids <- c("A1","A2", "A3")
diff_time <- c(13.25, 3.46, 10.61)
df2 <- data.frame(ids,diff_time)
Thanks in advance.
Answer 0 (score: 1)
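If I understand correctly, the OP wants to measure the time difference between each FAILED event and the next subsequent PASSED event (for each id). Finally, the measured differences need to be summed for each id.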
This can be solved with a backward rolling join, which is available in the data.table package.
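First, we have to separate the FAILED and PASSED events. Then a right join is used to find the subsequent PASSED event for each FAILED event. The two subsets are joined on id and time, with time rolled backward (NOCB = next observation carried backward).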
library(data.table)
# coerce df to data.table, coerce time to POSIXct
setDT(df)[, time := lubridate::dmy_hm(time)]
# create subset PASSED
dfp <- df[status == "PASSED"][, timep := time]
# create subset FAILED
dff <- df[status == "FAILED"][, timef := time]
# backward rolling join
dfp[dff, on = .(id, time), roll = -Inf]
   id                time status               timep i.status               timef
1: A1 2018-10-08 08:36:00 PASSED 2018-10-12 07:53:00   FAILED 2018-10-08 08:36:00
2: A1 2018-10-23 23:09:00 PASSED 2018-10-30 18:24:00   FAILED 2018-10-23 23:09:00
3: A1 2018-11-07 18:13:00 PASSED 2018-11-10 05:47:00   FAILED 2018-11-07 18:13:00
4: A2 2018-11-07 19:00:00 PASSED 2018-11-09 23:25:00   FAILED 2018-11-07 19:00:00
5: A2 2018-11-20 19:24:00 PASSED 2018-11-22 01:12:00   FAILED 2018-11-20 19:24:00
6: A3 2018-10-15 15:32:00 PASSED 2018-10-20 06:21:00   FAILED 2018-10-15 15:32:00
7: A3 2018-11-09 22:43:00 PASSED 2018-11-15 22:39:00   FAILED 2018-11-09 22:43:00
8: A3 2018-11-26 13:29:00   <NA>                <NA>   FAILED 2018-11-26 13:29:00
# rolling join and aggregate by id
dfp[dff, on = .(id, time), roll = -Inf][, .(diff_time = sum(timep - timef, na.rm = TRUE)), by = id]
   id      diff_time
1: A1 13.254167 days
2: A2  3.425694 days
3: A3 10.614583 days
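The sample data never contains two FAILED events in a row, which is the case the question asks about. The following is only a small sketch of that case. It assumes made-up toy data (the id "X1" and its timestamps are hypothetical), and it converts the differences to days explicitly, so the result does not depend on the unit that difftime picks automatically.

```r
library(data.table)

# Hypothetical toy data: two consecutive FAILED events followed by one PASSED
toy <- data.table(
  id     = c("X1", "X1", "X1"),
  time   = as.POSIXct(c("2018-10-01 00:00", "2018-10-02 00:00", "2018-10-04 00:00"),
                      tz = "UTC"),
  status = c("FAILED", "FAILED", "PASSED")
)

# Same split as above: keep a copy of the timestamp under a distinct name
passed <- toy[status == "PASSED"][, timep := time]
failed <- toy[status == "FAILED"][, timef := time]

# roll = -Inf: each FAILED row picks up the next PASSED row at or after its time
joined <- passed[failed, on = .(id, time), roll = -Inf]

# Sum the differences per id, forcing the unit to days
res <- joined[, .(diff_time = sum(as.numeric(difftime(timep, timef, units = "days")),
                                  na.rm = TRUE)),
              by = id]
print(res)
```

Both FAILED rows roll forward to the same PASSED event, so the sum for X1 is 3 days + 2 days = 5 days. A FAILED event with no later PASSED event (like row 8 above) yields NA and is dropped by na.rm = TRUE.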