I have looked at plyr, but what I want to achieve is quite different from the usual usage.
Time Criteria
17/05/2013 17:22 A
17/05/2013 17:23 A
17/05/2013 17:29 A
17/05/2013 17:22 B
17/05/2013 17:28 B
17/05/2013 17:29 B
25/05/2013 16:56 C
25/05/2013 16:56 C
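I want to split this data by Criteria. Then, for each subset, I want to iterate over the records and decide whether to keep each one: a record is kept if it is less than 5 minutes from the last record.
Desired result: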
Time Criteria Keep
17/05/2013 17:22 A T
17/05/2013 17:23 A T
17/05/2013 17:29 A F --> 29 is more than 5 mins from 23
17/05/2013 17:22 B F --> Not keeping this because it is >5min from next record
17/05/2013 17:28 B T
17/05/2013 17:29 B T
25/05/2013 16:56 C T
25/05/2013 16:56 C T
Dput:
structure(list(Time = structure(c(1368782520, 1368782580, 1368782940,
1368782520, 1368782880, 1368782940, 1369472160, 1369472160), class = c("POSIXct",
"POSIXt"), tzone = "Singapore"), Criteria = structure(c(1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("Time",
"Criteria"), row.names = c(NA, -8L), class = "data.frame")
Answer 0 (score: 6)
This works:
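library(plyr)
# Keep a row if it is within 5 (in diff's units) of the previous or next row in its group
ddply(dat, "Criteria", transform,
      Keep = c(FALSE, diff(Time) <= 5) |
             c(diff(Time) <= 5, FALSE))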
# Time Criteria Keep
# 1 2013-05-17 17:22:00 A TRUE
# 2 2013-05-17 17:23:00 A TRUE
# 3 2013-05-17 17:29:00 A FALSE
# 4 2013-05-17 17:22:00 B FALSE
# 5 2013-05-17 17:28:00 B TRUE
# 6 2013-05-17 17:29:00 B TRUE
# 7 2013-05-25 16:56:00 C TRUE
# 8 2013-05-25 16:56:00 C TRUE
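I am not very familiar with diff on dates, so you may have to be careful and find out whether there is a way to make it systematically return the time difference in minutes (although that happens to be the case in this example).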
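One possible way to address the units concern is to convert the difftime result to minutes explicitly with as.numeric(..., units = "mins"). The following is only a sketch under assumptions: the helper name near_neighbour is hypothetical, the data is rebuilt from the epoch seconds in the dput output, the rows are assumed to be already sorted by time within each Criteria group, and base R split/unsplit is used in place of ddply purely to keep the example free of dependencies.

```r
# Rebuild the question's data from the epoch seconds in the dput output.
dat <- data.frame(
  Time = as.POSIXct(
    c(1368782520, 1368782580, 1368782940,
      1368782520, 1368782880, 1368782940,
      1369472160, 1369472160),
    origin = "1970-01-01", tz = "Singapore"
  ),
  Criteria = factor(c("A", "A", "A", "B", "B", "B", "C", "C"))
)

# Hypothetical helper: TRUE where a time is within `limit` minutes of the
# previous or the next time in the same group (assumes sorted input).
near_neighbour <- function(time, limit = 5) {
  # Force minutes instead of relying on the units diff() picks automatically.
  gap <- as.numeric(diff(time), units = "mins")
  near <- gap <= limit
  c(FALSE, near) | c(near, FALSE)
}

# Apply the helper per Criteria group and put the results back in row order.
dat$Keep <- unsplit(
  lapply(split(dat$Time, dat$Criteria), near_neighbour),
  dat$Criteria
)

print(dat)
```

On this data the Keep column should match the desired result in the question. The same helper could be passed to the ddply call as Keep = near_neighbour(Time), assuming plyr is installed.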