Finding the difference between datetimes in the same column that are separated by the same number of rows

Time: 2014-07-22 03:06:39

Tags: r diff data.table

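I am currently examining multiple subjects (id) and the number of times they visit particular locations (location) over a period of time. Rather than visually identifying when each subject arrives at a location and recording the date and time (datetime), we are using simple motion detection to increase our coverage. Unfortunately, some of this technology can record false detections, which makes it look as though a subject was present when it actually was not.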

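To be confident that a subject really visited a location, the manufacturer recommends that there be at least 3 recordings within each 30-minute period. The df data.table/data.frame below is an example: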

> df <- data.table(df, key = c("id", "location", "datetime"))
> df
    id            datetime location
 1:  1 2014-06-01 08:03:00        a
 2:  1 2014-06-01 08:56:00        a
 3:  1 2014-06-01 08:58:00        a
 4:  1 2014-06-01 09:09:00        a
 5:  1 2014-06-01 09:20:00        a
 6:  1 2014-06-01 08:28:00        b
 7:  1 2014-06-01 08:33:00        b
 8:  1 2014-06-01 08:38:00        b
 9:  1 2014-06-01 08:42:00        b
10:  1 2014-06-01 09:31:00        b
11:  1 2014-06-01 08:18:00        c
12:  1 2014-06-01 08:50:00        c
13:  1 2014-06-01 08:52:00        c
14:  1 2014-06-01 08:53:00        c
15:  1 2014-06-01 09:05:00        c
16:  2 2014-06-01 09:35:00        a
17:  2 2014-06-01 09:45:00        a
18:  2 2014-06-01 10:40:00        a
19:  2 2014-06-01 10:44:00        a
20:  2 2014-06-01 10:59:00        a
21:  2 2014-06-01 11:04:00        a
22:  2 2014-06-01 09:54:00        b
23:  2 2014-06-01 10:12:00        b
24:  2 2014-06-01 09:40:00        c
25:  2 2014-06-01 10:01:00        c
26:  2 2014-06-01 10:07:00        c
27:  2 2014-06-01 10:19:00        c
28:  2 2014-06-01 10:32:00        c
29:  2 2014-06-01 10:49:00        c
30:  2 2014-06-01 10:57:00        c

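The key used above orders the data by subject (id), the location visited (location), and the time the location was visited (datetime). With the data.table organized this way, all that needs to be done is to determine whether the time between the first and the third of 3 consecutive recordings exceeds 30 minutes. My desired output is as follows: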

> df
    id            datetime location diff_min
 1:  1 2014-06-01 08:03:00        a       55
 2:  1 2014-06-01 08:56:00        a       13
 3:  1 2014-06-01 08:58:00        a       22
 4:  1 2014-06-01 09:09:00        a       NA  <-----
 5:  1 2014-06-01 09:20:00        a       NA  <-----
 6:  1 2014-06-01 08:28:00        b       10
 7:  1 2014-06-01 08:33:00        b        9
 8:  1 2014-06-01 08:38:00        b       53
 9:  1 2014-06-01 08:42:00        b       NA  <-----
10:  1 2014-06-01 09:31:00        b       NA  <-----
11:  1 2014-06-01 08:18:00        c       34
12:  1 2014-06-01 08:50:00        c        3
13:  1 2014-06-01 08:52:00        c       13
14:  1 2014-06-01 08:53:00        c       NA  <-----
15:  1 2014-06-01 09:05:00        c       NA  <-----
16:  2 2014-06-01 09:35:00        a       65
17:  2 2014-06-01 09:45:00        a       59
18:  2 2014-06-01 10:40:00        a       19
19:  2 2014-06-01 10:44:00        a       20
20:  2 2014-06-01 10:59:00        a       NA  <-----
21:  2 2014-06-01 11:04:00        a       NA  <-----
22:  2 2014-06-01 09:54:00        b       NA  <-----
23:  2 2014-06-01 10:12:00        b       NA  <-----
24:  2 2014-06-01 09:40:00        c       27
25:  2 2014-06-01 10:01:00        c       18
26:  2 2014-06-01 10:07:00        c       25
27:  2 2014-06-01 10:19:00        c       30
28:  2 2014-06-01 10:32:00        c       25
29:  2 2014-06-01 10:49:00        c       NA  <-----
30:  2 2014-06-01 10:57:00        c       NA  <-----

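Note the NA values marked with <-----. Because I am taking the difftime() between each value and the value two rows further down (3 records in total), the last two rows/records of each id and location combination will be NA, since fewer than 3 recordings remain. Any location with 2 or fewer records automatically gets NA values.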
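A minimal base-R sketch of the rule just described, for a single id/location group: each time is subtracted from the time two rows further down, and the last two positions are padded with NA. The helper name `lag2_diff_mins` is hypothetical, and the times (id 1, location a) are parsed in UTC only so that the result does not depend on the local time zone.

```r
# Hypothetical helper: minutes between each recording and the one two rows later.
# Groups with fewer than 3 recordings get all NA; otherwise the last two are NA.
lag2_diff_mins <- function(x) {
  n <- length(x)
  if (n < 3) return(rep(NA_real_, n))
  d <- difftime(x[3:n], x[1:(n - 2)], units = "mins")
  c(as.numeric(d), NA_real_, NA_real_)
}

# id 1, location a, from the example data
t_a <- as.POSIXct(c("2014-06-01 08:03:00", "2014-06-01 08:56:00",
                    "2014-06-01 08:58:00", "2014-06-01 09:09:00",
                    "2014-06-01 09:20:00"), tz = "UTC")
lag2_diff_mins(t_a)
# [1] 55 13 22 NA NA
```

The values match rows 1 to 5 of the desired output. Converting with `as.numeric()` before padding keeps the result a plain numeric vector.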

I tried to solve this myself with the following code, but I haven't come close:

> df[, diff_min := lapply(.SD, function(x) c(difftime(x[3:length(x)], x[1:(length(x)-2)], units = "mins"), NA, NA)),
+    .SDcols = "datetime", by = c("id", "location")]
Warning message:
In `[.data.table`(df, , `:=`(diff_min, lapply(.SD, function(x) c(difftime(x[3:length(x)],  :
  RHS 1 is length 4 (greater than the size (2) of group 5). The last 2 element(s) will be discarded.

If you'd like to give it a try, here is the dput() output:

> dput(df)
structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1401624180L,
1401627360L, 1401627480L, 1401628140L, 1401628800L, 1401625680L,
1401625980L, 1401626280L, 1401626520L, 1401629460L, 1401625080L,
1401627000L, 1401627120L, 1401627180L, 1401627900L, 1401629700L,
1401630300L, 1401633600L, 1401633840L, 1401634740L, 1401635040L,
1401630840L, 1401631920L, 1401630000L, 1401631260L, 1401631620L,
1401632340L, 1401633120L, 1401634140L, 1401634620L), class = c("POSIXct",
"POSIXt"), tzone = ""), location = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"
), class = "factor")), .Names = c("id", "datetime", "location"
), row.names = c(NA, -30L), class = c("data.table", "data.frame"
), sorted = c("id", "location", "datetime"))

Feel free to ask questions and to use any package to get the desired output (e.g. base R). Thanks for your time!
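The warning in the attempt above comes from group 5 (id 2, location b), which has only two rows. There, `3:length(x)` is `3:2`, which R evaluates as the descending sequence c(3, 2) rather than an empty index, so the difftime() call returns 2 values instead of 0, and the two NA paddings bring the group's result to length 4. The sketch below reproduces this in base R and shows one possible guard using `seq_len()`; the variable names are illustrative only.

```r
# A group with only two recordings, like id 2 / location b in the question
x <- as.POSIXct(c("2014-06-01 09:54:00", "2014-06-01 10:12:00"), tz = "UTC")

3:length(x)         # 3:2 counts down instead of being empty
# [1] 3 2
1:(length(x) - 2)   # 1:0; index 0 is silently dropped when subsetting
# [1] 1 0

# So the difftime() call returns 2 values (NA, then 18 mins) rather than none
d <- difftime(x[3:length(x)], x[1:(length(x) - 2)], units = "mins")
length(d)
# [1] 2

# One possible guard: seq_len() yields an empty index when there are < 3 rows
n <- length(x)
i <- seq_len(max(n - 2, 0))
fixed <- c(as.numeric(difftime(x[i + 2], x[i], units = "mins")),
           rep(NA_real_, min(n, 2)))
fixed
# [1] NA NA
```

With this guard the padding is capped at the group size, so every group returns exactly as many values as it has rows.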

1 Answer:

Answer 0 (score: 1)

Use rollapply from zoo:

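library(zoo)

# Minutes between the first and third time in a window of 3 recordings
Diff <- function(x) difftime(x[3], x[1], units = "min")
# Width-3 windows aligned left, computed separately for each id/location;
# fill = NA pads the last two rows of each group
df[, diff_min := rollapply(datetime, 3, Diff, align = "left", fill = NA),
   by = list(id, location)]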
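Since the question also welcomes base R, here is a package-free sketch of the same width-3, left-aligned calculation, applied per id/location with `ave()`, followed by a flag for the manufacturer's rule (3 recordings within 30 minutes). It uses only the location-b rows of the example, and the helper `diff3_mins` and the column `valid` are hypothetical names. The datetimes are passed to `ave()` as numeric seconds because `ave()` writes its results back into a copy of its first argument, so a POSIXct input would produce a POSIXct result.

```r
# Location-b rows of the example data (id 1 has five, id 2 has only two)
df <- data.frame(
  id       = c(1, 1, 1, 1, 1, 2, 2),
  location = rep("b", 7),
  datetime = as.POSIXct(c("2014-06-01 08:28:00", "2014-06-01 08:33:00",
                          "2014-06-01 08:38:00", "2014-06-01 08:42:00",
                          "2014-06-01 09:31:00", "2014-06-01 09:54:00",
                          "2014-06-01 10:12:00"), tz = "UTC")
)

# Hypothetical helper: minutes from each time (in seconds) to the one two rows
# later; the last two values of a group, or all values if it has < 3 rows, are NA
diff3_mins <- function(s) {
  n <- length(s)
  i <- seq_len(max(n - 2, 0))
  c((s[i + 2] - s[i]) / 60, rep(NA_real_, min(n, 2)))
}

# Put each group's rows in time order, then apply the helper per id/location
df <- df[order(df$id, df$location, df$datetime), ]
df$diff_min <- ave(as.numeric(df$datetime), df$id, df$location, FUN = diff3_mins)

# TRUE where the 3 recordings starting at this row fall within 30 minutes
df$valid <- !is.na(df$diff_min) & df$diff_min <= 30
df
```

For these rows diff_min comes out as 10, 9, 53, NA, NA, NA, NA, matching rows 6 to 10 and 22 to 23 of the desired output; rows with NA are treated as not valid.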