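I am currently examining multiple subjects (id) and the number of times they visit specific locations (location) over a period of time. Rather than visually identifying when each subject arrives at a location and recording the date and time (datetime), we are using simple motion detection to increase our coverage. Unfortunately, some of this equipment can record false detections, which makes it look as if a subject is present when it really is not.
To confidently assume that a subject did in fact visit a location, the manufacturer recommends that there be at least 3 recordings within every 30 minutes. The df data.table / data.frame below is an example: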
> df <- data.table(df, key = c("id", "location", "datetime"))
> df
id datetime location
1: 1 2014-06-01 08:03:00 a
2: 1 2014-06-01 08:56:00 a
3: 1 2014-06-01 08:58:00 a
4: 1 2014-06-01 09:09:00 a
5: 1 2014-06-01 09:20:00 a
6: 1 2014-06-01 08:28:00 b
7: 1 2014-06-01 08:33:00 b
8: 1 2014-06-01 08:38:00 b
9: 1 2014-06-01 08:42:00 b
10: 1 2014-06-01 09:31:00 b
11: 1 2014-06-01 08:18:00 c
12: 1 2014-06-01 08:50:00 c
13: 1 2014-06-01 08:52:00 c
14: 1 2014-06-01 08:53:00 c
15: 1 2014-06-01 09:05:00 c
16: 2 2014-06-01 09:35:00 a
17: 2 2014-06-01 09:45:00 a
18: 2 2014-06-01 10:40:00 a
19: 2 2014-06-01 10:44:00 a
20: 2 2014-06-01 10:59:00 a
21: 2 2014-06-01 11:04:00 a
22: 2 2014-06-01 09:54:00 b
23: 2 2014-06-01 10:12:00 b
24: 2 2014-06-01 09:40:00 c
25: 2 2014-06-01 10:01:00 c
26: 2 2014-06-01 10:07:00 c
27: 2 2014-06-01 10:19:00 c
28: 2 2014-06-01 10:32:00 c
29: 2 2014-06-01 10:49:00 c
30: 2 2014-06-01 10:57:00 c
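The key used above sorts by subject (id), the location visited (location), and the time of the visit (datetime). With the data.table organized this way, all that remains is to determine whether the time spanned by 3 consecutive recordings exceeds 30 minutes. My desired output is below: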
> df
id datetime location diff_min
1: 1 2014-06-01 08:03:00 a 55
2: 1 2014-06-01 08:56:00 a 13
3: 1 2014-06-01 08:58:00 a 22
4: 1 2014-06-01 09:09:00 a NA <-----
5: 1 2014-06-01 09:20:00 a NA <-----
6: 1 2014-06-01 08:28:00 b 10
7: 1 2014-06-01 08:33:00 b 9
8: 1 2014-06-01 08:38:00 b 53
9: 1 2014-06-01 08:42:00 b NA <-----
10: 1 2014-06-01 09:31:00 b NA <-----
11: 1 2014-06-01 08:18:00 c 34
12: 1 2014-06-01 08:50:00 c 3
13: 1 2014-06-01 08:52:00 c 13
14: 1 2014-06-01 08:53:00 c NA <-----
15: 1 2014-06-01 09:05:00 c NA <-----
16: 2 2014-06-01 09:35:00 a 65
17: 2 2014-06-01 09:45:00 a 59
18: 2 2014-06-01 10:40:00 a 19
19: 2 2014-06-01 10:44:00 a 20
20: 2 2014-06-01 10:59:00 a NA <-----
21: 2 2014-06-01 11:04:00 a NA <-----
22: 2 2014-06-01 09:54:00 b NA <-----
23: 2 2014-06-01 10:12:00 b NA <-----
24: 2 2014-06-01 09:40:00 c 27
25: 2 2014-06-01 10:01:00 c 18
26: 2 2014-06-01 10:07:00 c 25
27: 2 2014-06-01 10:19:00 c 30
28: 2 2014-06-01 10:32:00 c 25
29: 2 2014-06-01 10:49:00 c NA <-----
30: 2 2014-06-01 10:57:00 c NA <-----
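Notice the NA values pointed out by <-----. Since I am taking the difftime() between each recording and the one two rows after it (3 recordings in total), the last two rows/recordings for each id and location will be NA, because fewer than 3 recordings remain. Any location with 2 or fewer recordings automatically gets NA values.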
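To make the definition concrete, here is a minimal base R sketch of the calculation for a single group (id 1, location a). The names x, n, lead2 and first are illustrative, and the UTC time zone is an assumption made only so the example is reproducible.

```r
# Timestamps for id 1 at location "a", copied from the example data.
# tz = "UTC" is an assumption; the differences do not depend on it.
x <- as.POSIXct(c("2014-06-01 08:03:00", "2014-06-01 08:56:00",
                  "2014-06-01 08:58:00", "2014-06-01 09:09:00",
                  "2014-06-01 09:20:00"), tz = "UTC")
n <- length(x)

# Start with NA everywhere; rows without two later recordings stay NA.
diff_min <- rep(NA_real_, n)
if (n >= 3) {
  lead2 <- x[3:n]        # the recording two rows ahead
  first <- x[1:(n - 2)]  # the recording that opens each window of 3
  diff_min[1:(n - 2)] <- as.numeric(difftime(lead2, first, units = "mins"))
}
diff_min
# 55 13 22 NA NA
```

The `n >= 3` guard matters: without it, a group with only 2 recordings would be indexed with `3:2`, which counts down rather than producing an empty range.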
I tried to work this out on my own using the following code, but I have not come close to solving it:
> df[, diff_min := lapply(.SD, function(x) c(difftime(x[3:length(x)], x[1:(length(x)-2)], units = "mins"), NA, NA)),
+ .SDcols = "datetime", by = c("id", "location")]
Warning message:
In `[.data.table`(df, , `:=`(diff_min, lapply(.SD, function(x) c(difftime(x[3:length(x)], :
RHS 1 is length 4 (greater than the size (2) of group 5). The last 2 element(s) will be discarded.
If you would like to give it a try, see the dput() output below:
> dput(df)
structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1401624180L,
1401627360L, 1401627480L, 1401628140L, 1401628800L, 1401625680L,
1401625980L, 1401626280L, 1401626520L, 1401629460L, 1401625080L,
1401627000L, 1401627120L, 1401627180L, 1401627900L, 1401629700L,
1401630300L, 1401633600L, 1401633840L, 1401634740L, 1401635040L,
1401630840L, 1401631920L, 1401630000L, 1401631260L, 1401631620L,
1401632340L, 1401633120L, 1401634140L, 1401634620L), class = c("POSIXct",
"POSIXt"), tzone = ""), location = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b",
"c"), class = "factor")), .Names = c("id", "datetime", "location"
), row.names = c(NA, -30L), class = c("data.table", "data.frame"
), sorted = c("id", "location", "datetime"), .internal.selfref = <pointer: 0x0000000000100788>)
Please feel free to ask questions and to use any packages to get the desired output (e.g. base). Thank you for your time!
Answer 0 (score: 1)
Use rollapply from the zoo package:
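library(zoo)
# Minutes between the first and third recording in a window of 3
Diff <- function(x) difftime(x[3], x[1], units = "min")
# align = "left" anchors each window at its first row; fill = NA pads rows without a full window
df[, diff_min := rollapply(datetime, 3, Diff, align = "left", fill = NA),
by = list(id, location)]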
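As an alternative that stays within data.table, the same column can probably be built with shift(), which looks two rows ahead inside each group. This is a sketch, not part of the original answer: the table dt, the UTC time zone, and the within_30 column (including its "at most 30 minutes" boundary) are assumptions for illustration.

```r
library(data.table)

# Small illustrative table: one group with 5 recordings, one with only 2.
dt <- data.table(
  id = c(1, 1, 1, 1, 1, 2, 2),
  location = c("a", "a", "a", "a", "a", "b", "b"),
  datetime = as.POSIXct(c("2014-06-01 08:03:00", "2014-06-01 08:56:00",
                          "2014-06-01 08:58:00", "2014-06-01 09:09:00",
                          "2014-06-01 09:20:00", "2014-06-01 09:54:00",
                          "2014-06-01 10:12:00"), tz = "UTC")
)
setkey(dt, id, location, datetime)

# shift(..., n = 2, type = "lead") returns the value two rows ahead within
# each group and pads with NA where no such row exists.
dt[, diff_min := as.numeric(difftime(shift(datetime, n = 2L, type = "lead"),
                                     datetime, units = "mins")),
   by = .(id, location)]

# Hypothetical follow-up: flag windows of 3 recordings spanning at most 30 minutes.
dt[, within_30 := !is.na(diff_min) & diff_min <= 30]
dt
```

Because shift() pads with NA itself, groups with fewer than 3 rows need no special handling. In the attempt from the question, `3:length(x)` counts down (3, 2) when a group has only 2 rows, which is why the right-hand side came out with length 4 for that group.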