library(data.table)
library(lubridate)
df <- data.table(col1 = c('A', 'A', 'A', 'B', 'B', 'B'), col2 = c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28", "2015-03-06 02:31:44", "2015-03-06 03:55:45", "2015-03-06 04:01:40"))
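For each row, I want to compute the standard deviation of the times (col2) over the rows that have the same 'col1' value and whose time falls within a window running from 10 minutes before this row's time to 10 minutes after it (both ends inclusive).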
I tried a fast approach based on the solution to a previous question:
df$col2 <- as_datetime(df$col2)
gap <- 10L
df[, feat1 := .SD[.(col1 = col1, t1 = col2 - gap * 60L, t2 = col2 + gap * 60L)
, on = .(col1, col2 >= t1, col2 <= t2)
, .(col1, col2 = x.col2, times = as.numeric(col2))
][, .(sd_times = sd(times))
, by = .(col1, col2)]$sd_times][]
But I get the following error:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in 14 rows; more than 12 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
Answer 0 (score: 0)
I solved my task using Frank's comment above, by adding allow.cartesian=TRUE to the join:
df[, feat1 := .SD[.(col1 = col1, t1 = col2 - gap * 60L, t2 = col2 + gap * 60L)
, on = .(col1, col2 >= t1, col2 <= t2)
, .(col1, col2 = x.col2, times = as.numeric(col2)), allow.cartesian=TRUE
][, .(sd_times = sd(times))
, by = .(col1, col2)]$sd_times][]
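The error message also suggests by=.EACHI as a way to avoid the large allocation. Below is a minimal sketch of that alternative, not the code from the answer above. It assumes the timestamps are parsed as UTC, and the helper names win and res are hypothetical. Instead of materialising all 14 joined rows, j is evaluated once per window row:

```r
library(data.table)

# Same sample data as in the question; parsing as UTC is an assumption.
df <- data.table(
  col1 = c("A", "A", "A", "B", "B", "B"),
  col2 = as.POSIXct(c("2015-03-06 01:37:57", "2015-03-06 01:39:57",
                      "2015-03-06 01:45:28", "2015-03-06 02:31:44",
                      "2015-03-06 03:55:45", "2015-03-06 04:01:40"),
                    tz = "UTC")
)
gap <- 10L  # window half-width in minutes

# One window [t1, t2] per row of df (win is a hypothetical helper table).
win <- df[, .(col1, t1 = col2 - gap * 60, t2 = col2 + gap * 60)]

# Non-equi join: for each window row, take the sd of the matching times.
# x.col2 refers to the original col2 values of df, not the window bounds.
res <- df[win,
          on = .(col1, col2 >= t1, col2 <= t2),
          .(sd_times = sd(as.numeric(x.col2))),
          by = .EACHI]

# by = .EACHI returns one row per row of win, in the same order as df.
df[, feat1 := res$sd_times][]
```

A row whose window contains only itself (the first 'B' row here) gets NA, since sd() of a single value is NA.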