data.table: compute statistics of row times within a moving time window

Time: 2018-04-02 17:32:29

Tags: r data.table

library(data.table)
library(lubridate)
df <- data.table(col1 = c('A', 'A', 'A', 'B', 'B', 'B'), col2 = c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28", "2015-03-06 02:31:44", "2015-03-06 03:55:45", "2015-03-06 04:01:40"))

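For each row, I want to compute the standard deviation of the times (col2) of the rows that have the same value of 'col1' and whose time falls within the 10-minute window ending at that row's time (inclusive).

I am using the following approach: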

df$col2 <- as_datetime(df$col2)
gap <- 10L
df[, feat1 := .SD[.(col1 = col1, t1 = col2 - gap * 60L, t2 = col2)
                   , on = .(col1, col2 >= t1, col2 <= t2)
                   , .(sd_time = sd(as.numeric(col2))), by = .EACHI]$sd_time][]

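As a result I see only NA values instead of values in seconds.

For example, for the third row (col1 = "A" and col2 = "2015-03-06 01:45:28") I computed the value manually as follows: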

v <- c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28")
v <- as_datetime(v)
sd(v)  # 233.5815

2 Answers:

Answer 0: (score: 1)

Two alternative data.table solutions (variants of my previous answer):

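# option 1: join each row to the rows up to 10 minutes *after* it, then
# regroup by the matched row, so every row collects the times preceding it
df[.(col1 = col1, t1 = col2, t2 = col2 + gap * 60L)
   , on = .(col1, col2 >= t1, col2 <= t2)
   , .(col1, col2 = x.col2, times = as.numeric(t1))  # x.col2: the matched row's own time
   ][, .(feat1 = sd(times))
     , by = .(col1, col2)]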

# option 2: the same join, with the result added to df by reference
df[, feat1 := .SD[.(col1 = col1, t1 = col2, t2 = col2 + gap * 60L)
                  , on = .(col1, col2 >= t1, col2 <= t2)
                  , .(col1, col2 = x.col2, times = as.numeric(t1))
                  ][, .(sd_times = sd(times))
                    , by = .(col1, col2)]$sd_times][]

Both give:

   col1                col2     feat1
1:    A 2015-03-06 00:37:57        NA
2:    A 2015-03-06 00:39:57  84.85281
3:    A 2015-03-06 00:45:28 233.58153
4:    B 2015-03-06 01:31:44        NA
5:    B 2015-03-06 02:55:45        NA
6:    B 2015-03-06 03:01:40 251.02291
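To make the reversed join in option 1 easier to follow, here is a minimal sketch that splits it into two steps and inspects the intermediate table. The names `windows` and `pairs` are hypothetical, and the timestamps are assumed to be parsed as UTC; neither comes from the original answer.

```r
library(data.table)

df <- data.table(
  col1 = c("A", "A", "A", "B", "B", "B"),
  col2 = as.POSIXct(c("2015-03-06 01:37:57", "2015-03-06 01:39:57",
                      "2015-03-06 01:45:28", "2015-03-06 02:31:44",
                      "2015-03-06 03:55:45", "2015-03-06 04:01:40"),
                    tz = "UTC")  # assumption: UTC, to keep the example reproducible
)
gap <- 10L

# One forward-looking window per row: [t1, t1 + 10 minutes]
windows <- df[, .(col1, t1 = col2, t2 = col2 + gap * 60L)]

# Non-equi join: pair each window with the rows of df that fall inside it.
# x.col2 is the matched row's own time, i.t1 is the time of the window's owner.
pairs <- df[windows,
            on = .(col1, col2 >= t1, col2 <= t2),
            .(col1, col2 = x.col2, times = as.numeric(i.t1))]

# Regrouping by the matched row gathers, for every row, the times of all rows
# in the same group that lie at most 10 minutes before it (itself included).
res <- pairs[, .(n = .N, feat1 = sd(times)), by = .(col1, col2)]
res
```

Rows whose window holds a single time get NA, because the standard deviation of one value is undefined.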

Answer 1: (score: 0)

A data.table solution:

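df[, col3 := as.numeric(col2)]  # times as seconds since the epoch
df[, feat1 := {
  d <- df$col3 - col3  # offset of every row from the current row's time
  # same col1, and at most `gap` minutes before the current row (inclusive)
  sd(df$col3[col1 == df$col1 & d <= 0 & d >= -gap * 60L])
},
by = list(col3, col1)]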

Another approach, using mapply to loop over all combinations of col1 and col2:

df[,col3:=as.numeric(col2)]

df[, feat1 := mapply(function(Date, ID) {
  DateVect <- df[col1 == ID, col3]
  d <- DateVect - Date
  sd(DateVect[d <= 0 & d >= -gap * 60L])
}, Date = col3, ID = col1)][]
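As a cross-check on any of the approaches above, the same window statistic can be computed with a plain base-R loop. This is only a sketch: the helper name `window_sd` is hypothetical, and the timestamps are assumed to be UTC.

```r
col1 <- c("A", "A", "A", "B", "B", "B")
col2 <- as.POSIXct(c("2015-03-06 01:37:57", "2015-03-06 01:39:57",
                     "2015-03-06 01:45:28", "2015-03-06 02:31:44",
                     "2015-03-06 03:55:45", "2015-03-06 04:01:40"),
                   tz = "UTC")  # assumption: UTC
gap_secs <- 10 * 60             # window length in seconds
secs <- as.numeric(col2)        # times as seconds since the epoch

# Hypothetical helper: sd of the times in row i's group that fall in the
# 10-minute window ending at row i's time (both ends inclusive).
window_sd <- function(i) {
  in_window <- col1 == col1[i] &
    secs <= secs[i] &
    secs >= secs[i] - gap_secs
  sd(secs[in_window])  # NA when the window holds a single time
}

feat1 <- vapply(seq_along(secs), window_sd, numeric(1))
round(feat1, 5)
# NA 84.85281 233.58153 NA NA 251.02291
```

The loop is quadratic in the number of rows, so it is better suited to validating results on small data than to replacing the join-based solutions.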