Question

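I am trying to compute a moving sum over some time series values. However, the data set is huge, and I am not sure what the fastest way to do this actually is.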

Here is what I have tried:
1. Using data.table and filter.
2. sapply, which could be parallelized with the foreach package.
But I think there should be a neater way to do this.

Here is a code sample:

set.seed(12345)
library(dplyr)
library(data.table)

# Generate random data
ts = seq(from = as.POSIXct(1447155253, origin = "1970-01-01"),
         to = as.POSIXct(1447265253, origin = "1970-01-01"), by = "min")
value = sample(1:10, length(ts), replace = TRUE)
sampleDF = data.frame(timestamp = ts, value = value)
sampleDF = as.data.table(sampleDF)


# Pre-manipulations 
slidingwindow = 5*60 # 5 minutes window
end.ts = sampleDF$timestamp[length(sampleDF$timestamp)] - slidingwindow 
end.i = which(sampleDF$timestamp >= end.ts)[1] 


# Apply rolling sum 

system.time(
  sapply(1:end.i,
         FUN = function(i) {
           from = sampleDF$timestamp[i]  # start of the window (inclusive)
           to = from + slidingwindow     # end of the window (exclusive)
           # Keep the rows inside [from, to) and sum their values
           window.sum = dplyr::filter(sampleDF, timestamp >= from, timestamp < to) %>% .$value %>% sum
           return(window.sum)
         })
)

# user  system elapsed 
# 5.60    0.00    5.69
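
For comparison, here is a minimal sketch of one possible alternative, not a benchmarked answer. It assumes the timestamps are sorted in increasing order and replaces the per-row filter with a single cumulative sum plus base R's findInterval. The helper name rolling_sum_forward is hypothetical, introduced only for illustration.

```r
# Hypothetical helper (not from any package): for every row, sum the values
# whose timestamp falls in [t, t + window). Assumes `timestamp` is sorted
# in increasing order.
rolling_sum_forward <- function(timestamp, value, window) {
  t_num <- as.numeric(timestamp)
  # Prefix sums with a leading 0, so cs[k + 1] is the sum of the first k values.
  # as.numeric() avoids integer overflow on long series.
  cs <- c(0, cumsum(as.numeric(value)))
  # With left.open = TRUE, findInterval() returns the number of timestamps
  # strictly less than the query value.
  lo <- findInterval(t_num, t_num, left.open = TRUE)           # rows before the window
  hi <- findInterval(t_num + window, t_num, left.open = TRUE)  # rows before the window end
  cs[hi + 1] - cs[lo + 1]
}

# Same generated data as in the question
set.seed(12345)
ts <- seq(from = as.POSIXct(1447155253, origin = "1970-01-01"),
          to = as.POSIXct(1447265253, origin = "1970-01-01"), by = "min")
value <- sample(1:10, length(ts), replace = TRUE)

fast <- rolling_sum_forward(ts, value, 5 * 60)

# Cross-check the first 50 windows against a direct, slow computation
naive <- sapply(1:50, function(i) sum(value[ts >= ts[i] & ts < ts[i] + 5 * 60]))
stopifnot(isTRUE(all.equal(fast[1:50], naive)))
```

Unlike the sapply loop above, this sketch returns a value for every row, so the last few windows are truncated at the end of the series; subsetting the result with 1:end.i would restrict it to the same rows.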

Any suggestions would be much appreciated :-)

Most efficient way to compute a rolling sum

0 answers: