R使滚动变化更加有效

时间:2018-06-28 08:34:38

标签: r data.table variance rollapply

我有一个名为features的数据表,其列为nightNo,HR,运动和角度。我想获取之前HR的600个点的滚动方差,每晚的运动和角度我想出了以下功能来做到这一点:

features <- data.table(nightNo=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                       HR=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
                       motion=c(14,13,12,11,10,9,8,7,6,5,4,3,2,1),
                       angle=c(2,4,6,8,10,12,14,16,18,20,22,24,26,28))

# For the example I'll use a window of 6 instead of 600
window = 6
features[, c("HR_Variance", "motion_Variance", "angle_Variance") := 
       list(rollapply(HR, window, var, partial=TRUE, align = "right"), 
            rollapply(motion, window, var, partial=TRUE, align = "right"), 
            rollapply(angle, window, var, partial=TRUE, align = "right")), by=nightNo ]

#    nightNo HR motion angle HR_Variance motion_Variance angle_Variance
# 1:       1  1     14     2          NA              NA             NA
# 2:       1  2     13     4    0.500000        0.500000       2.000000
# 3:       1  3     12     6    1.000000        1.000000       4.000000
# 4:       1  4     11     8    1.666667        1.666667       6.666667
# 5:       1  5     10    10    2.500000        2.500000      10.000000
# 6:       1  6      9    12    3.500000        3.500000      14.000000
# 7:       1  7      8    14    3.500000        3.500000      14.000000
# 8:       2  8      7    16          NA              NA             NA
# 9:       2  9      6    18    0.500000        0.500000       2.000000
# 10:      2 10      5    20    1.000000        1.000000       4.000000
# 11:      2 11      4    22    1.666667        1.666667       6.666667
# 12:      2 12      3    24    2.500000        2.500000      10.000000
# 13:      2 13      2    26    3.500000        3.500000      14.000000
# 14:      2 14      1    28    3.500000        3.500000      14.000000

结果是正确的,但是由于我有一个很大的数据集,它可以永远运行。我还做了其他类似的功能,它们在每晚相同的600个窗口上使用runmeans和sapplys否,它们在合理的时间内运行,这使我认为rollapply或方差函数非常慢。有没有办法通过更改var或rollapply函数来使此代码更高效?

1 个答案:

答案 0 :(得分:0)

我不知道rollaplly在做什么,但是我使用并行的tidyverse在给定的样本数据上生成此输出,

library(cumstats)
library(tidyverse)
library(furrr)

plan(multiprocess)
window <- 6

features %>% 
  nest(-nightNo) %>% 
  mutate(data=future_map(data,~mutate_at(.,vars(HR, motion,angle), 
                funs(var=cumvar(.)[c(1:window,rep(window,length(.)-length(1:window)))])))) %>% 
  unnest()
# A tibble: 14 x 7
   nightNo    HR motion angle HR_var motion_var angle_var
     <dbl> <dbl>  <dbl> <dbl>  <dbl>      <dbl>     <dbl>
 1       1     1     14     2  NA         NA        NA   
 2       1     2     13     4   0.5        0.5       2   
 3       1     3     12     6   1          1         4   
 4       1     4     11     8   1.67       1.67      6.67
 5       1     5     10    10   2.5        2.5      10   
 6       1     6      9    12   3.5        3.5      14   
 7       1     7      8    14   3.5        3.5      14   
 8       2     8      7    16  NA         NA        NA   
 9       2     9      6    18   0.5        0.5       2   
10       2    10      5    20   1          1         4   
11       2    11      4    22   1.67       1.67      6.67
12       2    12      3    24   2.5        2.5      10   
13       2    13      2    26   3.5        3.5      14   
14       2    14      1    28   3.5        3.5      14