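How can I avoid recycling when using roll_mean in R?

Time: 2017-03-08 14:14:48

Tags: r dataframe lapply

I have been trying to apply a rolling mean to several columns of a data frame, where each column contains data from multiple individuals. I have had success using roll_mean from the RcppRoll package together with lapply. Below is an example with a dummy data frame and its output.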

    library(data.table)  # setDT() and :=
    library(RcppRoll)    # roll_mean()

    x <- rnorm(20, 1)
    y <- rnorm(20, 2)
    z <- rnorm(20, 3)
    ID <- rep(1:2, each = 10)

    mydf <- data.frame(ID, x, y, z)

    vars <- c("x", "y", "z")

    # Rolling mean of each column, computed separately for each ID
    setDT(mydf)[, paste0(vars, "_", "mean") := lapply(.SD, function(x) roll_mean(x, n = 3, na.rm = TRUE)), .SDcols = vars, by = ID]

    mydf

        ID           x          y         z    x_mean    y_mean   z_mean
     1:  1  0.34457704  1.9580361 2.6458335 1.2515642 1.8307447 2.569645
     2:  1  1.41839352  2.0697324 1.8495358 1.7012511 1.7248261 2.988908
     3:  1  1.99172192  1.4644657 3.2135652 1.8455087 1.7165419 3.184736
     4:  1  1.69363783  1.6402801 3.9036227 1.5002658 2.1512764 3.289555
     5:  1  1.85116646  2.0448798 2.4370206 0.9775842 3.1215589 2.563110
     6:  1  0.95599300  2.7686692 3.5280206 0.8477701 3.4576141 3.106095
     7:  1  0.12559300  4.5511275 1.7242892 0.9450234 3.5134499 3.020176
     8:  1  1.46172438  3.0530454 4.0659766 0.9080677 3.0100022 3.371839
     9:  1  1.24775283  2.9361768 3.2702614 1.2515642 1.8307447 2.569645
    10:  1  0.01472603  3.0407845 2.7792776 1.7012511 1.7248261 2.988908
    11:  2 -0.91146047  2.5898074 2.0328348 0.4314443 1.2688530 2.477879
    12:  2  0.48183559  1.8230335 2.6910075 1.2689767 0.9650435 2.544006
    13:  2  1.72395769 -0.6062819 2.7097949 0.8747931 1.2273766 1.974265
    14:  2  1.60113680  1.6783790 2.2312143 0.2579207 1.6945497 2.233321
    15:  2 -0.70071522  2.6100328 0.9817857 0.1162224 2.0928536 2.606608
    16:  2 -0.12665946  0.7952374 3.4869635 1.3884888 2.1063817 2.986786
    17:  2  1.17604187  2.8732906 3.3510742 2.0557599 2.2701173 3.178248
    18:  2  3.11608400  2.6506171 2.1223190 1.5553274 2.3987061 3.015501
    19:  2  1.87515393  1.2864441 4.0613513 0.4314443 1.2688530 2.477879
    20:  2 -0.32525560  3.2590570 2.8628313 1.2689767 0.9650435 2.544006

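As the output table (mydf) shows, the mean columns have been created by the lapply call, and the rolling mean has been calculated separately for each ID. However, the results have been recycled to fill the data frame, because roll_mean produces only 8 values from the 10 original values of each ID; recycling fills the last two rows of each ID. My actual data are time series, and I do not want the results to be recycled. Instead, I would like to put the original x values at the start of the x_mean column until there are enough raw data points to produce a 3-point rolling mean.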
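The length mismatch can be illustrated in base R. This is only a sketch: `embed()` with `rowMeans()` stands in for a rolling mean over complete windows (it is not how roll_mean is implemented), and `x_demo`, `full_windows` and `recycled` are made-up names.

```r
# Illustrative sketch only; embed()/rowMeans() stand in for roll_mean here
x_demo <- 1:10

# One mean per complete 3-point window: 10 values give 10 - 3 + 1 = 8 results
full_windows <- rowMeans(embed(x_demo, 3))
length(full_windows)  # 8

# Recycling 8 results into 10 rows repeats the first two results at the end
recycled <- rep_len(full_windows, length(x_demo))
recycled[9:10]
full_windows[1:2]
```

This matches the table above, where rows 9 and 10 of each ID repeat the means from rows 1 and 2.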

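I have searched (on SO and Google) for posts about avoiding recycling in roll_mean or similar functions, but without success.

Does anyone know how to fill the first two rows in my example so that the roll_mean results are not recycled?

Thanks.

1 answer:

Answer 0: (score: 0)

The whole solution: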

    library(data.table)

    set.seed(1)  # assumption: this seed appears to reproduce the values shown below
    x <- rnorm(20, 1)
    y <- rnorm(20, 2)
    z <- rnorm(20, 3)
    ID <- rep(1:2, each = 10)

    mydf <- data.table(ID, x, y, z)  # data.table instead of data.frame

    vars <- c("x", "y", "z")

    # fill = NA and align = 'right' pad the first n - 1 rows of each ID with NA
    mydf[, paste0(vars, "_", "mean") := lapply(.SD, function(x) RcppRoll::roll_mean(x, n = 3, na.rm = TRUE, fill = NA, align = 'right')), .SDcols = vars, by = ID]

    mydf

    #     ID          x         y        z    x_mean   y_mean   z_mean
    #  1:  1  0.3735462 2.9189774 2.835476        NA       NA       NA
    #  2:  1  1.1836433 2.7821363 2.746638        NA       NA       NA
    #  3:  1  0.1643714 2.0745650 3.696963 0.5738536 2.591893 3.093026
    #  4:  1  2.5952808 0.0106483 3.556663 1.3144318 1.622450 3.333422
    #  5:  1  1.3295078 2.6198257 2.311244 1.3630533 1.568346 3.188290
    # ...

    # Replace the leading NAs with the original values
    mydf[is.na(x_mean), c(paste0(vars, "_", "mean")) := mget(paste0(vars))]

    mydf

    #     ID          x         y        z    x_mean   y_mean   z_mean
    #  1:  1  0.3735462 2.9189774 2.835476 0.3735462 2.918977 2.835476
    #  2:  1  1.1836433 2.7821363 2.746638 1.1836433 2.782136 2.746638
    #  3:  1  0.1643714 2.0745650 3.696963 0.5738536 2.591893 3.093026
    #  4:  1  2.5952808 0.0106483 3.556663 1.3144318 1.622450 3.333422
    #  5:  1  1.3295078 2.6198257 2.311244 1.3630533 1.568346 3.188290
    # ...
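For a single vector, the same pad-then-fill idea can be sketched in base R without RcppRoll. This is only an illustration, assuming a trailing 3-point window: `stats::filter` stands in for roll_mean, and `x_demo`, `ma` and `filled` are made-up names. Unlike `na.rm = TRUE`, this version lets any NA in the input propagate into the means.

```r
# Illustrative base R sketch (stats::filter stands in for RcppRoll::roll_mean)
x_demo <- c(1, 2, 3, 4, 5)  # made-up data
n <- 3

# Trailing moving average; sides = 1 uses only the current and past values,
# so the first n - 1 positions are NA
ma <- as.numeric(stats::filter(x_demo, rep(1 / n, n), sides = 1))

# Fall back to the original value wherever the window was incomplete
filled <- ifelse(is.na(ma), x_demo, ma)
filled
```

The first two entries of `filled` are the raw values and the rest are 3-point means, which is the layout asked for in the question.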

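Edit

The missing part of mydf can also be filled in a slightly "smarter" way: starting from the NA-padded result (before the original values are copied in), apply the rolling mean repeatedly, shrinking the window by 1 on each iteration: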

    n_roll <- 3  # not defined in the original answer; assumed to be the window size used above

    for (n_inner in n_roll:1) {
      # Only rows that still contain NA are recomputed, with a smaller window each time
      mydf[!complete.cases(mydf),
           paste0(vars, "_", "mean") := lapply(
             .SD, function(x) RcppRoll::roll_mean(x, n = n_inner, na.rm = TRUE, fill = NA, align = 'right')),
           .SDcols = vars, by = ID]
    }

    #     ID          x         y        z    x_mean   y_mean   z_mean
    #  1:  1  0.3735462 2.9189774 2.835476 0.3735462 2.918977 2.835476 <- Values from x, y and z
    #  2:  1  1.1836433 2.7821363 2.746638 0.7785948 2.850557 2.791057 <- roll_mean with window 2
    #  3:  1  0.1643714 2.0745650 3.696963 0.5738536 2.591893 3.093026 <- roll_mean with window 3
    #  4:  1  2.5952808 0.0106483 3.556663 1.3144318 1.622450 3.333422 <- as above
    #  5:  1  1.3295078 2.6198257 2.311244 1.3630533 1.568346 3.188290
    # ...
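The shrinking-window result can also be computed in one pass. The sketch below is a hypothetical base R helper, `partial_roll_mean` (it is not part of RcppRoll or data.table), built on `cumsum`; it assumes the input has no NA values, so it does not reproduce `na.rm = TRUE`.

```r
# Hypothetical helper (not part of RcppRoll): trailing rolling mean whose
# window shrinks at the start of the series instead of returning NA.
# Assumes x contains no NA values.
partial_roll_mean <- function(x, n = 3) {
  idx <- seq_along(x)
  cs <- cumsum(x)
  # Cumulative sum n positions earlier (0 while fewer than n values exist)
  lagged <- c(rep(0, n), cs)[idx]
  # Window sum divided by the number of values actually in the window
  (cs - lagged) / pmin(idx, n)
}

# Made-up data with two groups, handled per group via ave()
id_demo <- rep(1:2, each = 5)
x_demo <- c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
by_id <- ave(x_demo, id_demo, FUN = function(v) partial_roll_mean(v, n = 3))
by_id
```

With the data.table from the answer, the helper should be usable in the same pattern as roll_mean, for example `mydf[, paste0(vars, "_mean") := lapply(.SD, partial_roll_mean, n = 3), .SDcols = vars, by = ID]`. Because it returns one value per input row, there is nothing left to recycle or pad.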