Question

My data frame looks like this:

as.data.frame(matrix(c(1,2,3,NA,4,5,NA,NA,9), nrow = 3, ncol = 3))
  V1 V2 V3
1  1 NA NA
2  2  4 NA
3  3  5  9

I want to compute the cumulative mean of each column while ignoring NAs, so something like this:

   V1  V2 V3
1 1.0  NA NA
2 1.5 4.0 NA
3 2.0 4.5  9

I tried:

B[!is.na(A)] <- as.data.frame(apply(B[!is.na(A)], 2, cummean))

But I get this error message:

dim(X) must have a positive length

Thanks for your help!

Cheers

Answer 1

This should work:

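A <- as.data.frame(matrix(c(1,2,3,NA,4,5,NA,NA,9), nrow = 3, ncol = 3))
# apply() passes each column as a vector; only the non-NA entries are replaced
B <- as.data.frame(apply(A, 2, function(col) {
  ok <- !is.na(col)
  col[ok] <- dplyr::cummean(col[ok])  # requires the dplyr package
  col
}))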

> B
   V1  V2 V3
1 1.0  NA NA
2 1.5 4.0 NA
3 2.0 4.5  9
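
As for why the attempt in the question fails: indexing a data frame with a logical matrix such as !is.na(A) returns a plain vector, and a vector has no dim attribute for apply() to iterate over. A minimal sketch that reproduces this, assuming A is the data frame from the question (v and res are just illustrative names):

```r
A <- as.data.frame(matrix(c(1, 2, 3, NA, 4, 5, NA, NA, 9), nrow = 3, ncol = 3))

# Indexing a data frame with a logical matrix flattens the result to a vector
v <- A[!is.na(A)]
is.null(dim(v))  # TRUE: the column structure is gone

# apply() needs dimensions, so it stops with the error from the question;
# tryCatch() captures the message instead of aborting the script
res <- tryCatch(apply(v, 2, mean), error = function(e) conditionMessage(e))
print(res)
```

That is why the code above works column by column and only subsets inside each column.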

Answer 2

We can use data.table:

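library(data.table)
library(dplyr)  # for cummean()
# d1 is assumed to be the data frame from the question
d1 <- as.data.frame(matrix(c(1,2,3,NA,4,5,NA,NA,9), nrow = 3, ncol = 3))
setDT(d1)
for(j in seq_along(d1)){
  # update only the non-NA rows of column j, by reference
  set(d1, i = which(!is.na(d1[[j]])), j = j, value = cummean(d1[[j]][!is.na(d1[[j]])]))
}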

d1
#    V1  V2 V3
#1: 1.0  NA NA
#2: 1.5 4.0 NA
#3: 2.0 4.5  9
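
If you would rather not load a package just for cummean(), the same idea can be sketched in base R, since a cumulative mean is the cumulative sum divided by the running count. Note that cummean_na is a hypothetical helper name made up for this example, not a function from any package:

```r
# Hypothetical helper: cumulative mean over the non-NA values of a vector,
# leaving NA entries where they are
cummean_na <- function(x) {
  ok <- !is.na(x)
  x[ok] <- cumsum(x[ok]) / seq_len(sum(ok))
  x
}

d1 <- as.data.frame(matrix(c(1, 2, 3, NA, 4, 5, NA, NA, 9), nrow = 3, ncol = 3))

# lapply() works column by column; assigning into d1[] keeps the data frame
d1[] <- lapply(d1, cummean_na)
d1
```

This should give the same values as the data.table version above, just without modifying by reference.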

Cummean by column, ignoring NAs
