Question

I have a data table, and for each ID I want to compute the mean of the group of variables whose names start with "amount".

The number of variables starting with "amount" can vary, but in my real data there are well over 100 of them (and some of them contain NA values).

id  variable    amountA amountB amountC amountD
1   A   8   7   6   2
2   B   6   2   1   2
3   C   6   6   9   4
4   D   1   6   2   7

On my data I have tried the following, without success:

DT[,testvar := apply(DT[ ,grepl("amount",names(DT))],1,mean)]
DT[,testvar := mean(DT[ ,grepl("amount",names(DT))],na.rm=TRUE), by = idvar]

I also tried to solve this with .EACHI, but I haven't figured it out yet. Any ideas or comments are much appreciated.

Sample table:

structure(list(id = 1:4, variable = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor"), amountA = c(8L, 6L, 6L, 1L
), amountB = c(7L, 2L, 6L, 6L), amountC = c(6L, 1L, 9L, 2L), 
    amountD = c(2L, 2L, 4L, 7L)), .Names = c("id", "variable", 
"amountA", "amountB", "amountC", "amountD"), class = "data.frame", row.names = c(NA, 
-4L))

Answer 1

Here is one possible solution, incorporating some of Arun's suggestions:

DT[, testvar := rowMeans(.SD, na.rm=TRUE), .SDcols=grep("^amount", names(DT), value=TRUE)]

This produces:

   id variable amountA amountB amountC amountD testvar
1:  1        A       8       7       6       2    5.75
2:  2        B       6       2       1       2    2.75
3:  3        C       6       6       9       4    6.25
4:  4        D       1       6       2       7    4.00

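We use .SDcols, fed by the grep call, to define which columns should be part of .SD, and then we simply call rowMeans on the resulting .SD. Because rowMeans works row by row and each id occupies a single row here, no by is needed, and na.rm=TRUE skips the NA values.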
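To make the NA handling concrete, here is a minimal sketch of the same approach on the sample table. The NA in amountB and the helper name amount_cols are my own additions for illustration; they are not part of the question's data.

```r
library(data.table)

# Sample table from the question; the NA in amountB is an assumption
# added here to show what na.rm = TRUE does.
DT <- data.table(
  id       = 1:4,
  variable = factor(c("A", "B", "C", "D")),
  amountA  = c(8L, 6L, 6L, 1L),
  amountB  = c(7L, NA, 6L, 6L),
  amountC  = c(6L, 1L, 9L, 2L),
  amountD  = c(2L, 2L, 4L, 7L)
)

# Names of all columns that start with "amount" (hypothetical helper name)
amount_cols <- grep("^amount", names(DT), value = TRUE)

# .SD holds only the columns listed in .SDcols; rowMeans averages
# each row across them and ignores NA values.
DT[, testvar := rowMeans(.SD, na.rm = TRUE), .SDcols = amount_cols]

print(DT)
```

For id 2 the mean is taken over the three non-missing values only (6, 1 and 2), so testvar is 3 rather than NA.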

Note that for something like this:

DT[,testvar := apply(DT[ ,grepl("amount",names(DT))],1,mean)]

to work, you need to use the with=FALSE flag, which turns off the special evaluation of the j argument:

DT[,testvar := apply(DT[ ,grepl("amount",names(DT)), with=FALSE],1,mean)]

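This is very close to my answer, but it is slower, because rowMeans is very fast compared with calling mean once per row through apply.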
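As a rough check that the two approaches agree, the sketch below computes both on the sample data and compares them. The names amount_cols, via_apply and via_rowmeans are made up for this example, and columns are selected by name (grep with value=TRUE) rather than with the logical grepl vector. No timings are shown, since those depend on the data and the machine.

```r
library(data.table)

# Sample data from the question (amount columns only, plus id)
DT <- data.table(
  id      = 1:4,
  amountA = c(8L, 6L, 6L, 1L),
  amountB = c(7L, 2L, 6L, 6L),
  amountC = c(6L, 1L, 9L, 2L),
  amountD = c(2L, 2L, 4L, 7L)
)

# Hypothetical helper: names of the columns starting with "amount"
amount_cols <- grep("^amount", names(DT), value = TRUE)

# with = FALSE makes j be treated as a vector of column names
# instead of an expression evaluated inside DT.
via_apply    <- apply(DT[, amount_cols, with = FALSE], 1, mean, na.rm = TRUE)
via_rowmeans <- rowMeans(DT[, amount_cols, with = FALSE], na.rm = TRUE)

# Both give the same row means
print(all.equal(unname(via_apply), unname(via_rowmeans)))
```

With well over 100 amount columns, as in the question, the rowMeans version is the one I would expect to scale better, but it is worth timing on the real data.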

Apply a function to variables matching a name pattern, for each id, with NA values
