Building a list of expressions in a loop in R

Time: 2014-02-07 01:43:18

Tags: r data.table

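I am trying to aggregate some data. For some variables I want the proportion of rows taking each value, with each proportion stored in its own column, as shown below: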

library(data.table)
testDT <- data.table(z=sample(1:5, 2500000, replace=TRUE), a=sample(1:20, 2500000, replace=TRUE), b=sample(1:30, 2500000, replace=TRUE), c=sample(1:10, 2500000, replace=TRUE))
setkey(testDT, z)
testDT.AG=testDT[, list(
                    a_Mean=mean(as.numeric(a), na.rm = TRUE),
                    a_1_prop=length(which(a==1))/length(which(a>0)),
                    a_2_prop=length(which(a==2))/length(which(a>0)),
                    a_3_prop=length(which(a==3))/length(which(a>0)),
                    a_4_prop=length(which(a==4))/length(which(a>0)),
                    a_5_prop=length(which(a==5))/length(which(a>0)),
                    a_6_prop=length(which(a==6))/length(which(a>0)),
                    a_7_prop=length(which(a==7))/length(which(a>0)),
                    a_8_prop=length(which(a==8))/length(which(a>0)),
                    a_9_prop=length(which(a==9))/length(which(a>0)),
                    a_10_prop=length(which(a==10))/length(which(a>0))
                ), by=list(z)]

I would like to build this list with a loop like the following:

testDT.AG=testDT[, list(
                        a_Mean=mean(as.numeric(a), na.rm = TRUE),
                        for (i in c(1:10))
                        {
                        assign(paste("a_", i, "_prop"), length(which(a==i))/length(which(a>0))),
                        }
                    ), by=list(z)]

But this does not work...

Is there any way to build a list of expressions like this in a loop?

Thanks in advance!

1 Answer:

Answer 0 (score: 1)

I made your example somewhat smaller, but you should be able to scale it up without difficulty:

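library(data.table)
testDT <- data.table(z=sample(1:5, 2500, replace=TRUE), a=sample(1:20, 2500, replace=TRUE), b=sample(1:10, 2500, replace=TRUE), c=sample(1:10, 2500, replace=TRUE))
setkey(testDT, z)
# Count of a equal to i, divided by the count of positive a
prct.i <- function(a,i) sum(a==i)/sum(a>0)
# lapply() builds one list element per value of i; setNames() supplies the column names
testDT[  , setNames( lapply(1:3, prct.i, a=a), paste0("a_", 1:3, "_prop") ), by=z]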

   z   a_1_prop   a_2_prop   a_3_prop
1: 1 0.04373757 0.04970179 0.05964215
2: 2 0.04678363 0.01949318 0.04483431
3: 3 0.04158416 0.06534653 0.05742574
4: 4 0.05296610 0.04872881 0.05084746
5: 5 0.05128205 0.04142012 0.04930966
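
To make the contrast with the loop in the question concrete, here is a minimal base R sketch. The toy vector `a` and the name `loop_value` are illustrative assumptions, not part of the original thread; `prct.i` is the helper defined above.

```r
# Toy data (an assumption for illustration; not the thread's testDT)
a <- c(1, 2, 2, 3, 3, 3)

# A for loop evaluates to NULL, so nothing computed inside it
# ends up as an element of an enclosing list().
loop_value <- for (i in 1:3) sum(a == i) / sum(a > 0)
is.null(loop_value)          # TRUE

# lapply() returns a list with one element per i; setNames() names it.
prct.i <- function(a, i) sum(a == i) / sum(a > 0)
props <- setNames(lapply(1:3, prct.i, a = a), paste0("a_", 1:3, "_prop"))
names(props)                 # "a_1_prop" "a_2_prop" "a_3_prop"

# paste() inserts spaces by default, which is why paste0() is used for the names.
spaced <- paste("a_", 1, "_prop")   # "a_ 1 _prop"
```

Because `a` is passed by name in `lapply(1:3, prct.i, a = a)`, each element of `1:3` is matched to the remaining argument `i`.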

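Two "tricks": use lapply to return a list, and use setNames to name the otherwise unnamed list. Unfortunately, and somewhat ironically for a functional language, a for loop in R always returns NULL. I later realized that I also needed to add the mean: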

testDT[  , c(a_Mean=mean(as.numeric(a), na.rm = TRUE), 
             setNames( lapply(1:3, prct.i, a=a), paste0("a_", 1:3, "_prop") )
             ), by=z]
   z   a_Mean   a_1_prop   a_2_prop   a_3_prop
1: 1 10.62227 0.04373757 0.04970179 0.05964215
2: 2 10.93762 0.04678363 0.01949318 0.04483431
3: 3 10.50495 0.04158416 0.06534653 0.05742574
4: 4 10.64619 0.05296610 0.04872881 0.05084746
5: 5 10.75937 0.05128205 0.04142012 0.04930966
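
The call above works because `c()` returns a list whenever one of its arguments is a list, so the named scalar `a_Mean` becomes one more list element. Below is a rough base R sketch of the same idea, wrapped in a helper so the set of values is a parameter. The name `build_summary` and the toy vector are hypothetical and do not come from the thread.

```r
# Hypothetical helper: builds the named list for one group's values of a.
build_summary <- function(a, levels) {
  c(
    # Wrapping the mean in list() makes the list result explicit
    list(a_Mean = mean(as.numeric(a), na.rm = TRUE)),
    setNames(
      lapply(levels, function(i) sum(a == i) / sum(a > 0)),
      paste0("a_", levels, "_prop")
    )
  )
}

a <- c(1, 2, 2, 3, 3, 3)   # toy data, assumed for illustration
out <- build_summary(a, 1:3)
names(out)                 # "a_Mean" "a_1_prop" "a_2_prop" "a_3_prop"

# A bare named scalar combined with a list also yields a list:
mixed <- c(a_Mean = 2, list(x = 1))
is.list(mixed)             # TRUE

# In data.table's j, a helper like this could presumably be used as:
#   testDT[, build_summary(a, 1:10), by = z]
```

Scaling up to the ten proportions in the question would then only mean passing `1:10` instead of `1:3`.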

I checked these values against a shortened and more efficient version of the original code:

testDT[, list(
                     a_Mean=mean(as.numeric(a), na.rm = TRUE),
                     a_1_prop=sum(a==1)/sum(a>0),
                     a_2_prop=sum(a==2)/sum(a>0),
                     a_3_prop=sum(a==3)/sum(a>0)
                 ), by=list(z)]
   z   a_Mean   a_1_prop   a_2_prop   a_3_prop
1: 1 10.62227 0.04373757 0.04970179 0.05964215
2: 2 10.93762 0.04678363 0.01949318 0.04483431
3: 3 10.50495 0.04158416 0.06534653 0.05742574
4: 4 10.64619 0.05296610 0.04872881 0.05084746
5: 5 10.75937 0.05128205 0.04142012 0.04930966
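
One caveat when swapping `length(which(...))` for `sum(...)`: the two agree on the sample data here, which contain no missing values, but they differ if `a` has NAs. The small vector below is an assumption made up to show the difference.

```r
# Toy vector with a missing value (illustrative assumption)
a <- c(1, 2, NA, 1, 0)

# which() silently drops NA, so the original form still returns a number.
prop_which <- length(which(a == 1)) / length(which(a > 0))   # 2/3

# sum() propagates NA unless told otherwise.
prop_sum <- sum(a == 1) / sum(a > 0)                         # NA

# Adding na.rm = TRUE restores agreement with the which() form.
prop_sum_narm <- sum(a == 1, na.rm = TRUE) / sum(a > 0, na.rm = TRUE)   # 2/3
```

If the real data may contain NAs, using `na.rm = TRUE` inside `prct.i` would keep the shorter form consistent with the original code.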