How to use aggregate() on multiple columns based on a condition

Time: 2018-07-25 14:07:14

Tags: r data.table aggregate subset

I would like to use aggregate based on a condition, e.g. only for those rows whose value is > 0. Subsetting the data to values > 0 before calling aggregate obviously does not work, because that removes the entire row for all columns even if only a single zero occurs. See the following code for illustration:

idA <- c("A","A","A","A","A","B","B","B","B","B")
idB <- c("C","D","C","D","C","D","C","D","C","D")
colA <- c(0,2,3,0,0,3,9,5,6,1)
colB <- c(9,3,0,2,2,4,6,1,9,9)
colC <- c(0,0,5,7,3,9,8,1,2,3)
df <- data.frame(idA,idB,colA,colB,colC)
aggregate(.~idA+idB,df,FUN=NROW)

Of course, this form of the aggregate command is pointless, because the number of rows is the same for every column.

This is the result I am looking for:

idA idB colA colB colC
  A   C    1    2    2
  B   C    2    2    2
  A   D    1    2    1
  B   D    3    3    3

So a conditional statement is needed that only includes rows > 0. I am also sure there is a smart way to do this with data.table. Any help would be appreciated!

1 answer:

Answer 0 (score: 1)

With data.table, you can do the following:

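library(data.table)
setDT(df)  # convert df to a data.table by reference
# per (idA, idB) group, count the values > 0 in every remaining column
df[, lapply(.SD, function(x) sum(x > 0)), by = .(idA, idB), .SDcols = setdiff(names(df), c("idA", "idB"))]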
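
Since the question asks about aggregate, the same counting idea can probably be expressed in base R as well: pass a function that counts positive values in place of NROW. This is only a minimal sketch, not part of the original answer. It rebuilds the df from the question so that it runs on its own, and the helper name count_positive is made up for illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(
  idA  = c("A","A","A","A","A","B","B","B","B","B"),
  idB  = c("C","D","C","D","C","D","C","D","C","D"),
  colA = c(0,2,3,0,0,3,9,5,6,1),
  colB = c(9,3,0,2,2,4,6,1,9,9),
  colC = c(0,0,5,7,3,9,8,1,2,3)
)

# Hypothetical helper: number of strictly positive entries in a vector.
count_positive <- function(x) sum(x > 0)

# Apply the helper to every non-grouping column within each idA/idB group.
res <- aggregate(. ~ idA + idB, data = df, FUN = count_positive)
print(res)
```

If the columns may contain NA, keep in mind that the formula interface of aggregate omits incomplete rows by default (na.action = na.omit), so something like sum(x > 0, na.rm = TRUE) combined with na.action = na.pass would likely be needed to avoid dropping whole rows again.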