Summing all values that meet a criterion, for each possible value of another criterion

Date: 2012-08-13 13:53:45

Tags: r data.table

I have a data.table that looks like this:

a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))

> a
     color count include
[1,]   Red     1       1
[2,]  Blue     2       1
[3,]   Red     6       1
[4,] Green     4       1
[5,]   Red     2       0
[6,]  Blue     1       0
[7,]  Blue     1       1

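I would like to create a new data.table that lists each unique color once, together with the sum of the count column over that color's rows where include == 1, like this: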

     colour total
[1,]   Red     7
[2,]  Blue     3
[3,] Green     4  

I tried the following, which has worked for me in the past:

> a[,include == 1,list(total=sum(count)),by=colour]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count)),  : 
  Provide either 'by' or 'keyby' but not both

I get the same error message whether a has no key or is keyed on colour. I also tried setting the key to colour and running the following:

> a[,include == 1,list(quantity=sum(count))]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count))) : 
  Each item in the 'by' or 'keyby' list must be same length as rows in x (7): 1

I couldn't find any other good solution. Any help is much appreciated.
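Both errors come from how the unnamed arguments inside a[...] are matched to the formal arguments of [.data.table. In the first attempt, i is left empty, include == 1 lands in j, the named by = colour fills by, and the leftover unnamed list(total = sum(count)) falls through to keyby, hence "Provide either 'by' or 'keyby' but not both". In the second attempt, list(quantity = sum(count)) lands in by and evaluates to a single value instead of one value per row, hence the complaint about length 7 versus 1. A minimal sketch, assuming a current data.table release, that prints the slot order and then places the filter in i:

```r
library(data.table)

# Formal arguments of the data.table subset method, in positional order.
# Unnamed arguments inside a[...] fill these slots left to right, after
# any named arguments (such as by = ...) have claimed their own slot.
slots <- names(formals(data.table:::`[.data.table`))[1:5]
print(slots)  # "x" "i" "j" "by" "keyby"

a <- data.table(color = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, 1, 1, 0, 0, 1))

# Filter in i (first slot), aggregate in j (second slot), group with by.
# The column is spelled "color", so by = colour would fail as well.
fixed <- a[include == 1, list(total = sum(count)), by = color]
print(fixed)
```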

1 answer:

Answer 0 (score: 3)

This should work:

library(data.table)
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
a[include == 1, list(total=sum(count)), keyby = color]

   color total
1:  Blue     3
2: Green     4
3:   Red     7
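The answer uses keyby, which sorts the groups and sets a key on color; that is why Blue comes first here, whereas the desired output in the question lists colors in order of first appearance. A small sketch comparing by and keyby on the same data, assuming a current data.table release:

```r
library(data.table)

a <- data.table(color = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, 1, 1, 0, 0, 1))

# by: groups come out in the order they are first seen (Red, Blue, Green)
first_seen <- a[include == 1, list(total = sum(count)), by = color]

# keyby: groups are sorted and the result is keyed on color
sorted <- a[include == 1, list(total = sum(count)), keyby = color]

print(first_seen)
print(sorted)
print(key(sorted))
```

Use by when the original order matters; keyby is convenient when the result will be joined or looked up by color afterwards.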

Edit by Matthew:

Alternatively, if include only ever takes the values 0 and 1:

a[, list(total=sum(count*include)), keyby = color]
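With a strict 0/1 flag, multiplying count by include zeroes out the excluded rows, so the per-group sums match the subset-in-i version. A quick sketch checking this on the question's data:

```r
library(data.table)

a <- data.table(color = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, 1, 1, 0, 0, 1))

# Filter first, then sum within each group
by_subset <- a[include == 1, list(total = sum(count)), keyby = color]

# Sum every row, but rows with include == 0 contribute count * 0 = 0
by_product <- a[, list(total = sum(count * include)), keyby = color]

print(by_subset)
print(by_product)
```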

Or, if include can contain other values:

a[, list(total=sum(count*(include==1))), keyby = color]
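When include can hold values other than 0 and 1, multiplying by include weights rows instead of filtering them. The sketch below uses a hypothetical variant of the data (named b, not from the question) in which the fifth row has include = 2; the comparison include == 1 turns the flag into TRUE/FALSE, which arithmetic treats as 1/0.

```r
library(data.table)

# Hypothetical data: same as the question, but row 5 has include = 2
b <- data.table(color = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, 1, 1, 2, 0, 1))

# Wrong for this data: the include = 2 row is counted twice (2 * 2)
weighted <- b[, list(total = sum(count * include)), keyby = color]

# Correct: only rows whose flag equals exactly 1 contribute
flagged <- b[, list(total = sum(count * (include == 1))), keyby = color]

print(weighted)
print(flagged)
```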

You may need to take NA values into account.
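Both multiplication forms propagate NA: a single NA in include makes that group's total NA. A sketch with a hypothetical NA in the fifth row (data named b, not from the question), showing two ways around it: na.rm = TRUE inside sum(), or filtering in i, where data.table treats a logical NA as FALSE.

```r
library(data.table)

# Hypothetical data: same as the question, but row 5 (Red) has include = NA
b <- data.table(color = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, 1, 1, NA, 0, 1))

# NA * count is NA, so the Red total becomes NA
naive <- b[, list(total = sum(count * (include == 1))), keyby = color]

# Option 1: drop the NA terms inside sum()
na_removed <- b[, list(total = sum(count * (include == 1), na.rm = TRUE)), keyby = color]

# Option 2: filter in i; rows where include == 1 is NA are not selected
filtered <- b[include == 1, list(total = sum(count)), keyby = color]

print(naive)
print(na_removed)
print(filtered)
```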

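Avoiding the vector scan in i may improve efficiency, but that depends heavily on the size and properties of the data. The two alternatives above only need working memory as large as the largest group, whereas include == 1 in i has to allocate at least one vector of length nrow(a).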
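Apart from memory, the two styles can differ in which groups they return: filtering in i drops any color that has no rows with include == 1, while the multiplication form keeps that color with a total of 0. A sketch with a hypothetical extra Purple row whose include is 0:

```r
library(data.table)

a <- data.table(color = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, 1, 1, 0, 0, 1))

# Hypothetical extra group whose only row is excluded
a2 <- rbind(a, data.table(color = "Purple", count = 5, include = 0))

# Filtering in i: Purple never reaches j, so it is absent from the result
dropped <- a2[include == 1, list(total = sum(count)), keyby = color]

# Multiplying in j: every group is aggregated, Purple sums to 0
kept <- a2[, list(total = sum(count * (include == 1))), keyby = color]

print(dropped)
print(kept)
```

Which behaviour is wanted depends on whether colors with nothing to sum should appear in the output.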