I have a data.table that looks like this:
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
> a
color count include
[1,] Red 1 1
[2,] Blue 2 1
[3,] Red 6 1
[4,] Green 4 1
[5,] Red 2 0
[6,] Blue 1 0
[7,] Blue 1 1
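I would like to create a new data.table that has only the unique color values and, for each one, the sum of the count column over the rows where include == 1, like this: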
colour total
[1,] Red 7
[2,] Blue 3
[3,] Green 4
I tried the following, an approach that has worked for me in the past:
> a[,include == 1,list(total=sum(count)),by=colour]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count)), :
Provide either 'by' or 'keyby' but not both
I get the same error message whether a has no key or is keyed by colour. With the key set to colour, I also tried the following:
> a[,include == 1,list(quantity=sum(count))]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count))) :
Each item in the 'by' or 'keyby' list must be same length as rows in x (7): 1
I can't find any other good solution. Any help would be much appreciated.
Answer 0 (score: 3)
This should work:
library(data.table)
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
a[include == 1, list(total=sum(count)), keyby = color]
color total
1: Blue 3
2: Green 4
3: Red 7
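For reference, the calls in the question fail because of argument position: data.table reads a[i, j, by], so with the first slot left empty, include == 1 lands in j and the list(...) expression is taken as by, which then clashes with the explicit by=. Below is a minimal sketch, assuming the column is named color as in the definition of a. It uses by rather than keyby, which should keep the groups in order of first appearance (the order shown in the question) instead of sorting them.

```r
library(data.table)

a <- data.table(color = c("Red","Blue","Red","Green","Red","Blue","Blue"),
                count = c(1,2,6,4,2,1,1),
                include = c(1,1,1,1,0,0,1))

# i = row filter, j = what to compute, by = grouping column
res <- a[include == 1, list(total = sum(count)), by = color]
print(res)
# groups appear in order of first appearance: Red, Blue, Green
```

If a sorted, keyed result is preferred, swap by for keyby as in the answer above.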
Edit by Matthew:
Or, if include only ever takes the values 0 and 1, then:
a[, list(total=sum(count*include)), keyby = color]
Or, if include can contain other values, then:
a[, list(total=sum(count*(include==1))), keyby = color]
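You may also need to account for NA.
Avoiding the vector scan in i may improve efficiency, but that depends heavily on the size and properties of the data. These forms only need working memory as large as the largest group, whereas include==1 in i needs to allocate at least one vector of length nrow(a).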
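To illustrate the NA point, here is a small sketch on made-up data (b is hypothetical, not from the question). It assumes that a missing include flag should count as "not included"; if that is not the intended meaning, a different treatment is needed.

```r
library(data.table)

# Hypothetical data: the Blue row has a missing include flag
b <- data.table(color = c("Red","Blue","Red"),
                count = c(1,2,6),
                include = c(1,NA,1))

# NA == 1 is NA, so the NA propagates into Blue's total
with_na <- b[, list(total = sum(count * (include == 1))), keyby = color]

# NA %in% 1 is FALSE, so the Blue row is treated as excluded
no_na <- b[, list(total = sum(count * (include %in% 1))), keyby = color]

print(with_na)
print(no_na)
```

This only covers missing values in include. A missing count would still turn the group total into NA unless it is handled separately, for example with na.rm = TRUE in sum().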
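Apart from memory use, the two styles can also differ in their result. A small sketch on made-up data (d is hypothetical, not from the question): filtering in i drops a group whose rows are all excluded, whereas multiplying in j keeps that group with a total of 0.

```r
library(data.table)

# Hypothetical data: every Green row is excluded
d <- data.table(color = c("Red","Green","Red","Green"),
                count = c(1,4,6,2),
                include = c(1,0,1,0))

# Filter in i: Green has no remaining rows, so it is absent from the result
filtered <- d[include == 1, list(total = sum(count)), keyby = color]

# Multiply in j: all rows are grouped, so Green is kept with total 0
multiplied <- d[, list(total = sum(count * include)), keyby = color]

print(filtered)
print(multiplied)
```

On the data in the question every color has at least one included row, so both forms give the same totals there.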