Is there a good way to create a subgroup within a grouping column in a data.table operation?
The result I want is this output:
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)
# relabel every group except "a" as "Other" (by reference), then sum per group
dt[group!="a", group:="Other"][, sum(value), by=.(group)][]
which gives
group V1
a 6
Other 30
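However, this modifies the original data.table. I don't know whether there is another way, short of merging two data.tables. I can imagine a more complex use case where I want group %in% c("a","b") as one subgroup, group %in% c("c","d") as another, and so on.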
Answer 0 (score: 0)
I think this is like an excluding join in SQL (using the terminology here).
You can group by group and perform an anti-join within each group:
# group is no longer found in .SD, hence make a copy of the column
dt[, g:=group]
# go through each group, anti-join with the other groups, aggregate value
dt[, .(
  sumGrpVal=sum(value),
  sumNonGrpVal=dt[!.SD, sum(value), on=c("group"="g")]
), by=.(group)]
Or, a faster way:
dt[, .(
  sumGrpVal=sum(value),
  sumNonGrpVal=dt[group!=.BY$group, sum(value)]
), by=.(group)]
Output:
   group sumGrpVal sumNonGrpVal
1:     a         6           30
2:     b        15           21
3:     c        15           21
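As for the subgrouping the question asks about, one possible approach that leaves dt unmodified is to compute the subgroup label inside by instead of assigning it with :=. The following is only a sketch under assumptions: the names subgroup, total, res1, res2 and the lookup table are hypothetical and are not part of the question or of the answer above.

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Two subgroups: the label is computed on the fly inside `by`,
# so no column of dt is changed or added.
res1 <- dt[, .(total = sum(value)),
           by = .(subgroup = ifelse(group == "a", "a", "Other"))]

# Several subgroups: a hypothetical lookup table maps each group to a subgroup.
lookup <- data.table(
  group    = c("a", "b", "c", "d"),
  subgroup = c("ab", "ab", "cd", "cd")
)
# Attach the subgroup to a joined copy of dt, then aggregate by subgroup.
res2 <- lookup[dt, on = "group"][, .(total = sum(value)), by = subgroup]
```

With the example data, res1 should hold a = 6 and Other = 30, and res2 should hold ab = 21 and cd = 15, while dt keeps its original group and value columns. The lookup join costs one extra copy of the data, but it scales to any number of subgroups without nested ifelse calls.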