Question

Is there a good way to create a subgroup within the grouping column in a data.table operation?

The result I want is the output of this code:

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

dt[group!="a", group:="Other"][, sum(value), by=.(group)][]

which gives

group V1
a     6
Other 30

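However, this modifies the original data.table. I wonder whether there is another way to do this, for example by combining two data.tables. I can also imagine a more complex use case in which I want group %in% c("a","b") as one subgroup, group %in% c("c","d") as another, and so on.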

Answer 1

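I think this is similar to a right excluding join in SQL (using the terminology here).

You can group by group and, within each group, perform an anti-join against the full table: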
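
For readers who have not seen the anti-join syntax, here is a minimal sketch of it on its own. The tables x and others are hypothetical and exist only for this illustration; the `!` in front of the `i` argument asks data.table for the rows of x that have no match in others.

```r
library(data.table)

# Hypothetical tables, used only to illustrate the syntax
x <- data.table(group = c("a", "a", "b", "c"), value = c(1, 2, 3, 4))
others <- data.table(group = "a")  # groups to exclude

# Anti-join: keep the rows of x whose group has no match in others
anti <- x[!others, on = "group"]

# The same anti-join with an aggregate computed over the remaining rows
total <- x[!others, sum(value), on = "group"]
```

Here anti holds the rows for groups b and c, and total is their sum. The code that follows applies the same idea once per group.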

#group no longer found in .SD, hence make a copy of the column
dt[, g:=group]

#go through each group, anti-join with other groups, aggregate value
dt[, .(
        sumGrpVal=sum(value), 
        sumNonGrpVal=dt[!.SD, sum(value), on=c("group"="g")]
    ), by=.(group)]

Or, a faster way:

dt[, .(
    sumGrpVal=sum(value), 
    sumNonGrpVal=dt[group!=.BY$group, sum(value)]
), by=.(group)]

Output:

   group sumGrpVal sumNonGrpVal
1:     a         6           30
2:     b        15           21
3:     c        15           21
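
If the goal is the exact output from the question without altering dt, one possible sketch is to group by a computed expression instead of reassigning the column. This is not part of the answer above; the column names subgroup and total and the labels "ab" and "cd" are assumptions made for illustration, and fifelse() and fcase() require a reasonably recent data.table release.

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Group by an expression computed on the fly; dt itself is left unchanged
res <- dt[, .(total = sum(value)),
          by = .(subgroup = fifelse(group == "a", "a", "Other"))]

# Several subgroups at once: map the original groups to labels with fcase()
res2 <- dt[, .(total = sum(value)),
           by = .(subgroup = fcase(group %in% c("a", "b"), "ab",
                                   group %in% c("c", "d"), "cd",
                                   default = "Other"))]
```

With the sample data, res has totals 6 for a and 30 for Other, and res2 has 21 for ab and 15 for cd, while dt$group still holds the original labels.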
