data.table aggregation with a multi-condition BY clause

Time: 2016-10-04 01:54:55

Tags: r data.table aggregate

Here is some toy data:

df = data.frame(ID = c(1,1,1,2,2,2,2,3,3),
                food = c("bacon","bacon","bacon","bacon","bacon","cheese","sausage","avocado","ham"),
                enjoyment = c(20,20,20,20,20,20,20,20,20))

which gives

  ID    food enjoyment
1  1   bacon        20
2  1   bacon        20
3  1   bacon        20
4  2   bacon        20
5  2   bacon        20
6  2  cheese        20
7  2 sausage        20
8  3 avocado        20
9  3     ham        20

What I want to do is, for each person (ID), sum their enjoyment of bacon and cheese.

My code so far is

library(data.table)
setDT(df)
df[,id_enjoyment_sum := sum(enjoyment), by =.(ID,food == "bacon"|food == "cheese")]

which gives

   ID    food enjoyment id_enjoyment_sum
1:  1   bacon        20               60
2:  1   bacon        20               60
3:  1   bacon        20               60
4:  2   bacon        20               60
5:  2   bacon        20               60
6:  2  cheese        20               60
7:  2 sausage        20               20
8:  3 avocado        20               40
9:  3     ham        20               40

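This does what I want for bacon and cheese, but for each person it also sums their enjoyment of the foods that are neither bacon nor cheese. Note that ID 3 eats no bacon or cheese, yet my code still sums his enjoyment of what he did eat.

Ideally, the code would give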

   ID    food enjoyment id_enjoyment_sum
1:  1   bacon        20               60
2:  1   bacon        20               60
3:  1   bacon        20               60
4:  2   bacon        20               60
5:  2   bacon        20               60
6:  2  cheese        20               60
7:  2 sausage        20               60
8:  3 avocado        20                0
9:  3     ham        20                0

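So my question is: how do I set up the BY clause so that, for each ID, only the enjoyment of bacon and cheese is summed?

1 Answer:

Answer 0: (score: 3)

As a one-liner, I would do it like this: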

df[,
    id_enjoyment_sum := sum(
        ifelse(food %in% c("bacon", "cheese"), enjoyment, 0)
    )
    , by =.(ID)]
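
As an alternative sketch of the same idea (not part of the original answer), the vector can be subset inside `sum()` instead of zeroing values with `ifelse()`. This assumes the toy data from the question and relies on `sum()` of an empty numeric vector returning 0, which is what gives ID 3 a total of 0.

```r
library(data.table)

# Toy data from the question, built directly as a data.table
df <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  food = c("bacon", "bacon", "bacon", "bacon", "bacon",
           "cheese", "sausage", "avocado", "ham"),
  enjoyment = rep(20, 9)
)

# Group by ID only, and keep just the bacon/cheese rows inside each group
# before summing. For ID 3 the subset is empty, and sum(numeric(0)) is 0.
df[, id_enjoyment_sum := sum(enjoyment[food %in% c("bacon", "cheese")]),
   by = ID]

print(df)
```

Either form keeps the condition out of `by`, so every row of an ID receives the same total.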

If overwriting the enjoyment column is not a problem, you could consider this:

df[! food %in% c("bacon", "cheese"), enjoyment := 0]
df[, id_enjoyment_sum := sum(enjoyment), by = .(ID)]

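When you group by multiple variables, there is one group for each combination of their values, and the aggregation happens within each of those groups. So in your case, there is one group of rows for each of

  1. ID == 1 and (food == "bacon"|food == "cheese") == TRUE
  2. ID == 2 and (food == "bacon"|food == "cheese") == TRUE
  3. ID == 2 and (food == "bacon"|food == "cheese") == FALSE

and so on. The FALSE groups are summed as well, which is why the sausage row and both of ID 3's rows get their own totals.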
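
To see these groups directly, here is a small sketch that aggregates with the question's grouping instead of assigning with `:=`. The column names `bacon_or_cheese`, `n_rows` and `enjoyment_sum` are made up for illustration, and `food %in% c("bacon", "cheese")` is used as an equivalent of the original `|` condition.

```r
library(data.table)

# Toy data from the question
df <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  food = c("bacon", "bacon", "bacon", "bacon", "bacon",
           "cheese", "sausage", "avocado", "ham"),
  enjoyment = rep(20, 9)
)

# One output row per (ID, condition) combination that occurs in the data.
# .N is the number of rows in the group.
groups <- df[, .(n_rows = .N, enjoyment_sum = sum(enjoyment)),
             by = .(ID, bacon_or_cheese = food %in% c("bacon", "cheese"))]

print(groups)
```

With this data the result has four rows: a TRUE group for ID 1, a TRUE and a FALSE group for ID 2, and a single FALSE group for ID 3. Those FALSE groups are the source of the unwanted 20 and 40 in the question's output.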