Here is some toy data to play with:
df = data.frame(ID = c(1,1,1,2,2,2,2,3,3),
food = c("bacon","bacon","bacon","bacon","bacon","cheese","sausage","avocado","ham"),
enjoyment = c(20,20,20,20,20,20,20,20,20))
which gives
ID food enjoyment
1 1 bacon 20
2 1 bacon 20
3 1 bacon 20
4 2 bacon 20
5 2 bacon 20
6 2 cheese 20
7 2 sausage 20
8 3 avocado 20
9 3 ham 20
What I want to do is, for each person (ID), sum their enjoyment of bacon and cheese.
My code so far is
library(data.table)
setDT(df)
df[,id_enjoyment_sum := sum(enjoyment), by =.(ID,food == "bacon"|food == "cheese")]
which gives
ID food enjoyment id_enjoyment_sum
1: 1 bacon 20 60
2: 1 bacon 20 60
3: 1 bacon 20 60
4: 2 bacon 20 60
5: 2 bacon 20 60
6: 2 cheese 20 60
7: 2 sausage 20 20
8: 3 avocado 20 40
9: 3 ham 20 40
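This does what I want for bacon and cheese, but it also sums, for each person, their enjoyment of the non-bacon, non-cheese foods. Note that ID 3 never eats bacon or cheese, yet my code still sums his enjoyment of what he did eat.
Ideally, the code would give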
ID food enjoyment id_enjoyment_sum
1: 1 bacon 20 60
2: 1 bacon 20 60
3: 1 bacon 20 60
4: 2 bacon 20 60
5: 2 bacon 20 60
6: 2 cheese 20 60
7: 2 sausage 20 60
8: 3 avocado 20 0
9: 3 ham 20 0
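So my question is: how do I set up the by clause so that, for each ID, only the enjoyment of bacon and cheese is summed?
Answer 0 (score: 3)
As a one-liner, I would do this: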
df[,
id_enjoyment_sum := sum(
ifelse(food %in% c("bacon", "cheese"), enjoyment, 0)
)
, by =.(ID)]
If overwriting the enjoyment column is not a problem, you could consider this:
df[! food %in% c("bacon", "cheese"), enjoyment := 0]
df[, id_enjoyment_sum := sum(enjoyment), by = .(ID)]
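If you like the second approach but want to keep the original enjoyment values, one option is to run it on a copy. This is only a sketch, assuming the toy data from the question; df2 is a made-up name.

```r
library(data.table)

# Same toy data as in the question
df <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  food = c("bacon", "bacon", "bacon", "bacon", "bacon",
           "cheese", "sausage", "avocado", "ham"),
  enjoyment = rep(20, 9)
)

# copy() makes a deep copy, so := on df2 leaves df untouched
df2 <- copy(df)  # df2 is a hypothetical name
df2[!food %in% c("bacon", "cheese"), enjoyment := 0]
df2[, id_enjoyment_sum := sum(enjoyment), by = .(ID)]
print(df2)
```

Without copy(), a plain df2 <- df would point at the same table, and := would modify df as well.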
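When you group by several variables, you get one group per combination of values, and the aggregation happens within each of those groups. So in your case there is
one group of rows with ID == 1 and (food == "bacon"|food == "cheese") == TRUE,
one with ID == 2 and (food == "bacon"|food == "cheese") == TRUE,
one with ID == 2 and (food == "bacon"|food == "cheese") == FALSE,
and so on. ID 3 only ever lands in a FALSE group, so grouping on that condition cannot give it a sum of 0; group by ID alone and do the filtering inside sum() instead.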
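To see those groups explicitly, here is a minimal sketch that counts the rows and sums the enjoyment for each combination, using the toy data from the question. The column names is_target, n_rows and enjoyment_sum are made up for illustration.

```r
library(data.table)

# Same toy data as in the question
df <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  food = c("bacon", "bacon", "bacon", "bacon", "bacon",
           "cheese", "sausage", "avocado", "ham"),
  enjoyment = rep(20, 9)
)

# One output row per (ID, is_target) combination that occurs in the data;
# is_target is a hypothetical name for the bacon-or-cheese condition
groups <- df[, .(n_rows = .N, enjoyment_sum = sum(enjoyment)),
             by = .(ID, is_target = food %in% c("bacon", "cheese"))]
print(groups)
```

Only four combinations occur: ID 1 has just a TRUE group, ID 2 has both a TRUE and a FALSE group, and ID 3 has just a FALSE group. That matches the 60, 60, 20 and 40 seen in the question's output.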