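I'm looking for a quicker way to sum by group for many different indicator variables in one data frame, without having to subset each time. Below is an example data frame and the code I currently use. It seems very lengthy to me, and I would guess there is a shorter way to do this. In this example, my code sums the health revenue grouped by name and then merges it back into the master data frame. I want to sum revenue for both the health and vision variables, grouped by name. The key point is that revenue should only count toward health or vision when that variable is 1. Thanks for your help.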
#df
name = c("jerry","jerry","jerry","dave","dave","dave","mary","mary","mary")
health = c(1,0,1,1,0,1,0,1,1)
vision = c(0,1,0,0,1,0,1,0,0)
rev =c(100,200,500,1000,800,300,400,600,300)
df = data.frame(name,health,vision,rev)
#Subset health
health = subset(df, health == 1)
#Sum by group type
library(dplyr)
health <- health %>%
  group_by(name) %>%
  mutate(health_rev = sum(rev, na.rm = TRUE))
#Select variables
health <- health[c("name","health_rev")]
#Remove duplicates
health <- health[!duplicated(health$name), ]
#Merge back to master
master <- merge(x = df, y = health, by = "name", all.x = TRUE)
Answer 0 (score: 3)
Something like this?
df %>%
  group_by(name) %>%
  mutate(health_rev = sum(rev[as.logical(health)]),
         vision_rev = sum(rev[as.logical(vision)])) %>%
  ungroup()
Result:
# A tibble: 9 × 6
  name  health_rev vision_rev health vision   rev
  <chr>      <dbl>      <dbl>  <dbl>  <dbl> <dbl>
1 dave        1300        800      1      0  1000
2 dave        1300        800      0      1   800
3 dave        1300        800      1      0   300
4 jerry        600        200      1      0   100
5 jerry        600        200      0      1   200
6 jerry        600        200      1      0   500
7 mary         900        400      0      1   400
8 mary         900        400      1      0   600
9 mary         900        400      1      0   300
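If there are many indicator columns, writing one `sum(rev[...])` line per column gets repetitive. The following is only a sketch of one possible alternative in base R, not part of the answer above: it loops over a vector of column names (`indicator_cols` is a name made up for this illustration) and uses `ave()` to broadcast each per-name sum back onto every row. It assumes the indicators are coded strictly as 0/1.

```r
# Example data from the question
df <- data.frame(
  name   = c("jerry", "jerry", "jerry", "dave", "dave", "dave", "mary", "mary", "mary"),
  health = c(1, 0, 1, 1, 0, 1, 0, 1, 1),
  vision = c(0, 1, 0, 0, 1, 0, 1, 0, 0),
  rev    = c(100, 200, 500, 1000, 800, 300, 400, 600, 300)
)

# Hypothetical helper vector: the 0/1 indicator columns to summarise
indicator_cols <- c("health", "vision")

for (col in indicator_cols) {
  # Zero out revenue where the indicator is not 1, then sum within each name.
  # ave() returns a vector of the same length as df$rev, in the original row order.
  masked_rev <- df$rev * (df[[col]] == 1)
  df[[paste0(col, "_rev")]] <- ave(
    masked_rev, df$name,
    FUN = function(x) sum(x, na.rm = TRUE)
  )
}

print(df)
```

Rows stay in their original order and the new `health_rev` and `vision_rev` columns are appended at the end, so no separate merge step is needed.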