Sum multiple variables by group type in one df, without subsetting

Time: 2017-01-20 19:28:46

Tags: r

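I'm looking for a quicker way to sum by group type for many different groups in one df, without having to subset. Below is a sample data frame and the code I currently use to do this. It seems verbose to me, and I'm guessing there is a faster way to go about it. In this example, my code sums the health revenue grouped by name and then merges it back into the master data frame. I would like to sum both the health and vision variables, grouped by name. The key point is that I only want the revenue for health and vision from rows where that variable is 1. Thanks for your help.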

#df
name = c("jerry","jerry","jerry","dave","dave","dave","mary","mary","mary") 
health = c(1,0,1,1,0,1,0,1,1) 
vision = c(0,1,0,0,1,0,1,0,0) 
rev =c(100,200,500,1000,800,300,400,600,300)
df = data.frame(name,health,vision,rev) 


#Subset health
health = subset(df, health == 1) 


#Sum by group type
library(dplyr)
health <- health %>% group_by(name) %>% 
  mutate(
    health_rev=sum(rev, na.rm = TRUE))


#Select variables
health <- health[c("name","health_rev")]


#Remove duplicates
health <- health[!duplicated(health$name), ]


#Merge back to master
master <- merge(x = df, y = health, by = "name", all.x = TRUE)

1 Answer:

Answer 0 (score: 3)

Something like this?

df %>% 
  group_by(name) %>% 
  mutate(health_rev = sum(rev[as.logical(health)]), 
          vision_rev = sum(rev[as.logical(vision)])) %>% 
  ungroup()

Result:

# A tibble: 9 × 6
   name health_rev vision_rev health vision   rev
  <chr>      <dbl>      <dbl>  <dbl>  <dbl> <dbl>
1  dave       1300        800      1      0  1000
2  dave       1300        800      0      1   800
3  dave       1300        800      1      0   300
4 jerry        600        200      1      0   100
5 jerry        600        200      0      1   200
6 jerry        600        200      1      0   500
7  mary        900        400      0      1   400
8  mary        900        400      1      0   600
9  mary        900        400      1      0   300
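Since the question mentions many indicator columns, the same grouped `mutate()` idea can probably be generalized so that each flag does not need its own line. The following is only a sketch, assuming dplyr 1.0.0 or later (for `across()`); the vector name `flag_cols` is hypothetical and not part of the original post.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  name   = c("jerry", "jerry", "jerry", "dave", "dave", "dave", "mary", "mary", "mary"),
  health = c(1, 0, 1, 1, 0, 1, 0, 1, 1),
  vision = c(0, 1, 0, 0, 1, 0, 1, 0, 0),
  rev    = c(100, 200, 500, 1000, 800, 300, 400, 600, 300)
)

# Hypothetical helper: names of the 0/1 indicator columns to sum over
flag_cols <- c("health", "vision")

master <- df %>%
  group_by(name) %>%
  # For each indicator column, sum rev over the rows in the group where
  # the indicator equals 1; results are named <column>_rev
  mutate(across(all_of(flag_cols),
                ~ sum(rev[.x == 1], na.rm = TRUE),
                .names = "{.col}_rev")) %>%
  ungroup()

print(master)
```

With this layout, covering another indicator column should only require adding its name to `flag_cols`.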