Summarise by each combination of multiple group_by variables and by each variable individually

Asked: 2017-01-20 12:20:22

Tags: r dplyr

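I'm using dplyr's group_by and summarise to get a mean for each combination of the group_by variables, but I also want the mean for each group_by variable on its own.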

For example, if I run

mtcars %>% 
  group_by(cyl, vs) %>% 
  summarise(new = mean(wt))

I get

    cyl    vs      new
  <dbl> <dbl>    <dbl>
     4     0 2.140000
     4     1 2.300300
     6     0 2.755000
     6     1 3.388750
     8     0 3.999214

but I would like to get

    cyl    vs      new
  <dbl> <dbl>    <dbl>
     4     0 2.140000
     4     1 2.300300
     4    NA 2.285727
     6     0 2.755000
     6     1 3.388750
     6    NA 3.117143
     8     0 3.999214
    NA     0 3.688556
    NA     1 2.611286

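i.e. the means for the combinations and for each variable individually, with NA marking the variable that was not grouped on.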

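Edit: Jaap flagged this as a duplicate and pointed me to "Using aggregate to apply several functions on several variables in one call". I looked at the dplyr answer Jaap referenced there, but I can't see how it answers my question. You say to use summarise_each, but I still don't see how to use it to get the mean of each group for each variable separately. Apologies if I'm being stupid...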

1 answer:

Answer 0: (score: 1)

Here is an idea using bind_rows:

library(dplyr)

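# Means for each cyl/vs combination, then for cyl alone and for vs alone;
# the variable that was not grouped on is filled with NA.
mtcars %>% 
  group_by(cyl, vs) %>% 
  summarise(new = mean(wt)) %>% 
  bind_rows(., 
            mtcars %>% group_by(cyl) %>% summarise(new = mean(wt)) %>% mutate(vs = NA), 
            mtcars %>% group_by(vs) %>% summarise(new = mean(wt)) %>% mutate(cyl = NA)) %>% 
  arrange(cyl) %>% 
  ungroup()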

# A tibble: 10 × 3
#     cyl    vs      new
#   <dbl> <dbl>    <dbl>
#1      4     0 2.140000
#2      4     1 2.300300
#3      4    NA 2.285727
#4      6     0 2.755000
#5      6     1 3.388750
#6      6    NA 3.117143
#7      8     0 3.999214
#8      8    NA 3.999214
#9     NA     0 3.688556
#10    NA     1 2.611286
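
If there are more grouping variables, writing one pipeline per variable gets repetitive. The following is only a sketch of how the same bind_rows idea might be generalised, and it is not part of the original answer. It assumes dplyr 1.0.0 or later (for across(), all_of() and the .groups argument). The names group_sets and result are made up for illustration. It relies on bind_rows filling columns that are missing from a data frame with NA.

```r
library(dplyr)

# Hypothetical list of grouping sets: the combination, then each variable alone.
group_sets <- list(c("cyl", "vs"), "cyl", "vs")

result <- lapply(group_sets, function(vars) {
  mtcars %>%
    group_by(across(all_of(vars))) %>%            # group by the variables named in `vars`
    summarise(new = mean(wt), .groups = "drop")   # one mean per group, grouping dropped
}) %>%
  bind_rows() %>%          # columns absent from a piece are filled with NA
  select(cyl, vs, new) %>% # make the column order explicit
  arrange(cyl, vs)         # NA rows sort last within each ordering

result
```

Adding another entry to group_sets (for example a third variable, or another pair) would add the corresponding rows without changing the rest of the pipeline.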