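I'm using dplyr's group_by and summarise to get the mean for each combination of the group_by variables, but I also want the mean for each group_by variable on its own.
For example, if I run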
mtcars %>%
  group_by(cyl, vs) %>%
  summarise(new = mean(wt))
I get
cyl vs new
<dbl> <dbl> <dbl>
4 0 2.140000
4 1 2.300300
6 0 2.755000
6 1 3.388750
8 0 3.999214
but I would like to get
cyl vs new
<dbl> <dbl> <dbl>
4 0 2.140000
4 1 2.300300
4 NA 2.285727
6 0 2.755000
6 1 3.388750
6 NA 3.117143
8 0 3.999214
NA 0 3.688556
NA 1 2.611286
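i.e. the means for each combination of the variables as well as for each variable individually.
Edit
Jaap marked this as a duplicate and pointed me to "Using aggregate to apply several functions on several variables in one call". I looked at the dplyr answer Jaap referenced there, but I can't see how it answers my question. You say to use summarise_each, but I still don't see how to use it to get the group means for each variable separately. Apologies if I'm being stupid...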
Answer 0 (score: 1)
Here is one way to do it using bind_rows:
library(dplyr)

mtcars %>%
  group_by(cyl, vs) %>%
  summarise(new = mean(wt)) %>%
  # append the per-cyl and per-vs means, setting the other variable to NA
  bind_rows(.,
            mtcars %>% group_by(cyl) %>% summarise(new = mean(wt)) %>% mutate(vs = NA),
            mtcars %>% group_by(vs) %>% summarise(new = mean(wt)) %>% mutate(cyl = NA)) %>%
  arrange(cyl) %>%
  ungroup()
# A tibble: 10 × 3
# cyl vs new
# <dbl> <dbl> <dbl>
#1 4 0 2.140000
#2 4 1 2.300300
#3 4 NA 2.285727
#4 6 0 2.755000
#5 6 1 3.388750
#6 6 NA 3.117143
#7 8 0 3.999214
#8 8 NA 3.999214
#9 NA 0 3.688556
#10 NA 1 2.611286
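If there are more than two grouping variables, repeating one pipeline per variable gets tedious. The following is only a sketch of how the same bind_rows idea might be generalised, not part of the original answer. It assumes dplyr 1.0.0 or later (for across(), all_of() and the .groups argument); marginal_mean, by_both and result are hypothetical names introduced here for illustration. It relies on bind_rows filling columns that are missing from a data frame with NA, so no explicit mutate(vs = NA) is needed.

```r
library(dplyr)

# Means for every cyl/vs combination (one row per pair).
by_both <- mtcars %>%
  group_by(cyl, vs) %>%
  summarise(new = mean(wt), .groups = "drop")

# Hypothetical helper: mean of wt grouped by a single variable, given by name.
# The variable that is not grouped on is simply absent from the result;
# bind_rows() fills it with NA when the pieces are stacked.
marginal_mean <- function(var) {
  mtcars %>%
    group_by(across(all_of(var))) %>%
    summarise(new = mean(wt), .groups = "drop")
}

# Stack the combination means and one marginal table per grouping variable.
result <- bind_rows(by_both, lapply(c("cyl", "vs"), marginal_mean)) %>%
  arrange(cyl, vs)

print(result)
```

Adding another grouping variable would then mean extending the group_by call in by_both and the character vector passed to lapply. Note that this only adds the single-variable margins, not every partial combination.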