R dplyr: summarise complete cases for all variables by group

Time: 2018-06-20 08:52:05

Tags: r dplyr

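I want to use dplyr to count, by group, the complete (non-NA) cases of every variable in a dataset. The summarised variables should be stored under new names.

An example: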

library(dplyr)

df <- data.frame(
  group = c("A", "B", "A", "B"),
  a = c(1,1,NA,2),
  b = c(1,NA,1,1),
  c = c(1,1,2,NA),
  d = c(1,2,1,1)
)

df %>% group_by(group) %>% 
  mutate(complete_a = sum(complete.cases(a))) %>% 
  mutate(complete_b = sum(complete.cases(b))) %>%
  mutate(complete_c = sum(complete.cases(c))) %>% 
  mutate(complete_d = sum(complete.cases(d))) %>% 
  group_by(group, complete_a, complete_b, complete_c, complete_d) %>% summarise()

This gives my expected output:

# # A tibble: 2 x 5
# # Groups:   group, complete_a, complete_b, complete_c [?]
# group complete_a complete_b complete_c complete_d
# <fct>      <int>      <int>      <int>      <int>
# A              1          2          2          2
# B              2          1          1          2

How can I produce the same output without repeating the mutate statement for each variable?

I tried:

df %>% group_by(group) %>% summarise_all(funs(sum(complete.cases(.))))

This works, but it does not rename the variables.

1 answer:

Answer 0 (score: 2)

You're almost there. You just need to add rename_all (note that it prefixes every column, including group):

library(dplyr)

df %>% 
  group_by(group) %>% 
  summarise_all(funs(sum(complete.cases(.)))) %>% 
  rename_all(~paste0("complete_", colnames(df)))

# A tibble: 2 x 5
#  complete_group complete_a complete_b complete_c complete_d
#  <fct>               <int>      <int>      <int>      <int>
#1 A                       1          2          2          2
#2 B                       2          1          1          2
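
The output above shows that rename_all also turns group into complete_group, whereas the expected output in the question keeps the column named group. The following is a minimal sketch of one way to leave the grouping column alone, assuming the scoped verb rename_at with vars(-group), and a formula lambda in place of funs() (funs() is deprecated since dplyr 0.8.0). The name res is introduced here only for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  group = c("A", "B", "A", "B"),
  a = c(1, 1, NA, 2),
  b = c(1, NA, 1, 1),
  c = c(1, 1, 2, NA),
  d = c(1, 2, 1, 1)
)

res <- df %>%
  group_by(group) %>%
  # count the non-NA values of every non-grouping column
  summarise_all(~ sum(complete.cases(.))) %>%
  # prefix every column except the grouping column
  rename_at(vars(-group), ~ paste0("complete_", .))

names(res)
```

With this variant the columns should come out as group, complete_a, complete_b, complete_c and complete_d.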

Edit

Or, as @symbolrush pointed out, more directly without colnames:

df %>% 
  group_by(group) %>% 
  summarise_all(funs(sum(complete.cases(.)))) %>% 
  rename_all(~paste0("complete_", .))

# A tibble: 2 x 5
#  complete_group complete_a complete_b complete_c complete_d
#  <fct>               <int>      <int>      <int>      <int>
#1 A                       1          2          2          2
#2 B                       2          1          1          2
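
In dplyr 1.0.0 and later, the scoped verbs used above (summarise_all, rename_all) are superseded by across(). The sketch below assumes dplyr >= 1.0.0 and does the summarising and renaming in one step through the .names argument. Inside summarise, across(everything()) skips the grouping column, so group keeps its name. The name res is again only illustrative.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  group = c("A", "B", "A", "B"),
  a = c(1, 1, NA, 2),
  b = c(1, NA, 1, 1),
  c = c(1, 1, 2, NA),
  d = c(1, 2, 1, 1)
)

res <- df %>%
  group_by(group) %>%
  summarise(
    across(
      everything(),
      # for a single vector, !is.na(x) is equivalent to complete.cases(x)
      ~ sum(!is.na(.x)),
      # build each new name from the original column name
      .names = "complete_{.col}"
    )
  )

names(res)
```

Depending on the R version, group may print as <chr> rather than <fct>, because data.frame() no longer converts strings to factors by default since R 4.0.0.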