How to add a total row to a grouped summary in R

Time: 2019-01-29 09:50:09

Tags: r dplyr

I am building summary tables by subgroup, and I would like a tidier, more efficient way to add an overall summary.

Here is what I have so far. First, I created a summary for each level of a factor variable:

library(tidyverse)

df <- data.frame(var1 = 10:18,
                 var2 = c("A","B","A","B","A","B","A","B","A"))

# Per-group summary; !is.na() drops missing group labels
# (comparing with the string "NA" does not test for missing values)
group_summary <- df %>%
  filter(!is.na(var2)) %>%
  group_by(var2) %>%
  summarise(Max = max(var1, na.rm = TRUE),
            Median = median(var1, na.rm = TRUE),
            Min = min(var1, na.rm = TRUE),
            IQR = IQR(var1, na.rm = TRUE),
            Count = n())

Next, I created an overall summary:

# Overall summary over all (non-missing) groups
Summary <- df %>%
  filter(!is.na(var2)) %>%
  summarise(Max = max(var1, na.rm = TRUE),
            Median = median(var1, na.rm = TRUE),
            Min = min(var1, na.rm = TRUE),
            IQR = IQR(var1, na.rm = TRUE),
            Count = n())

Finally, I combined the two objects with dplyr::bind_rows:

complete_summary <- bind_rows(Summary, group_summary)
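For comparison, a minimal sketch (not part of the original post) of a single-pipeline alternative: append a copy of the data whose group label is "Total", then group and summarise once. It assumes dplyr is installed, that var2 is character (hence stringsAsFactors = FALSE), and that temporarily doubling the rows in memory is acceptable; the label "Total" is an arbitrary choice.

```r
library(dplyr)

df <- data.frame(var1 = 10:18,
                 var2 = c("A","B","A","B","A","B","A","B","A"),
                 stringsAsFactors = FALSE)

# bind_rows(mutate(., ...)) stacks the data on top of a copy labelled "Total",
# so one group_by()/summarise() produces the per-group rows and the overall row
complete_summary <- df %>%
  filter(!is.na(var2)) %>%
  bind_rows(mutate(., var2 = "Total")) %>%
  group_by(var2) %>%
  summarise(Max = max(var1),
            Median = median(var1),
            Min = min(var1),
            IQR = IQR(var1),
            Count = n())

print(complete_summary)
```

The "Total" group sees every row exactly once, so its statistics match an ungrouped summarise() on the same data.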

This works, but it is very verbose and not the most efficient approach. I tried using ungroup:

group_summary <- df %>%
  filter(!is.na(var2)) %>%
  group_by(var2) %>%
  summarise(Max = max(var1, na.rm = TRUE),
            Median = median(var1, na.rm = TRUE),
            Min = min(var1, na.rm = TRUE),
            IQR = IQR(var1, na.rm = TRUE),
            Count = n()) %>%
  ungroup() %>%
  summarise(Max = max(var1, na.rm = TRUE),
            Median = median(var1, na.rm = TRUE),
            Min = min(var1, na.rm = TRUE),
            IQR = IQR(var1, na.rm = TRUE),
            Count = n())

But it throws an error:

  Evaluation error: object var1 not found.
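Why the ungroup() attempt fails: summarise() collapses each group to a single row and keeps only the grouping column plus the new summary columns, so var1 no longer exists when the second summarise() runs. A minimal sketch (assuming dplyr is installed, with the statistics trimmed to two for brevity) that prints which columns survive:

```r
library(dplyr)

df <- data.frame(var1 = 10:18,
                 var2 = c("A","B","A","B","A","B","A","B","A"),
                 stringsAsFactors = FALSE)

# After summarise(), only the grouping column and the summary columns remain
group_summary <- df %>%
  group_by(var2) %>%
  summarise(Max = max(var1), Count = n())

print(names(group_summary))
# var1 is gone, so a later summarise(max(var1)) has nothing to refer to
print("var1" %in% names(group_summary))
```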

Thanks in advance for your help.

2 answers:

Answer 0 (score: 0)

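Not the most elegant solution either, but a simple one: compute the overall statistics before grouping, then carry them into each group's row.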

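library(dplyr)

# Overall statistics are computed on the ungrouped data, then first()
# keeps one copy per group (naming the result c would mask base::c)
weight_summary <- mtcars %>%
  mutate(total_mean = mean(wt),
         total_median = median(wt)) %>%
  group_by(cyl) %>%
  summarise(meanweight = mean(wt),
            medianweight = median(wt),
            total_mean = first(total_mean),
            total_median = first(total_median))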
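Applied to the question's df, the same idea might look like the sketch below (assuming dplyr is installed; the names total_max and total_median are illustrative). Note that the overall figures come out as extra columns repeated on every group row, not as a separate total row.

```r
library(dplyr)

df <- data.frame(var1 = 10:18,
                 var2 = c("A","B","A","B","A","B","A","B","A"),
                 stringsAsFactors = FALSE)

# Overall statistics are computed before grouping, then carried
# into each group's summary row with first()
with_totals <- df %>%
  mutate(total_max = max(var1),
         total_median = median(var1)) %>%
  group_by(var2) %>%
  summarise(Max = max(var1),
            Median = median(var1),
            total_max = first(total_max),
            total_median = first(total_median))

print(with_totals)
```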

Answer 1 (score: 0)

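If you want to do this in a single chain, you can combine the two results with bind_rows, as you already did, but inside one pipeline so that no temporary objects are created.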

library(tidyverse)

df <- data.frame(var1 = 10:18, 
                 var2 = c("A","B","A","B","A","B","A","B","A"))

df %>%
  filter(!is.na(var2)) %>%
  group_by(var2) %>%
  summarise(Max = max(var1, na.rm = TRUE),
            Median = median(var1, na.rm = TRUE),
            Min = min(var1, na.rm = TRUE),
            IQR = IQR(var1, na.rm = TRUE),
            Count = n()) %>%
  # Append the overall (ungrouped) summary as an extra row
  bind_rows(df %>%
              summarise(Max = max(var1, na.rm = TRUE),
                        Median = median(var1, na.rm = TRUE),
                        Min = min(var1, na.rm = TRUE),
                        IQR = IQR(var1, na.rm = TRUE),
                        Count = n()))
#> # A tibble: 3 x 6
#>   var2    Max Median   Min   IQR Count
#>   <fct> <dbl>  <dbl> <dbl> <dbl> <int>
#> 1 A        18     14    10     4     5
#> 2 B        17     14    11     3     4
#> 3 <NA>     18     14    10     4     9

Created on 2019-01-29 by the reprex package (v0.2.1)
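A further reduction in repetition, offered as a sketch rather than as part of the original answer: wrap the statistics in a small function so they are written once, and label the overall row "Total" instead of leaving var2 as NA. The helper name summarise_var1 is hypothetical, and the code assumes dplyr is installed and var2 is character.

```r
library(dplyr)

df <- data.frame(var1 = 10:18,
                 var2 = c("A","B","A","B","A","B","A","B","A"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: the summary statistics are written once and reused
summarise_var1 <- function(data) {
  summarise(data,
            Max = max(var1, na.rm = TRUE),
            Median = median(var1, na.rm = TRUE),
            Min = min(var1, na.rm = TRUE),
            IQR = IQR(var1, na.rm = TRUE),
            Count = n())
}

clean <- df %>% filter(!is.na(var2))

complete_summary <- bind_rows(
  clean %>% group_by(var2) %>% summarise_var1(),
  # Label the overall row explicitly instead of leaving var2 as NA
  clean %>% summarise_var1() %>% mutate(var2 = "Total")
)

print(complete_summary)
```

bind_rows matches columns by name, so the var2 column added by mutate() lines up with the grouping column of the per-group summary.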