How to easily combine the output of grouped summaries with the overall summary of the data

Time: 2019-07-29 22:26:03

Tags: r dplyr

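I am using dplyr's summarise command together with group_by to generate some summaries for my data. I would like to get the same summaries for the overall data set and combine everything into one tibble.

Is there a straightforward way to do this? My solution below feels like it takes about four times as much code as should be needed to do this efficiently!

Thanks.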

# reprex

library(tidyverse)

tidy_data <- tibble::tribble(
        ~drug, ~gender, ~condition, ~value,
    "control",     "f",     "work",   0.06,
  "treatment",     "m",     "work",   0.42,
  "treatment",     "f",     "work",   0.22,
    "control",     "m",     "work",   0.38,
  "treatment",     "m",     "work",   0.57,
  "treatment",     "f",     "work",   0.24,
    "control",     "f",     "work",   0.61,
    "control",     "f",     "play",   0.27,
  "treatment",     "m",     "play",    0.3,
  "treatment",     "f",     "play",   0.09,
    "control",     "m",     "play",   0.84,
    "control",     "m",     "play",   0.65,
  "treatment",     "m",     "play",   0.98,
  "treatment",     "f",     "play",   0.38
  )

tidy_summaries <- tidy_data %>%

  # Group by the required variables
  group_by(drug, gender, condition) %>% 

  summarise(mean = mean(value),
            median = median(value),
            min = min(value),
            max = max(value)) %>%

  # Bind rows will bind this output to the following one
  bind_rows(

    # Now for the overall version
    tidy_data %>%

      # Generate the overall summary values
      mutate(mean = mean(value),
             median = median(value),
             min = min(value),
             max = max(value)) %>%

      # We first need to know the structure of the grouped tibble,
      # as the format of the overall output needs to match it
      select(drug, gender, condition, mean:max) %>% # Keep columns of interest

      # The same information will be appended to all rows, so we just need to retain one
      filter(row_number() == 1) %>% 

      # Change the values in drug, gender, condition to "overall"
      mutate_at(vars(drug:condition), 
                list(~ifelse(is.character(.), "overall", .)))
      ) 

This is the output I want, but getting it is not as simple as I had hoped.

tidy_summaries
#> # A tibble: 9 x 7
#> # Groups:   drug, gender [5]
#>   drug      gender  condition  mean median   min   max
#>   <chr>     <chr>   <chr>     <dbl>  <dbl> <dbl> <dbl>
#> 1 control   f       play      0.27   0.27   0.27 0.27 
#> 2 control   f       work      0.335  0.335  0.06 0.61 
#> 3 control   m       play      0.745  0.745  0.65 0.84 
#> 4 control   m       work      0.38   0.38   0.38 0.38 
#> 5 treatment f       play      0.235  0.235  0.09 0.38 
#> 6 treatment f       work      0.23   0.23   0.22 0.24 
#> 7 treatment m       play      0.64   0.64   0.3  0.98 
#> 8 treatment m       work      0.495  0.495  0.42 0.570
#> 9 overall   overall overall   0.429  0.38   0.06 0.98

2 answers:

Answer 0: (score: 0)

Try

tidy_data %>% 
  group_by(drug, gender, condition) %>% 
  summarise(mean = mean(value), median = median(value), min = min(value), max = max(value)) %>%
  bind_rows(.,
            tidy_data %>%
              summarise(drug = "Overall", gender = "Overall", condition = "Overall", mean = mean(value), median = median(value), min = min(value), max = max(value))
  )

This gives:

# A tibble: 9 x 7
# Groups:   drug, gender [5]
  drug      gender  condition  mean median   min   max
  <chr>     <chr>   <chr>     <dbl>  <dbl> <dbl> <dbl>
1 control   f       play      0.27   0.27   0.27 0.27 
2 control   f       work      0.335  0.335  0.06 0.61 
3 control   m       play      0.745  0.745  0.65 0.84 
4 control   m       work      0.38   0.38   0.38 0.38 
5 treatment f       play      0.235  0.235  0.09 0.38 
6 treatment f       work      0.23   0.23   0.22 0.24 
7 treatment m       play      0.64   0.64   0.3  0.98 
8 treatment m       work      0.495  0.495  0.42 0.570
9 Overall   Overall Overall   0.429  0.38   0.06 0.98 

The code first summarises the data by group, then creates one final summary row from the original, ungrouped data and binds it to the bottom.
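The same idea can also be written so that the four summary expressions appear only once. The following is a minimal sketch of one way to do that, not the only one: summarise_value is a hypothetical helper name introduced here, and the four-row tidy_data is just a subset of the question's data, used to keep the example self-contained.

```r
library(dplyr)
library(tibble)

# Small illustrative subset of the question's tidy_data (assumption: the
# real data has the same four columns)
tidy_data <- tribble(
  ~drug,       ~gender, ~condition, ~value,
  "control",   "f",     "work",     0.06,
  "control",   "f",     "work",     0.61,
  "treatment", "m",     "play",     0.30,
  "treatment", "m",     "play",     0.98
)

# Hypothetical helper: the summaries are defined once and reused for
# both the grouped and the overall data
summarise_value <- function(data) {
  summarise(data,
            mean   = mean(value),
            median = median(value),
            min    = min(value),
            max    = max(value))
}

# One row per drug / gender / condition combination
grouped <- tidy_data %>%
  group_by(drug, gender, condition) %>%
  summarise_value() %>%
  ungroup()

# A single row for the whole data set, labelled in the grouping columns
overall <- tidy_data %>%
  summarise_value() %>%
  mutate(drug = "Overall", gender = "Overall", condition = "Overall")

# bind_rows() matches columns by name, so the column order of `overall`
# does not need to match that of `grouped`
tidy_summaries <- bind_rows(grouped, overall)
```

The ungroup() call is there so that the combined tibble does not carry the leftover grouping (drug, gender) that is visible in the printed output above.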

Answer 1: (score: 0)

Interesting question. My answer is basically the same as @sumshyftw's, but uses mutate_if and summarise_at.

Code

library(hablar)

funs <- list(mean   = ~mean(.), 
             median = ~median(.), 
             min    = ~min(.), 
             max    = ~max(.))

tidy_data %>% 
  group_by(drug, gender, condition) %>% 
  summarise_at(vars(value), funs) %>% 
  ungroup() %>% 
  bind_rows(., tidy_data %>% summarise_at(vars(value), funs)) %>% 
  mutate_if(is.character, ~if_na(., "Overall"))

Result

  drug      gender  condition  mean median   min   max
  <chr>     <chr>   <chr>     <dbl>  <dbl> <dbl> <dbl>
1 control   f       play      0.27   0.27   0.27 0.27 
2 control   f       work      0.335  0.335  0.06 0.61 
3 control   m       play      0.745  0.745  0.65 0.84 
4 control   m       work      0.38   0.38   0.38 0.38 
5 treatment f       play      0.235  0.235  0.09 0.38 
6 treatment f       work      0.23   0.23   0.22 0.24 
7 treatment m       play      0.64   0.64   0.3  0.98 
8 treatment m       work      0.495  0.495  0.42 0.570
9 Overall   Overall Overall   0.429  0.38   0.06 0.98
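
If adding the hablar dependency is not desirable, roughly the same pipeline can be sketched with dplyr alone. This is an illustration under assumptions rather than a drop-in replacement: it assumes dplyr 1.0.0 or later (for across() and the .groups argument), the name summary_funs is introduced here, and the four-row tidy_data is a subset of the question's data used to keep the example self-contained.

```r
library(dplyr)
library(tibble)

# Small illustrative subset of the question's tidy_data
tidy_data <- tribble(
  ~drug,       ~gender, ~condition, ~value,
  "control",   "f",     "work",     0.06,
  "control",   "f",     "work",     0.61,
  "treatment", "m",     "play",     0.30,
  "treatment", "m",     "play",     0.98
)

# Named list of summary functions; the names become the column names
summary_funs <- list(mean = mean, median = median, min = min, max = max)

tidy_summaries <- tidy_data %>%
  group_by(drug, gender, condition) %>%
  # .names = "{.fn}" yields columns mean, median, min, max
  summarise(across(value, summary_funs, .names = "{.fn}"), .groups = "drop") %>%
  # The overall row has no drug / gender / condition columns, so
  # bind_rows() fills them with NA
  bind_rows(summarise(tidy_data, across(value, summary_funs, .names = "{.fn}"))) %>%
  # Replace those NAs in the character columns with a label
  mutate(across(where(is.character), ~ coalesce(.x, "Overall")))
```

One caveat that applies to this sketch and to the if_na version alike: any genuine NA already present in a character column would also be relabelled as "Overall".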