How do I group and summarise using lists of column names?

Time: 2019-08-30 17:37:15

Tags: r

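Basically, I want to loop over the columns in list.group, group the data by each one, and then create summary statistics for every column in list.avg, list.max and list.min, so that the resulting columns are mpg_avg, wt_avg, hp_avg, mpg_max, hp_max ... mpg_min, hp_min, and so on.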

data("mtcars")
list.avg <- list("mpg","wt","hp")
list.max <- list("mpg","hp","wt","qsec")
list.min <- list("mpg","hp","wt","qsec")
list.group <- list("cyl","vs","am","gear","carb")

So I should end up with a separate table for each column in list.group.

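3 Answers:

Answer 0: (score: 3)

First, it helps to put all of the avg/max/min variables into a single named list, keyed by the name of the function to apply.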

to_summarise <- 
  list(mean = c("mpg","wt","hp"),
       max = c("mpg","hp","wt","qsec"),
       min = c("mpg","hp","wt","qsec"))

Now we can map over list.group and, for each list.group value, imap over to_summarise, then merge all of the results together.

library(tidyverse)

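map(list.group, ~{
  # here .x is one grouping column name, e.g. "cyl"
  grouped <- 
    mtcars %>% 
      group_by_at(.x) 
  # inside imap, .x is the vector of columns and .y is the function name
  out <- 
    imap(to_summarise, ~{
            grouped %>% 
              summarise_at(.x, setNames(list(get(.y)), .y))
    })
  # back in the outer function, .x is the grouping column again
  out %>% 
    reduce(merge, by = .x)
})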

Output

# [[1]]
#   cyl mpg_mean  wt_mean   hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1   4 26.66364 2.285727  82.63636    33.9    113  3.190    22.90    21.4     52  1.513
# 2   6 19.74286 3.117143 122.28571    21.4    175  3.460    20.22    17.8    105  2.620
# 3   8 15.10000 3.999214 209.21429    19.2    335  5.424    18.00    10.4    150  3.170
#   qsec_min
# 1     16.7
# 2     15.5
# 3     14.5
# 
# [[2]]
#   vs mpg_mean  wt_mean   hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1  0 16.61667 3.688556 189.72222    26.0    335  5.424     18.0    10.4     91  2.140
# 2  1 24.55714 2.611286  91.35714    33.9    123  3.460     22.9    17.8     52  1.513
#   qsec_min
# 1     14.5
# 2     16.9
# 
# [[3]]
#   am mpg_mean  wt_mean  hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1  0 17.14737 3.768895 160.2632    24.4    245  5.424     22.9    10.4     62  2.465
# 2  1 24.39231 2.411000 126.8462    33.9    335  3.570     19.9    15.0     52  1.513
#   qsec_min
# 1    15.41
# 2    14.50
# 
# [[4]]
#   gear mpg_mean  wt_mean  hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1    3 16.10667 3.892600 176.1333    21.5    245  5.424    20.22    10.4     97  2.465
# 2    4 24.53333 2.616667  89.5000    33.9    123  3.440    22.90    17.8     52  1.615
# 3    5 21.38000 2.632600 195.6000    30.4    335  3.570    16.90    15.0     91  1.513
#   qsec_min
# 1    15.41
# 2    16.46
# 3    14.50
# 
# [[5]]
#   carb mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1    1 25.34286  2.4900    86.0    33.9    110  3.460    20.22    18.1     65  1.835
# 2    2 22.40000  2.8628   117.2    30.4    175  3.845    22.90    15.2     52  1.513
# 3    3 16.30000  3.8600   180.0    17.3    180  4.070    18.00    15.2    180  3.730
# 4    4 15.79000  3.8974   187.0    21.0    264  5.424    18.90    10.4    110  2.620
# 5    6 19.70000  2.7700   175.0    19.7    175  2.770    15.50    19.7    175  2.770
# 6    8 15.00000  3.5700   335.0    15.0    335  3.570    14.60    15.0    335  3.570
#   qsec_min
# 1    18.61
# 2    16.70
# 3    17.40
# 4    14.50
# 5    15.50
# 6    14.60
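
To make the inner imap step in the code above easier to follow, here is a minimal sketch of what it relies on: imap passes each element as .x and its name as .y, and a named list of functions makes summarise_at append the name as a column suffix. The names described and fns are introduced only for this illustration.

```r
library(dplyr)
library(purrr)

to_summarise <- list(mean = c("mpg", "wt", "hp"),
                     max  = c("mpg", "hp", "wt", "qsec"))

# imap() calls the function once per element: .x is the value, .y is the name
described <- imap(to_summarise, ~ paste(.y, "->", paste(.x, collapse = ", ")))

# get("mean") looks up the function called "mean"; setNames() labels it so
# that summarise_at() uses the label as a suffix for the new column names
fns <- setNames(list(get("mean")), "mean")

res <- mtcars %>%
  group_by_at("cyl") %>%
  summarise_at(c("mpg", "wt"), fns)

names(res)  # the grouping column followed by the suffixed summary columns
```

Because every piece carries the grouping column, merging the pieces by that column lines the summaries up row by row.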

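Answer 1: (score: 3)

'avg' is not a function in R; the equivalent is mean. So change the object name from list.avg to list.mean, collect the list.* objects into a named list with lst, and loop over that named list with imap. Inside the loop, remove the list. prefix from the name with str_remove, group by the common grouping columns with group_by_at, and then, in summarise_at, apply to the selected columns the function obtained by calling get on the prefix-stripped name.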

library(tidyverse)
list.mean <- list("mpg","wt","hp")
lst(list.mean, list.max, list.min) %>% 
  imap(~ {
    # .y is the list name, e.g. "list.mean"; strip the prefix to get "mean"
    func <- str_remove(.y, '^list\\.')
    # .x is the list of column names to summarise
    vars1 <- unlist(.x)

    mtcars %>%
      group_by_at(unlist(list.group)) %>%
      summarise_at(vars1, ~ get(func)(.))
  })
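
As a small illustration of the name-to-function step used above, the sketch below strips the list. prefix and resolves each remaining string to a function. It uses match.fun instead of get, which is a swapped-in alternative rather than what the answer does; list_names, fn_names and fns are names made up for this example.

```r
library(stringr)

list_names <- c("list.mean", "list.max", "list.min")

# drop the leading "list." so only the function name is left
fn_names <- str_remove(list_names, "^list\\.")

# match.fun() turns a string such as "max" into the function max()
fns <- lapply(fn_names, match.fun)
names(fns) <- fn_names

largest <- fns$max(c(3, 9, 4))
```

This only works when each list name, minus its prefix, is the name of an existing function, which is why list.avg has to be renamed to list.mean first.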

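Answer 2: (score: 2)

Use map to iterate over list.group, group by each element of list.group with group_by_at (since the elements are strings), then summarise the required columns, and finally bind all of the pieces together.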

library(purrr)
library(dplyr)
map(list.group, ~mtcars %>% 
          #.x will be "cyl", "vs" ... etc 
          group_by_at(.x) %>% 
          {bind_cols(summarise_at(.,unlist(list.avg), list(avg=mean)),
                     summarise_at(.,unlist(list.min), list(min=min)),
                     summarise_at(.,unlist(list.max), list(max=max))
                     )
          }
    )
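
One thing to watch with bind_cols, as far as I can tell: each summarise_at result carries its own copy of the grouping column, so the bound table can end up with that column repeated. A possible variation, sketched below, joins the pieces on the grouping column instead. It assumes dplyr 1.0.0 or later (for across and all_of), and summarise_by is a hypothetical helper that does not come from any of the answers.

```r
library(dplyr)   # assumes dplyr >= 1.0.0 for across() and all_of()
library(purrr)

data("mtcars")
list.group <- list("cyl", "vs", "am", "gear", "carb")
to_summarise <- list(mean = c("mpg", "wt", "hp"),
                     max  = c("mpg", "hp", "wt", "qsec"),
                     min  = c("mpg", "hp", "wt", "qsec"))

# Hypothetical helper: builds one summary table for one grouping column.
summarise_by <- function(data, group_col, spec) {
  pieces <- imap(spec, function(cols, fn_name) {
    fn <- match.fun(fn_name)                  # e.g. "mean" -> mean()
    col_names <- paste0("{.col}_", fn_name)   # e.g. "{.col}_mean"
    data %>%
      group_by(across(all_of(group_col))) %>%
      summarise(across(all_of(cols), fn, .names = col_names),
                .groups = "drop")
  })
  # join on the grouping column so it appears only once in the result
  reduce(pieces, function(a, b) inner_join(a, b, by = group_col))
}

# one table per grouping column, named after that column
tables <- map(list.group, ~ summarise_by(mtcars, .x, to_summarise))
names(tables) <- unlist(list.group)
```

With this, tables$cyl is the per-cyl table, tables$vs the per-vs table, and so on, with columns such as mpg_mean, hp_max and qsec_min.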