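Basically, I want to loop over the columns in list.group, group the data by each one, and then create summary statistics for every column in list.avg, list.max, and list.min, so that the result has columns such as mpg_avg, wt_avg, hp_avg, mpg_max, hp_max, ... mpg_min, hp_min, and so on.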
data("mtcars")
list.avg <- list("mpg","wt","hp")
list.max <- list("mpg","hp","wt","qsec")
list.min <- list("mpg","hp","wt","qsec")
list.group <- list("cyl","vs","am","gear","carb")
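So I should end up with a separate table for each column in list.group.
Answer 0 (score: 3)
First, it helps to put all of the avg / max / min variables in a single list, named by the function to apply.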
to_summarise <-
  list(mean = c("mpg","wt","hp"),
       max = c("mpg","hp","wt","qsec"),
       min = c("mpg","hp","wt","qsec"))
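To illustrate why the names on this list matter, here is a minimal sketch of how imap() from purrr passes each element as .x and its name as .y. The variable name planned is made up for this illustration and does not appear in the answer.

```r
library(purrr)

to_summarise <- list(
  mean = c("mpg", "wt", "hp"),
  max  = c("mpg", "hp", "wt", "qsec"),
  min  = c("mpg", "hp", "wt", "qsec")
)

# imap() hands each element to the lambda as .x and its name as .y,
# so the function name travels together with the columns it applies to.
# `planned` is a hypothetical name used only in this sketch.
planned <- imap(to_summarise, ~ paste(.x, .y, sep = "_"))

planned$mean
# [1] "mpg_mean" "wt_mean"  "hp_mean"
```

These are the same column names that show up in the summarised tables.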
Now we can map() over list.group and, within each list.group value, imap() over to_summarise, then merge() all of the results together.
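library(tidyverse)

map(list.group, ~{
  # Outer lambda: .x is one grouping column name, e.g. "cyl"
  grouped <-
    mtcars %>%
    group_by_at(.x)

  # Inner lambda: .x is the vector of columns, .y is the function name
  out <-
    imap(to_summarise, ~{
      grouped %>%
        summarise_at(.x, setNames(list(get(.y)), .y))
    })

  # Join the mean / max / min tables on the grouping column
  out %>%
    reduce(merge, by = .x)
})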
Output
# [[1]]
# cyl mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 4 26.66364 2.285727 82.63636 33.9 113 3.190 22.90 21.4 52 1.513
# 2 6 19.74286 3.117143 122.28571 21.4 175 3.460 20.22 17.8 105 2.620
# 3 8 15.10000 3.999214 209.21429 19.2 335 5.424 18.00 10.4 150 3.170
# qsec_min
# 1 16.7
# 2 15.5
# 3 14.5
#
# [[2]]
# vs mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 0 16.61667 3.688556 189.72222 26.0 335 5.424 18.0 10.4 91 2.140
# 2 1 24.55714 2.611286 91.35714 33.9 123 3.460 22.9 17.8 52 1.513
# qsec_min
# 1 14.5
# 2 16.9
#
# [[3]]
# am mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 0 17.14737 3.768895 160.2632 24.4 245 5.424 22.9 10.4 62 2.465
# 2 1 24.39231 2.411000 126.8462 33.9 335 3.570 19.9 15.0 52 1.513
# qsec_min
# 1 15.41
# 2 14.50
#
# [[4]]
# gear mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 3 16.10667 3.892600 176.1333 21.5 245 5.424 20.22 10.4 97 2.465
# 2 4 24.53333 2.616667 89.5000 33.9 123 3.440 22.90 17.8 52 1.615
# 3 5 21.38000 2.632600 195.6000 30.4 335 3.570 16.90 15.0 91 1.513
# qsec_min
# 1 15.41
# 2 16.46
# 3 14.50
#
# [[5]]
# carb mpg_mean wt_mean hp_mean mpg_max hp_max wt_max qsec_max mpg_min hp_min wt_min
# 1 1 25.34286 2.4900 86.0 33.9 110 3.460 20.22 18.1 65 1.835
# 2 2 22.40000 2.8628 117.2 30.4 175 3.845 22.90 15.2 52 1.513
# 3 3 16.30000 3.8600 180.0 17.3 180 4.070 18.00 15.2 180 3.730
# 4 4 15.79000 3.8974 187.0 21.0 264 5.424 18.90 10.4 110 2.620
# 5 6 19.70000 2.7700 175.0 19.7 175 2.770 15.50 19.7 175 2.770
# 6 8 15.00000 3.5700 335.0 15.0 335 3.570 14.60 15.0 335 3.570
# qsec_min
# 1 18.61
# 2 16.70
# 3 17.40
# 4 14.50
# 5 15.50
# 6 14.60
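The group_by_at() and summarise_at() verbs are superseded in recent dplyr releases. As a rough sketch only, assuming dplyr 1.0.0 or later, the same tables could be built with across(). The helper name summarise_by and its argument names are hypothetical, and inner_join() is swapped in for merge().

```r
library(dplyr)
library(purrr)

list.group <- list("cyl", "vs", "am", "gear", "carb")
to_summarise <- list(
  mean = c("mpg", "wt", "hp"),
  max  = c("mpg", "hp", "wt", "qsec"),
  min  = c("mpg", "hp", "wt", "qsec")
)

# Hypothetical helper: one summary table for a single grouping column
summarise_by <- function(data, group_col, spec) {
  grouped <- group_by(data, across(all_of(group_col)))

  # One table per function; columns are named <column>_<function>
  pieces <- imap(spec, function(cols, fn_name) {
    fn <- match.fun(fn_name)
    name_spec <- paste0("{.col}_", fn_name)
    summarise(grouped, across(all_of(cols), fn, .names = name_spec))
  })

  # Join the per-function tables on the grouping column
  reduce(pieces, inner_join, by = group_col)
}

tables <- map(list.group, ~ summarise_by(mtcars, .x, to_summarise))
tables[[1]]
```

Each element of tables corresponds to one entry of list.group, in the same order.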
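Answer 1 (score: 3)
'avg' is not a function in R; the equivalent is mean. So rename the object from list.avg to list.mean, collect the list.* objects in a named list, and loop over that named list with imap(). Inside the loop, strip the list. prefix from each name with str_remove(), group by all of the columns in list.group with group_by_at(), and in summarise_at() apply to the selected columns the function that get() looks up from the prefix-stripped name.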
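library(tidyverse)

# list.max, list.min and list.group are the objects defined in the question
list.mean <- list("mpg","wt","hp")

# lst() names each element after its object: list.mean, list.max, list.min
lst(list.mean, list.max, list.min) %>%
  imap(~ {
    func <- str_remove(.y, '^list\\.')  # "list.mean" -> "mean"
    vars1 <- unlist(.x)                 # columns to summarise
    mtcars %>%
      group_by_at(unlist(list.group)) %>%
      summarise_at(vars(vars1), ~ get(func)(.))
  })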
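The name-to-function step can be tried in isolation. The following is a small sketch, not part of the answer, of stripping the prefix and looking the functions up by name; object_names, func_names, funcs and results are illustrative names.

```r
library(stringr)

# Illustrative names only; they mirror the objects used in the answer
object_names <- c("list.mean", "list.max", "list.min")

# Strip the "list." prefix to recover the function names
func_names <- str_remove(object_names, "^list\\.")

# get() with mode = "function" looks each name up and returns the function
funcs <- lapply(func_names, get, mode = "function")
names(funcs) <- func_names

# Apply each looked-up function to a single column
results <- vapply(funcs, function(f) f(mtcars$mpg), numeric(1))
results
```

This is also why the rename matters: a name such as "avg" would make get() fail unless some loaded package happens to define a function with that name.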
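Answer 2 (score: 2)
Use map() to loop over list.group, group by each element of list.group with group_by_at() (since they are strings), then summarise the desired columns, and finally bind all of the pieces together.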
library(purrr)
library(dplyr)

map(list.group, ~ mtcars %>%
  # .x will be "cyl", "vs", ... etc.
  group_by_at(.x) %>%
  {
    # Each piece also carries the grouping column, so it is repeated after binding
    bind_cols(
      summarise_at(., unlist(list.avg), list(avg = mean)),
      summarise_at(., unlist(list.min), list(min = min)),
      summarise_at(., unlist(list.max), list(max = max))
    )
  }
)
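The _avg, _min and _max suffixes come from the names in the function list passed to summarise_at(). A minimal sketch of that naming behaviour on a single grouping column (by_cyl is an illustrative name, and the behaviour is assumed to match dplyr 0.8 or later):

```r
library(dplyr)

# A named function list makes summarise_at() append the name as a suffix
by_cyl <- mtcars %>%
  group_by_at("cyl") %>%
  summarise_at(c("mpg", "wt"), list(avg = mean))

names(by_cyl)
# [1] "cyl"     "mpg_avg" "wt_avg"
```

With an unnamed function, the summarised columns would keep their original names instead.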