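I need to perform operations on a few dozen variables by group, running different instructions depending on the variable (usually based on its name), with some ad hoc changes and renaming here and there.
A reprex illustrating this with a modified diamonds dataset is below: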
library(tidyverse)
diamond_renamed <- diamonds %>%
  rename(size_x = x, size_y = y, size_z = z) %>%
  rename(val_1 = depth, val_2 = table)
diamond_summary <- bind_cols(
  diamond_renamed %>%
    group_by(cut, color, clarity) %>%
    summarise(
      cost = sum(price)
    ),
  diamond_renamed %>%
    group_by(cut, color, clarity) %>%
    summarise_at(
      vars(contains("size")),
      funs(median(.))
    ),
  diamond_renamed %>%
    group_by(cut, color, clarity) %>%
    summarise_at(
      vars(contains("val")),
      funs(mean(.))
    )
)
diamond_summary
#> # A tibble: 276 x 15
#> # Groups: cut, color [?]
#> cut color clarity cost cut1 color1 clarity1 size_x size_y size_z
#> <ord> <ord> <ord> <int> <ord> <ord> <ord> <dbl> <dbl> <dbl>
#> 1 Fair D I1 29532 Fair D I1 7.32 7.20 4.70
#> 2 Fair D SI2 243888 Fair D SI2 6.13 6.06 3.99
#> 3 Fair D SI1 247854 Fair D SI1 6.08 6.04 3.93
#> 4 Fair D VS2 112822 Fair D VS2 6.04 6 3.65
#> 5 Fair D VS1 14606 Fair D VS1 5.56 5.58 3.66
#> 6 Fair D VVS2 32463 Fair D VVS2 4.95 4.84 3.31
#> 7 Fair D VVS1 13419 Fair D VVS1 4.92 5.03 3.28
#> 8 Fair D IF 4859 Fair D IF 4.68 4.73 2.88
#> 9 Fair E I1 18857 Fair E I1 6.18 6.14 4.03
#> 10 Fair E SI2 325446 Fair E SI2 6.28 6.20 3.95
#> # ... with 266 more rows, and 5 more variables: cut2 <ord>, color2 <ord>,
#> # clarity2 <ord>, val_1 <dbl>, val_2 <dbl>
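This produces the desired result, a dataset with grouped summaries, but it also repeats the grouping variables (cut1, color1, clarity1, cut2, ...). Having to repeat the group_by code each time is not great either, but I'm not sure what else to do. It is probably also not the most efficient use of summarise. How can we avoid the repetition and make this code better?
Thanks!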
Answer 0 (score: 2)
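One option is to use mutate instead of summarise in the initial steps, and then add the resulting columns to the group_by. Because each mutated column is constant within a group, adding it as a grouping variable does not split the groups any further; it just carries the column through to the final summarise: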
diamond_renamed %>%
  group_by(cut, color, clarity) %>%
  group_by(cost = sum(price), add = TRUE) %>%
  mutate_at(vars(contains("size")), median) %>%
  group_by_at(vars(contains("size")), .add = TRUE) %>%
  summarise_at(vars(contains("val")), mean)
# A tibble: 276 x 9
# Groups: cut, color, clarity, cost, size_x, size_y [?]
# cut color clarity cost size_x size_y size_z val_1 val_2
# <ord> <ord> <ord> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Fair D I1 29532 7.32 7.20 4.70 65.6 56.8
# 2 Fair D SI2 243888 6.13 6.06 3.99 64.7 58.6
# 3 Fair D SI1 247854 6.08 6.04 3.93 64.6 58.8
# 4 Fair D VS2 112822 6.04 6 3.65 62.7 60.3
# 5 Fair D VS1 14606 5.56 5.58 3.66 63.2 57.8
# 6 Fair D VVS2 32463 4.95 4.84 3.31 61.7 58.8
# 7 Fair D VVS1 13419 4.92 5.03 3.28 61.7 64.3
# 8 Fair D IF 4859 4.68 4.73 2.88 60.8 58
# 9 Fair E I1 18857 6.18 6.14 4.03 65.6 58.1
#10 Fair E SI2 325446 6.28 6.20 3.95 63.4 59.5
# ... with 266 more rows
Note: the grouping columns "cut", "color", "clarity" are not repeated here as they are in the OP's output, so the result has only 9 columns instead of 15.
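For readers on a newer dplyr, a single summarise() call with across() may be another way to avoid both the repeated group_by and the duplicated grouping columns. This is only a minimal sketch, and it assumes dplyr 1.0.0 or later (where across() and the .groups argument are available) plus ggplot2 for the diamonds data; it is not what the answer above was written against.

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds dataset

# Same renaming as in the question, done in one rename() call
diamond_renamed <- diamonds %>%
  rename(size_x = x, size_y = y, size_z = z,
         val_1 = depth, val_2 = table)

# Assumption: dplyr >= 1.0.0, so across() and .groups are available
diamond_summary <- diamond_renamed %>%
  group_by(cut, color, clarity) %>%
  summarise(
    cost = sum(price),                 # one named summary
    across(contains("size"), median),  # median of every size_* column
    across(contains("val"), mean),     # mean of every val_* column
    .groups = "drop"                   # return an ungrouped tibble
  )

diamond_summary
```

Each across() call selects columns by name and applies one function to them, so every family of variables gets its own line inside the same summarise(). With .groups = "drop" the result is ungrouped, whereas the mutate/group_by version above stays grouped by the extra columns.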