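I am trying to write a function that lets me generate descriptive statistics by grouping on several factors in a data frame. I have spent far too much time trying to get my function to recognize the variable I pass in.
Here is some fake data: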
grouping1 <- c("red", "blue", "blue", "green", "red", "blue", "red", "green")
grouping2 <- c("high", "high", "low", "medium", "low", "high", "medium", "high")
value <- c(22,40,72,41,36,16,88,99)
fake_df <- data.frame(grouping1, grouping2, value)
Example code:
library(dplyr)
by_group_fun <- function(fun.data.in, fun.grouping.factor){
  fake_df2 <- fun.data.in %>%
    group_by(fun.grouping.factor) %>%
    summarize(mean = mean(value), median = median(value))
  fake_df2
}
by_group_fun(fake_df, grouping1)
by_group_fun(fake_df, grouping2)
This gives me:
Error in grouped_df_impl(data, unname(vars), drop) : Column `fun.grouping.factor` is unknown
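I then tried assigning the grouping variable chosen in the function call to a new column and passing that along instead.
Example code (second attempt):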
by_group_fun2 <- function(fun.data.in, fun.grouping.factor){
  fun.data.in$by_var <- fun.data.in$fun.grouping.factor
  fake_df2 <- fun.data.in %>%
    group_by(by_var) %>%
    summarize(mean = mean(value), median = median(value))
  fake_df2
}
by_group_fun2(fake_df, grouping1)
by_group_fun2(fake_df, grouping2)
The second attempt gave me:
Error in grouped_df_impl(data, unname(vars), drop) : Column `by_var` is unknown
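Answer 0 (score: 2)
A very simple way to get the same output without programming with dplyr is to gather the grouping columns into long format. Grouping by the resulting key and value columns gives you every combination you asked for without leaving a single data.frame: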
library(tidyverse)
# tibble() replaces data_frame(), which is deprecated in current tibble releases
fake_df <- tibble(grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
                  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
                  value = c(22,40,72,41,36,16,88,99))
fake_df %>%
  gather(group_var, group_val, -value) %>%   # stack both grouping columns into key/value pairs
  group_by(group_var, group_val) %>%
  summarise(mean = mean(value),
            median = median(value))
#> # A tibble: 6 x 4
#> # Groups: group_var [?]
#> group_var group_val mean median
#> <chr> <chr> <dbl> <dbl>
#> 1 grouping1 blue 42.66667 40.0
#> 2 grouping1 green 70.00000 70.0
#> 3 grouping1 red 48.66667 36.0
#> 4 grouping2 high 44.25000 31.0
#> 5 grouping2 low 54.00000 54.0
#> 6 grouping2 medium 64.50000 64.5
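On newer versions of tidyr (1.0.0 and later), `gather()` is superseded by `pivot_longer()`. The following is a minimal sketch of the same long-format approach with the newer function, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (needed for the `.groups` argument). The name `long_summary` is only illustrative and does not appear in the original answer.

```r
library(dplyr)
library(tidyr)

fake_df <- tibble(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value     = c(22, 40, 72, 41, 36, 16, 88, 99)
)

long_summary <- fake_df %>%
  # Stack every column except `value` into name/value pairs
  pivot_longer(cols = -value, names_to = "group_var", values_to = "group_val") %>%
  group_by(group_var, group_val) %>%
  # .groups = "drop" returns an ungrouped tibble (assumes dplyr >= 1.0.0)
  summarise(mean = mean(value), median = median(value), .groups = "drop")

long_summary
```

This should produce the same six rows as the `gather()` version above, one per level of each grouping column.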
Answer 1 (score: 1)
Use this example as a guide:
myfun <- function(df, thesecols) {
  require(dplyr)
  thesecols <- enquo(thesecols) # need to quote
  df %>%
    group_by_at(vars(!!thesecols)) # !! unquotes
}
myfun(fake_df, grouping1)
Output:
# A tibble: 8 x 3
# Groups: grouping1 [3]
grouping1 grouping2 value
<fctr> <fctr> <dbl>
1 red high 22
2 blue high 40
3 blue low 72
4 green medium 41
5 red low 36
6 blue high 16
7 red medium 88
8 green high 99
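Both attempts in the question fail for the same reason: `group_by(fun.grouping.factor)` looks for a column literally named `fun.grouping.factor`, and `fun.data.in$fun.grouping.factor` also looks up that literal name, returns `NULL`, and so never creates `by_var`. Building on `myfun` above, here is a sketch of how the question's `by_group_fun` might be written so that it also computes the summaries. The first version uses `enquo()` and `!!` as in `myfun`; the second uses the `{{ }}` shorthand, which assumes rlang 0.4.0 or later. The name `by_group_fun_embrace` is made up for illustration.

```r
library(dplyr)

fake_df <- data.frame(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value     = c(22, 40, 72, 41, 36, 16, 88, 99)
)

# Version 1: capture the bare column name, then unquote it inside group_by()
by_group_fun <- function(fun.data.in, fun.grouping.factor) {
  fun.grouping.factor <- enquo(fun.grouping.factor)  # quote the argument
  fun.data.in %>%
    group_by(!!fun.grouping.factor) %>%              # !! unquotes it
    summarise(mean = mean(value), median = median(value))
}

# Version 2 (hypothetical name): the {{ }} shorthand does both steps at once
# (assumes rlang >= 0.4.0)
by_group_fun_embrace <- function(fun.data.in, fun.grouping.factor) {
  fun.data.in %>%
    group_by({{ fun.grouping.factor }}) %>%
    summarise(mean = mean(value), median = median(value))
}

res1 <- by_group_fun(fake_df, grouping1)
res2 <- by_group_fun_embrace(fake_df, grouping2)
res1
res2
```

Called with `grouping1`, either version should return three rows (blue, green, red) whose means and medians match the `grouping1` rows in the first answer's table.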