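I am reading dplyr's programming vignette, trying to work out how to use dplyr inside functions. Partway through, it discusses using enquos() on the ... argument so that multiple arguments can be passed to group_by().
A short example of how it works: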
grp <- rlang::enquos(...)
df %>%
  group_by(!!!grp)
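For comparison, here is a self-contained sketch of that vignette pattern. It uses the built-in mtcars data instead of the data below, and the function name grouped_mean is made up for illustration; every extra argument is captured by enquos() and spliced into group_by() with !!!:

```r
library(dplyr)

# Hypothetical helper: each argument passed through `...` is captured as a
# quosure by enquos() and spliced into group_by() with !!!.
grouped_mean <- function(df, value, ...) {
  grp <- rlang::enquos(...)
  df %>%
    group_by(!!!grp) %>%
    summarise(mean = mean({{ value }}), n = n(), .groups = "drop")
}

# Group by two bare column names passed through the dots.
res <- grouped_mean(mtcars, mpg, cyl, gear)
res
```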
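I would like to know whether there is a way to pass multiple expressions through a single named argument, without tying up the ... argument and without some hacky coding.
Here is an example of what the call looks like: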
# reproducible data
df <- datasets::USJudgeRatings
df$name <- rownames(df)
df <- tidyr::gather(df, key = "key", value = "value", -name)
df$dummy <- c("1","2")
test_summarize <- function(df, sum.col, grp = NULL, filter = NULL) {
  filter <- rlang::enquo(filter)
  sum.col <- rlang::enquo(sum.col)
  if (!is.null(rlang::get_expr(filter))) {
    df <- dplyr::filter(df, !!filter)
  }
  # how grp is turned into a character vector to be passed to .dots in group_by
  grp <- substitute(grp)
  if (!is.null(grp)) {
    grp <- deparse(grp)
    grp <- strsplit(gsub(pattern = "list\\(|c\\(|\\)|", replacement = "", x = grp), split = ",")[[1]]
    grp <- gsub(pattern = "^ | $", replacement = "", x = grp)
    df %>%
      dplyr::group_by(.dots = grp) %>%
      dplyr::summarise(mean = mean(!!sum.col), sum = sum(!!sum.col), n = n())
  } else {
    df %>%
      dplyr::summarise(mean = mean(!!sum.col), sum = sum(!!sum.col), n = n())
  }
}
test_summarize(df, sum.col=value, grp = c(name, dummy))
# A tibble: 86 x 5
# Groups: name [?]
name dummy mean sum n
<chr> <fct> <dbl> <dbl> <int>
1 AARONSON,L.H. 1 7.17 43 6
2 AARONSON,L.H. 2 7.42 44.5 6
3 ALEXANDER,J.M. 1 8.35 50.1 6
4 ALEXANDER,J.M. 2 7.95 47.7 6
5 ARMENTANO,A.J. 1 7.53 45.2 6
6 ARMENTANO,A.J. 2 7.7 46.2 6
7 BERDON,R.I. 1 8.67 52 6
8 BERDON,R.I. 2 8.25 49.5 6
9 BRACKEN,J.J. 1 5.65 33.9 6
10 BRACKEN,J.J. 2 5.82 34.9 6
# ... with 76 more rows
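This works for what I am trying to do, but I would like to know whether there is a better way to accept these arguments and process them. My attempts to turn the raw grp call into something like enquos(...) failed, so I deparse it and convert it into a character vector instead. To be honest, maybe I should just have the user pass a character vector?
I chose not to make a character vector the expected input because the function's sum.col and filter arguments expect NSE expressions, and I was trying to stay consistent. Is there perhaps something in the rlang package that can turn each element of the raw expression into its own list element?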
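One possible answer to the rlang question, sketched under the assumption that grp is always either a single bare column or a c(...) / list(...) call: rlang::enexpr() captures the unevaluated grp argument, rlang::call_args() splits c(name, dummy) into a list of symbols, and !!! splices that list into group_by(). The function name test_summarize_nse is hypothetical, and the demo uses mtcars:

```r
library(dplyr)

# Hypothetical NSE-only version: grp accepts a bare column or c(col1, col2).
test_summarize_nse <- function(df, sum.col, grp = NULL) {
  grp_expr <- rlang::enexpr(grp)          # unevaluated grp (NULL if omitted)
  grp_list <- if (is.null(grp_expr)) {
    list()                                # no grouping requested
  } else if (rlang::is_call(grp_expr, c("c", "list"))) {
    rlang::call_args(grp_expr)            # c(a, b) -> list(a, b)
  } else {
    list(grp_expr)                        # a single bare column
  }
  df %>%
    group_by(!!!grp_list) %>%
    summarise(mean = mean({{ sum.col }}), sum = sum({{ sum.col }}), n = n())
}

test_summarize_nse(mtcars, mpg, grp = c(cyl, gear))
test_summarize_nse(mtcars, mpg)
```

This keeps grp consistent with the NSE-style sum.col argument, but it only understands the c() / list() shape; tidyselect helpers such as starts_with() would not work here.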
EDIT: fixed the reproducible example and added the expected output.
Answer:
If we use group_by_at, we may not need the if/else branch:
test_summarize <- function(df, sum.col, grp = NULL, filter = NULL) {
  df %>%
    group_by_at(grp) %>%
    summarise(mean = mean({{sum.col}}),
              sum = sum({{sum.col}}), n = n())
}
test_summarize(df, sum.col=value, grp = c("name", "dummy"))
# A tibble: 86 x 5
# Groups: name [43]
# name dummy mean sum n
# <chr> <chr> <dbl> <dbl> <int>
# 1 AARONSON,L.H. 1 7.17 43 6
# 2 AARONSON,L.H. 2 7.42 44.5 6
# 3 ALEXANDER,J.M. 1 8.35 50.1 6
# 4 ALEXANDER,J.M. 2 7.95 47.7 6
# 5 ARMENTANO,A.J. 1 7.53 45.2 6
# 6 ARMENTANO,A.J. 2 7.7 46.2 6
# 7 BERDON,R.I. 1 8.67 52 6
# 8 BERDON,R.I. 2 8.25 49.5 6
# 9 BRACKEN,J.J. 1 5.65 33.9 6
#10 BRACKEN,J.J. 2 5.82 34.9 6
# … with 76 more rows
test_summarize(df, sum.col=value)
# A tibble: 1 x 3
# mean sum n
# <dbl> <dbl> <int>
#1 7.57 3908. 516
which is the same as
df %>%
  summarise(mean = mean(value), sum = sum(value), n = n())
# mean sum n
#1 7.57345 3907.9 516
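group_by_at() is superseded as of dplyr 1.0.0; the documented replacement is across() inside group_by(). A sketch of the same function on that basis, assuming dplyr >= 1.0.0 and using mtcars; the name test_summarize_across and the character() default (chosen instead of NULL so that all_of() always receives a character vector) are my own assumptions:

```r
library(dplyr)

# Assumes dplyr >= 1.0.0: across(all_of(grp)) stands in for group_by_at(grp).
# grp defaults to character() (an assumption) so all_of() gets a character vector.
test_summarize_across <- function(df, sum.col, grp = character()) {
  df %>%
    group_by(across(all_of(grp))) %>%
    summarise(mean = mean({{ sum.col }}), sum = sum({{ sum.col }}), n = n())
}

test_summarize_across(mtcars, mpg, grp = c("cyl", "gear"))
```

With the empty default it is expected to return a single overall summary row, as the group_by_at() version does, but that case is worth checking on your dplyr version.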
If we also need filtering, one option is to use ... and pass as many filter conditions as needed:
test_summarize <- function(df, sum.col, grp = NULL, ...) {
  df %>%
    # enquos() (rather than enexprs()) keeps each condition's calling environment
    filter(!!!rlang::enquos(...)) %>%
    group_by_at(grp) %>%
    summarise(mean = mean({{sum.col}}), sum = sum({{sum.col}}), n = n())
}
test_summarize(df, sum.col=value, grp = c("name", "dummy"),
key %in% c("CONT", "INTG"), value > 6.5)
# A tibble: 77 x 5
# Groups: name [43]
# name dummy mean sum n
# <chr> <chr> <dbl> <dbl> <int>
# 1 AARONSON,L.H. 2 7.9 7.9 1
# 2 ALEXANDER,J.M. 1 8.9 8.9 1
# 3 ALEXANDER,J.M. 2 6.8 6.8 1
# 4 ARMENTANO,A.J. 1 7.2 7.2 1
# 5 ARMENTANO,A.J. 2 8.1 8.1 1
# 6 BERDON,R.I. 1 8.8 8.8 1
# 7 BERDON,R.I. 2 6.8 6.8 1
# 8 BRACKEN,J.J. 1 7.3 7.3 1
# 9 BURNS,E.B. 1 8.8 8.8 1
#10 CALLAHAN,R.J. 1 10.6 10.6 1
# … with 67 more rows
This also works without any filter arguments:
test_summarize(df, sum.col=value, grp = c("name", "dummy"))
# A tibble: 86 x 5
# Groups: name [43]
# name dummy mean sum n
# <chr> <chr> <dbl> <dbl> <int>
# 1 AARONSON,L.H. 1 7.17 43 6
# 2 AARONSON,L.H. 2 7.42 44.5 6
# 3 ALEXANDER,J.M. 1 8.35 50.1 6
# 4 ALEXANDER,J.M. 2 7.95 47.7 6
# 5 ARMENTANO,A.J. 1 7.53 45.2 6
# 6 ARMENTANO,A.J. 2 7.7 46.2 6
# 7 BERDON,R.I. 1 8.67 52 6
# 8 BERDON,R.I. 2 8.25 49.5 6
# 9 BRACKEN,J.J. 1 5.65 33.9 6
#10 BRACKEN,J.J. 2 5.82 34.9 6
# … with 76 more rows
which is the same as your first output.
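A simpler variant of the last function, sketched on mtcars with the hypothetical name test_summarize_dots: because filter() already accepts any number of conditions through its own ..., the dots can be forwarded directly, with no rlang capture or !!! splicing:

```r
library(dplyr)

# Hypothetical variant: filter conditions in `...` are passed straight
# through to filter(); with no extra arguments, filter() keeps every row.
test_summarize_dots <- function(df, sum.col, grp = NULL, ...) {
  df %>%
    filter(...) %>%
    group_by_at(grp) %>%
    summarise(mean = mean({{ sum.col }}), sum = sum({{ sum.col }}), n = n())
}

test_summarize_dots(mtcars, mpg, grp = "cyl", cyl == 4, mpg > 25)
test_summarize_dots(mtcars, mpg, grp = "cyl")
```

Forwarding the dots also keeps each condition's calling environment, so a condition can refer to a local variable in the caller.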