Reproducible example
library(dplyr)

cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )

get_means <- function(df, metric, group) {
  df %>%
    group_by(.[[group]]) %>%
    mutate(mean_stat = mean(.[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = "weight", group = "type")
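What I tried
I expected two values to be returned, not one. It looks like the group_by is failing.
I tried everything, including quo(), eval() and replace(), UQ(), !!, and many other approaches, to get the contents of group_by() to work.
This seems very simple, but I cannot figure it out.
Reasoning behind the code
I pass the variables as quoted strings because I use them in a ggplot aes_string() call. I left the ggplot code out of the function to keep the example short; otherwise this would be easy, since we could use standard evaluation.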
Answer 0 (score: 4)
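I think the "intended" way to do this in the tidyeval framework is to pass the arguments as names (rather than strings) and then quote them with enquo(). ggplot2 understands the tidy evaluation operators, so this also works with ggplot2.
First, let's adapt the dplyr summary function from your example: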
library(tidyverse)
library(rlang)

get_means <- function(df, metric, group) {
  metric = enquo(metric)
  group = enquo(group)
  df %>%
    group_by(!!group) %>%
    summarise(!!paste0("mean_", as_label(metric)) := mean(!!metric))
}

get_means(cats, weight, type)
  type    mean_weight
1 fat            20.0
2 not_fat        10.2
get_means(iris, Petal.Width, Species)
  Species    mean_Petal.Width
1 setosa                0.246
2 versicolor            1.33
3 virginica             2.03
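Since rlang 0.4.0, the enquo() plus !! pair can also be written with the embrace operator {{ }}. The following is a minimal sketch of that alternative and is not part of the original answer; the name get_means_embrace and the fixed output column mean_stat are made up for illustration, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical variant of get_means(): {{ x }} quotes and unquotes the
# argument in one step, so no explicit enquo() call is needed.
get_means_embrace <- function(df, metric, group) {
  df %>%
    group_by({{ group }}) %>%
    summarise(mean_stat = mean({{ metric }}), .groups = "drop")
}

# Bare column names are passed, exactly as with the enquo() version
res <- get_means_embrace(mtcars, mpg, cyl)
print(res)
```

The call site is unchanged: the columns are still passed as bare names.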
Now add ggplot:
get_means <- function(df, metric, group) {
  metric = enquo(metric)
  group = enquo(group)
  df %>%
    group_by(!!group) %>%
    summarise(mean_stat = mean(!!metric)) %>%
    ggplot(aes(!!group, mean_stat)) +
    geom_point()
}

get_means(cats, weight, type)
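I'm not sure what kind of plot you have in mind, but you can use tidy evaluation to plot both the data and the summary values. For example: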
plot_func = function(data, metric, group) {
  metric = enquo(metric)
  group = enquo(group)
  data %>%
    ggplot(aes(!!group, !!metric)) +
    geom_point() +
    # Overlay the per-group means as red dashes
    geom_point(data = . %>%
                 group_by(!!group) %>%
                 summarise(!!metric := mean(!!metric)),
               shape = "_", colour = "red", size = 8) +
    expand_limits(y = 0) +
    # expand_scale() is deprecated since ggplot2 3.3.0 in favour of expansion()
    scale_y_continuous(expand = expand_scale(mult = c(0, 0.02)))
}

plot_func(cats, weight, type)
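FYI, you can let the function take any number of grouping variables (including none) by using the ... argument and enquos instead of enquo. This also requires using !!! (unquote-splice) instead of !! (unquote).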
get_means <- function(df, metric, ...) {
  metric = enquo(metric)
  groups = enquos(...)
  df %>%
    group_by(!!!groups) %>%
    summarise(!!paste0("mean_", quo_text(metric)) := mean(!!metric))
}

get_means(mtcars, mpg, cyl, vs)
cyl vs mean_mpg
1 4 0 26
2 4 1 26.7
3 6 0 20.6
4 6 1 19.1
5 8 0 15.1
get_means(mtcars, mpg)
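To make the "no grouping variables" case concrete, here is a small sketch of the same enquos() and !!! pattern, checked with zero and with two grouping columns. The name get_means_dots is hypothetical, as_label() is used in place of quo_text(), and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(rlang)

# Hypothetical copy of the variadic function, for illustration only
get_means_dots <- function(df, metric, ...) {
  metric <- enquo(metric)
  groups <- enquos(...)   # a list of quosures, possibly empty
  df %>%
    group_by(!!!groups) %>%   # splices zero or more grouping columns
    summarise(!!paste0("mean_", as_label(metric)) := mean(!!metric),
              .groups = "drop")
}

# No grouping variables: a single row holding the overall mean
no_groups <- get_means_dots(mtcars, mpg)

# Two grouping variables: one row per observed cyl/vs combination
two_groups <- get_means_dots(mtcars, mpg, cyl, vs)

print(no_groups)
print(two_groups)
```

With an empty ..., group_by() receives nothing to splice, so summarise() operates on the whole data frame.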
Answer 1 (score: 3)
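The magrittr pronoun . represents the whole data, so you have taken the mean of all observations. Instead, use the tidy eval pronoun .data, which represents the slice of the data frame for the current group: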
get_means <- function(df, metric, group) {
  df %>%
    group_by(.data[[group]]) %>%
    mutate(mean_stat = mean(.data[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}
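As a quick sanity check of the .data version, the sketch below runs the function above on the seeded cats data from the Data section and compares the result with base R's tapply(). The name get_means_data is hypothetical and is used only to avoid clashing with the other definitions.

```r
library(dplyr)

set.seed(1)
cats <- data.frame(
  name = letters[1:10],
  weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
  type = c(rep("not_fat", 5), rep("fat", 5))
)

# Same body as the function above; column names arrive as strings
get_means_data <- function(df, metric, group) {
  df %>%
    group_by(.data[[group]]) %>%
    mutate(mean_stat = mean(.data[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}

means <- get_means_data(cats, metric = "weight", group = "type")

# Reference values computed without dplyr
expected <- as.numeric(tapply(cats$weight, cats$type, mean))

print(means)
print(expected)
```

The two vectors hold the same group means; only their order may differ, because unique() keeps row order while tapply() sorts by group label.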
Answer 2 (score: 2)
If you want to use strings as names, as in your example, the correct way is to convert the string to a symbol with sym and unquote it with !!:
get_means <- function(df, metric, group) {
  df %>%
    group_by(!!sym(group)) %>%
    mutate(mean_stat = mean(!!sym(metric))) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = "weight", group = "type")
[1] 10.06063 17.45906
If you want to use bare names in your function, use enquo together with !!:
get_means <- function(df, metric, group) {
  group <- enquo(group)
  metric <- enquo(metric)
  df %>%
    group_by(!!group) %>%
    mutate(mean_stat = mean(!!metric)) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = weight, group = type)
[1] 10.06063 17.45906
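What happened in your example?
Interestingly, .[[group]] does work for grouping, but not in the way you think. It extracts the named column of the data frame as a vector, turns it into a new variable, and groups by that:
cats %>%
  group_by(.[['type']])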
# A tibble: 10 x 4
# Groups: .[["type"]] [2]
name weight type `.[["type"]]`
<fct> <dbl> <fct> <fct>
1 a 9.60 not_fat not_fat
2 b 8.71 not_fat not_fat
3 c 12.0 not_fat not_fat
4 d 8.48 not_fat not_fat
5 e 11.5 not_fat not_fat
6 f 17.0 fat fat
7 g 20.3 fat fat
8 h 17.3 fat fat
9 i 15.3 fat fat
10 j 17.4 fat fat
Your problem comes from the mutate statement. Instead of selecting weight within each group, mutate(mean_stat = mean(.[['weight']])) simply extracts the whole weight column as a vector, computes its mean, and then assigns that single constant value to the new column:
cats %>%
  group_by(.[['type']]) %>%
  mutate(mean_stat = mean(.[['weight']]))
# A tibble: 10 x 5
# Groups: .[["type"]] [2]
name weight type `.[["type"]]` mean_stat
<fct> <dbl> <fct> <fct> <dbl>
1 a 9.60 not_fat not_fat 13.8
2 b 8.71 not_fat not_fat 13.8
3 c 12.0 not_fat not_fat 13.8
4 d 8.48 not_fat not_fat 13.8
5 e 11.5 not_fat not_fat 13.8
6 f 17.0 fat fat 13.8
7 g 20.3 fat fat 13.8
8 h 17.3 fat fat 13.8
9 i 15.3 fat fat 13.8
10 j 17.4 fat fat 13.8
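The difference between the two evaluations can be seen side by side in a small sketch. It uses the seeded cats data from the Data section; the column names dot_mean and group_mean are made up for this illustration.

```r
library(dplyr)

set.seed(1)
cats <- data.frame(
  name = letters[1:10],
  weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
  type = c(rep("not_fat", 5), rep("fat", 5))
)

res <- cats %>%
  group_by(type) %>%
  mutate(
    # `.` is the whole data frame piped into mutate(), so this is the
    # mean over all ten rows, repeated in every group
    dot_mean = mean(.[["weight"]]),
    # a bare column name is evaluated per group
    group_mean = mean(weight)
  ) %>%
  ungroup()

print(res)
```

dot_mean holds one value for every row, while group_mean holds one value per type.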
Answer 3 (score: 1)
I would modify it slightly (if I understand correctly what you want to achieve):
get_means <- function(df, metric, group) {
  df %>%
    group_by(!!sym(group)) %>%
    summarise(mean_stat = mean(!!sym(metric))) %>%
    pull(mean_stat)
}

get_means(cats, "weight", "type")
[1] 20.671772 9.305811
This gives exactly the same output as:
cats %>%
  group_by(type) %>%
  summarise(mean_stat = mean(weight)) %>%
  pull(mean_stat)
[1] 20.671772 9.305811
Answer 4 (score: 0)
Use the *_at functions:
library(dplyr)

get_means <- function(df, metric, group) {
  df %>%
    group_by_at(group) %>%
    mutate_at(metric, list(mean_stat = mean)) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = "weight", group = "type")
# [1] 10.12927 20.40541
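In dplyr 1.0.0 and later, the scoped *_at verbs are superseded by across(). A rough equivalent of the function above might look like the sketch below, assuming that dplyr version. The name get_means_across is hypothetical, and the sketch returns a tibble with a mean_<column> column rather than a bare vector.

```r
library(dplyr)

# Hypothetical across()-based counterpart of the *_at version;
# `metric` and `group` are character vectors of column names.
get_means_across <- function(df, metric, group) {
  df %>%
    group_by(across(all_of(group))) %>%
    summarise(across(all_of(metric), mean, .names = "mean_{.col}"),
              .groups = "drop")
}

res <- get_means_across(mtcars, metric = "mpg", group = "cyl")
print(res)
```

all_of() makes it explicit that the strings are column names to be looked up, and it raises an error if a named column does not exist.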
Data
set.seed(1)
cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )