在group_by()%>%mutate()函数调用中使用带引号的变量

时间:2019-03-29 21:59:25

标签: r dplyr tidyeval

可复制的示例

cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )

get_means <- function(df, metric, group) {
  df %>%
    group_by(.[[group]]) %>%
    mutate(mean_stat = mean(.[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = "weight", group = "type")

我尝试过的事情

我希望返回两个值,而不是一个值。看来groupby失败了。

我尝试了所有方法,包括使用quo(),eval()和replace(),UQ(),!!,以及许多其他方法来尝试使group_by()中的内容正常工作。

这似乎非常简单,但我无法弄清楚。

推理代码

将变量括在引号中的决定是因为我在ggplot aes_string()调用中使用了它们。我在函数中排除了ggplot代码以简化代码,否则这很容易,因为我们可以使用标准评估。

5 个答案:

答案 0 :(得分:4)

我认为在tidyeval框架中执行此操作的“预期”方法是将参数作为名称(而不是字符串)输入,然后使用enquo()引用参数。 ggplot2了解整洁的评估运算符,因此它也适用于ggplot2

首先,让我们修改示例中的dplyr摘要函数:

library(tidyverse)
library(rlang)

get_means <- function(df, metric, group) {

  metric = enquo(metric)
  group = enquo(group)

  df %>%
    group_by(!!group) %>%
    summarise(!!paste0("mean_", as_label(metric)) := mean(!!metric))
}

get_means(cats, weight, type)
  type    mean_weight
1 fat            20.0
2 not_fat        10.2
get_means(iris, Petal.Width, Species)
  Species    mean_Petal.Width
1 setosa                0.246
2 versicolor            1.33 
3 virginica             2.03

现在添加ggplot:

get_means <- function(df, metric, group) {

  metric = enquo(metric)
  group = enquo(group)

  df %>%
    group_by(!!group) %>%
    summarise(mean_stat = mean(!!metric)) %>% 
    ggplot(aes(!!group, mean_stat)) + 
      geom_point()
}

get_means(cats, weight, type)

enter image description here

我不确定您打算使用哪种绘图,但是可以使用整洁的评估来绘制数据和汇总值。例如:

plot_func = function(data, metric, group) {

  metric = enquo(metric)
  group = enquo(group)

  data %>% 
    ggplot(aes(!!group, !!metric)) + 
      geom_point() +
      geom_point(data=. %>% 
                   group_by(!!group) %>%
                   summarise(!!metric := mean(!!metric)),
                 shape="_", colour="red", size=8) + 
      expand_limits(y=0) +
      scale_y_continuous(expand=expand_scale(mult=c(0,0.02)))
}

plot_func(cats, weight, type)

enter image description here

仅供参考,您可以允许函数使用...参数和enquos代替enquo来使用任意数量的分组变量(包括无分组变量)(这也需要使用{ {1}}(取消引号),而不是!!!(取消引号)。

!!
get_means <- function(df, metric, ...) {

  metric = enquo(metric)
  groups = enquos(...)

  df %>%
    group_by(!!!groups) %>%
    summarise(!!paste0("mean_", quo_text(metric)) := mean(!!metric))
}
get_means(mtcars, mpg, cyl, vs)
    cyl    vs mean_mpg
1     4     0     26  
2     4     1     26.7
3     6     0     20.6
4     6     1     19.1
5     8     0     15.1
get_means(mtcars, mpg)

答案 1 :(得分:3)

magrittr代词.代表整个数据,因此您已取所有观察值的均值。而是使用整洁的eval代词.data来代表当前组的数据帧片:

get_means <- function(df, metric, group) {
  df %>%
    group_by(.data[[group]]) %>%
    mutate(mean_stat = mean(.data[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}

答案 2 :(得分:2)

如果要使用字符串作为名称,例如在您的示例中,正确的方法是使用sym将字符串转换为符号,并使用!!取消引用:

get_means <- function(df, metric, group) {
    df %>%
      group_by(!!sym(group)) %>%
      mutate(mean_stat = mean(!!sym(metric))) %>%
      pull(mean_stat) %>%
      unique()
}

get_means(cats, metric = "weight", group = "type")
[1] 10.06063 17.45906

如果要在函数中使用裸名,则将enquo!!一起使用:

get_means <- function(df, metric, group) {
    group <- enquo(group)
    metric <- enquo(metric)
    df %>%
      group_by(!!group) %>%
      mutate(mean_stat = mean(!!metric)) %>%
      pull(mean_stat) %>%
      unique()
}

get_means(cats, metric = weight, group = type)
[1] 10.06063 17.45906

您的示例中发生了什么?

有趣的是,.[[group]]适用于分组,但不适用于您的思维方式。这会将数据框的所述列作为向量的子集,然后使其成为一个新变量,并根据其分组:

cats %>%
    group_by(.[['type']])

# A tibble: 10 x 4
# Groups:   .[["type"]] [2]
   name  weight type    `.[["type"]]`
   <fct>  <dbl> <fct>   <fct>        
 1 a       9.60 not_fat not_fat      
 2 b       8.71 not_fat not_fat      
 3 c      12.0  not_fat not_fat      
 4 d       8.48 not_fat not_fat      
 5 e      11.5  not_fat not_fat      
 6 f      17.0  fat     fat          
 7 g      20.3  fat     fat          
 8 h      17.3  fat     fat          
 9 i      15.3  fat     fat          
10 j      17.4  fat     fat  

您的问题来自mutate语句。 mutate(mean_stat = mean(.[['weight']]))无需选择weight,只需提取cats %>% group_by(.[['type']]) %>% mutate(mean_stat = mean(.[['weight']])) # A tibble: 10 x 5 # Groups: .[["type"]] [2] name weight type `.[["type"]]` mean_stat <fct> <dbl> <fct> <fct> <dbl> 1 a 9.60 not_fat not_fat 13.8 2 b 8.71 not_fat not_fat 13.8 3 c 12.0 not_fat not_fat 13.8 4 d 8.48 not_fat not_fat 13.8 5 e 11.5 not_fat not_fat 13.8 6 f 17.0 fat fat 13.8 7 g 20.3 fat fat 13.8 8 h 17.3 fat fat 13.8 9 i 15.3 fat fat 13.8 10 j 17.4 fat fat 13.8 列作为向量,计算平均值,然后将该单个值分配给新列

const

答案 3 :(得分:1)

我会稍作修改(如果我正确理解了您想达到的目标):

 get_means <- function(df, metric, group) {
      df %>%
        group_by(!!sym(group)) %>%
        summarise(mean_stat = mean(!!sym(metric)))%>% pull(mean_stat)
    }
    get_means(cats, "weight", "type")

[1] 20.671772  9.305811

给出与:完全相同的输出:

cats %>% group_by(type) %>% summarise(mean_stat=mean(weight)) %>%
  pull(mean_stat)

[1] 20.671772  9.305811

答案 4 :(得分:0)

使用*_at函数:

library(dplyr)
get_means <- function(df, metric, group) {
  df %>%
    group_by_at(group) %>%
    mutate_at(metric,list(mean_stat = mean)) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = "weight", group = "type")
# [1] 10.12927 20.40541

数据

set.seed(1)
cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )