Using `rlang` quasiquotation in `dplyr::*_join` functions

Time: 2019-11-13 21:47:21

Tags: r dplyr rlang quasiquotes

I am trying to write a custom function that uses rlang's quasiquotation. The function also uses dplyr's `*_join` functions internally. Below is a minimal working example that illustrates my problem.

# needed libraries
library(tidyverse)

# function definition
df_combiner <- function(data, x, group.by) {
  # check how many variables were entered for this grouping variable
  group.by <- as.list(rlang::quo_squash(rlang::enquo(group.by)))

  # based on number of arguments, select `group.by` in cases like `c(cyl)`,
  # the first list element after `quo_squash` will be `c` which we don't need,
  # but if we pass just `cyl`, there is no `c`, this will take care of that
  # issue
  group.by <-
    if (length(group.by) == 1) {
      group.by
    } else {
      group.by[-1]
    }

  # creating internal dataframe
  df <- dplyr::group_by(.data = data, !!!group.by, .drop = TRUE)

  # creating dataframes to be joined: one with tally, one with summary
  df_tally <- dplyr::tally(df)
  df_mean <- dplyr::summarise(df, mean = mean({{ x }}, na.rm = TRUE))

  # without specifying `by` argument, this works but prints a message I want to avoid
  print(dplyr::left_join(x = df_tally, y = df_mean))

  # joining by specifying `by` argument (my failed attempt)
  dplyr::left_join(x = df_tally, y = df_mean, by = !!!group.by)
}

# using the function
df_combiner(diamonds, carat, c(cut, clarity))
#> Joining, by = c("cut", "clarity")
#> # A tibble: 40 x 4
#> # Groups:   cut [5]
#>    cut   clarity     n  mean
#>    <ord> <ord>   <int> <dbl>
#>  1 Fair  I1        210 1.36
#>  2 Fair  SI2       466 1.20
#>  3 Fair  SI1       408 0.965
#>  4 Fair  VS2       261 0.885
#>  5 Fair  VS1       170 0.880
#>  6 Fair  VVS2       69 0.692
#>  7 Fair  VVS1       17 0.665
#>  8 Fair  IF          9 0.474
#>  9 Good  I1         96 1.20
#> 10 Good  SI2      1081 1.04
#> # ... with 30 more rows
#> Error in !group.by: invalid argument type

As the output shows, I want to avoid the message `#> Joining, by = c("cut", "clarity")`, so I want to pass the `by` argument to the `_join` function explicitly, but I am not sure how to do this. (I have tried `rlang::as_string`, among other things.)

3 answers:

Answer 0: (score: 3)

The `by` argument of the join functions takes a character vector. Use `deparse` to go from expressions to strings:

dplyr::left_join(x = df_tally, y = df_mean, by = map_chr(group.by, deparse))
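To see what this conversion does in isolation, here is a minimal base-R sketch. The names `captured`, `group_by_syms`, and `by_cols` are made up for illustration, and `quote(c(cut, clarity))` merely stands in for what `quo_squash(enquo(group.by))` would hold inside the function.

```r
# Stand-in for the captured `group.by` argument (an unevaluated call).
captured <- quote(c(cut, clarity))

# as.list() on a call yields the function name (`c`) followed by its
# arguments, so dropping the first element leaves the column symbols.
group_by_syms <- as.list(captured)[-1]

# deparse() turns each symbol into a one-element character vector;
# vapply() collects them and checks that each result has length one.
by_cols <- vapply(group_by_syms, deparse, character(1))

print(by_cols)
```

The resulting character vector has the shape that `by` expects. Note that `deparse` can return several strings for a long expression, so this sketch is only meant for short inputs such as bare column names.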

Answer 1: (score: 3)

We can use `as_string` to convert the symbols to strings:

dplyr::left_join(x = df_tally, y = df_mean,
            by = map_chr(group.by, rlang::as_string))

df_combiner <- function(data, x, group.by) {
  # check how many variables were entered for this grouping variable
  group.by <- as.list(rlang::quo_squash(rlang::enquo(group.by)))

  # based on number of arguments, select `group.by` in cases like `c(cyl)`,
  # the first list element after `quo_squash` will be `c` which we don't need,
  # but if we pass just `cyl`, there is no `c`, this will take care of that
  # issue
  group.by <-
    if (length(group.by) == 1) {
      group.by
    } else {
      group.by[-1]
    }

  # creating internal dataframe
  df <- dplyr::group_by(.data = data, !!!group.by, .drop = TRUE)

  # creating dataframes to be joined: one with tally, one with summary
  df_tally <- dplyr::tally(df)
  df_mean <- dplyr::summarise(df, mean = mean({{ x }}, na.rm = TRUE))

  # without specifying `by` argument, this works but prints a message I want to avoid
  #print(dplyr::left_join(x = df_tally, y = df_mean))

  # joining by specifying the `by` argument as a character vector of column names
  dplyr::left_join(x = df_tally, y = df_mean, by = purrr::map_chr(group.by, rlang::as_string))

}

-checking

df_combiner(diamonds, carat, c(cut, clarity))
# A tibble: 40 x 4
# Groups:   cut [5]
#   cut   clarity     n  mean
#   <ord> <ord>   <int> <dbl>
# 1 Fair  I1        210 1.36 
# 2 Fair  SI2       466 1.20 
# 3 Fair  SI1       408 0.965
# 4 Fair  VS2       261 0.885
# 5 Fair  VS1       170 0.880
# 6 Fair  VVS2       69 0.692
# 7 Fair  VVS1       17 0.665
# 8 Fair  IF          9 0.474
# 9 Good  I1         96 1.20 
#10 Good  SI2      1081 1.04 
# … with 30 more rows
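One difference from the `deparse` approach may be worth keeping in mind: `rlang::as_string` accepts symbols (and strings) but not calls. The following is a small sketch of that behaviour, assuming the rlang package is installed; `syms`, `by_cols`, and `call_result` are illustrative names only.

```r
# Two bare column symbols, as they would appear in `group.by`
# after the leading `c` has been dropped.
syms <- list(quote(cut), quote(clarity))

# as_string() converts each symbol to a single string.
by_cols <- vapply(syms, rlang::as_string, character(1))
print(by_cols)

# A call such as log(carat) is not a symbol, so as_string() signals
# an error; tryCatch() captures it here instead of stopping the script.
call_result <- tryCatch(
  rlang::as_string(quote(log(carat))),
  error = function(e) "error"
)
print(call_result)
```

For grouping by plain column names this restriction does not matter, since every element is a symbol.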

Answer 2: (score: 0)

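As previous authors have mentioned, `by` expects a character vector. In the RStudio Community thread "Should tidyeval be abandoned?", Stanwood shows a simple way to go from quosures to strings:

"... tidyr::left_join still needs a list of strings: by = c("Species", "Sepal.Length"). If I want to supply these programmatically, the best solution I've found is by = sapply(sepaldims, quo_text). Consider this a plug for abstracting quo_text to lists of quosures."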
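The quoted approach can be sketched as follows, assuming the rlang package is installed. The quote does not show how `sepaldims` was built, so creating it with `rlang::quos()` is an assumption made for this example, and `by_cols` is an illustrative name.

```r
# Assumed construction of `sepaldims`: a list of quosures that
# capture two column names without evaluating them.
sepaldims <- rlang::quos(Species, Sepal.Length)

# quo_text() deparses the expression inside each quosure; unname()
# drops the empty names that sapply() carries over from the list.
by_cols <- unname(sapply(sepaldims, rlang::quo_text))

print(by_cols)
```

The result is a plain character vector that could then be passed as `by` to a join.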
