Question

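I am trying to get comfortable with using NSE in my code where it is needed. Suppose I have pairs of columns and want to generate, for each pair, a new string variable indicating whether the values in that pair are the same.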

library(tidyverse)
library(magrittr)

df <- tibble(one.x = c(1,2,3,4),
             one.y = c(2,2,4,3),
             two.x = c(5,6,7,8),
             two.y = c(6,7,7,9),
             # not used but also in df
             extra = c(5,5,5,5))

I am trying to write code that does the same thing as the following:

df.mod <- df %>%
  # is one.x the same as one.y?
  mutate(one.x_suffix = case_when( 
    one.x == one.y ~ "same",
    TRUE ~ "different")) %>%
  # is two.x the same as two.y?
  mutate(two.x_suffix = case_when(
    two.x == two.y ~ "same",
    TRUE ~ "different"))

df.mod
#> # A tibble: 4 x 6
#>   one.x one.y two.x two.y one.x_suffix two.x_suffix
#>   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
#> 1    1.    2.    5.    6. different    different   
#> 2    2.    2.    6.    7. same         different   
#> 3    3.    4.    7.    7. different    same        
#> 4    4.    3.    8.    9. different    different

In my real data I have an arbitrary number of such pairs (e.g. three.x and three.y, ...), so I would like to write a more general procedure using mutate_at.

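My strategy was to pass the ".x" variables as the .vars and then, on one side of the equality test inside case_when, use gsub to swap "x" for "y". This is where I hit an exception (the gsub part on its own seems to work fine):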

df.mod <- df %>%
  mutate_at(vars(one.x, two.x),
            funs(suffix = case_when(
              . == !!sym(gsub("x", "y", deparse(substitute(.)))) ~ "same",
              TRUE ~ "different")))
#> Error in mutate_impl(.data, dots): Evaluation error: object 'value' not found.

It appears to be the !!sym() operation that causes the exception here. What have I done wrong?

Created on 2018-11-07 by the reprex package (v0.2.1)

Answer 1

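Here is one option with map. We split the dataset into pairs of "x", "y" columns using the substring of the column names, then loop through the list of datasets with map, use transmute to create the new "suffix" column by comparing the rows of each dataset, bind the list of datasets into a single dataset, and finally bind that to the original dataset (bind_cols).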

library(tidyverse)
df %>% 
    select(matches("\\.x|\\.y")) %>%
    split.default(str_remove(names(.), "\\..*")) %>%
    map( ~ .x %>%
                 transmute(!! paste0(names(.)[1], "_suffix") := 
                      reduce(., ~ c("different", "same")[(.x == .y) + 1]))) %>%
    bind_cols %>%
    bind_cols(df, .)
# A tibble: 4 x 7
#  one.x one.y two.x two.y extra one.x_suffix two.x_suffix
#   <dbl> <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
#1     1     2     5     6     5 different    different   
#2     2     2     6     7     5 same         different   
#3     3     4     7     7     5 different    same        
#4     4     3     8     9     5 different    different
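The splitting step may be easier to follow in isolation. Below is a minimal base R sketch, assuming the same naming scheme as in the question (a prefix, a dot, then x or y). The names `prefix`, `pairs`, and `labels` are introduced here for illustration only and are not part of the answer above.

```r
# Small data frame with the same paired columns as in the question
df <- data.frame(one.x = c(1, 2, 3, 4), one.y = c(2, 2, 4, 3),
                 two.x = c(5, 6, 7, 8), two.y = c(6, 7, 7, 9))

# Strip everything from the first dot onward to get the pair prefix
prefix <- sub("\\..*", "", names(df))

# split.default() treats the data frame as a list of columns,
# so this yields one small data frame per prefix
pairs <- split.default(df, prefix)

# Compare the two columns of each pair and label the result
labels <- lapply(pairs, function(p) ifelse(p[[1]] == p[[2]], "same", "different"))
```

Each element of `pairs` holds the x and y column of one pair in their original order, which is why comparing the first and second column is enough in this sketch.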

Another option is to build an expression as a string and then parse it:

library(rlang)
expr1 <- paste(grep("\\.x", names(df), value = TRUE), 
      grep("\\.y", names(df), value = TRUE), sep="==", collapse=";")
df %>% 
    mutate(!!!rlang::parse_exprs(expr1)) %>%
    rename_at(vars(matches("==")), ~ paste0(str_remove(.x, "\\s.*"), "_suffix"))
# A tibble: 4 x 7
#  one.x one.y two.x two.y extra one.x_suffix two.x_suffix
#  <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>        <lgl>       
#1     1     2     5     6     5 FALSE        FALSE       
#2     2     2     6     7     5 TRUE         FALSE       
#3     3     4     7     7     5 FALSE        TRUE        
#4     4     3     8     9     5 FALSE        FALSE

Note: this can be converted to "same"/"different" as in the first solution. However, it is better to keep it as a logical column.
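If the labels are needed after all, the conversion mentioned in the note can be sketched with the same indexing trick the first solution uses. This is only an illustration on a bare vector; `is_same` and `labels` are hypothetical names.

```r
# Logical result of comparing one pair of columns
is_same <- c(1, 2, 3, 4) == c(2, 2, 4, 3)

# FALSE + 1 is 1 and TRUE + 1 is 2, so this indexes into the label vector
labels <- c("different", "same")[is_same + 1]
```

A missing value in either column gives NA in `is_same` and therefore NA in `labels`, which is one reason keeping the logical column can be the simpler choice.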

Answer 2

The problem is not in !!sym, as the following example shows:

df %>% mutate_at( vars(one.x, two.x),
                  funs(suffix = case_when(
                    . == !!sym("one.y") ~ "same",
                    TRUE ~ "different")))
# # A tibble: 4 x 6
#   one.x one.y two.x two.y one.x_suffix two.x_suffix
#   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
# 1     1     2     5     6 different    different   
# 2     2     2     6     7 same         different   
# 3     3     4     7     7 different    different   
# 4     4     3     8     9 different    different
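The same unquoting pattern can be tried outside of mutate_at. The following is a small sketch, assuming a hypothetical variable `partner` that holds the partner column's name as a string; it is not part of the original answer.

```r
library(dplyr)
library(rlang)

df <- tibble(one.x = c(1, 2, 3, 4), one.y = c(2, 2, 4, 3))

# Hypothetical variable holding the partner column's name as a string
partner <- "one.y"

# sym() turns the string into a symbol; !! splices that symbol into the
# expression before mutate() evaluates it, giving one.x == one.y
res <- df %>% mutate(is_same = one.x == !!sym(partner))
```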

The problem is in trying to unquote substitute(.) inside case_when:

df %>% mutate_at( vars(one.x, two.x),
                  funs(suffix = case_when(
                    . == !!substitute(.) ~ "same",
                    TRUE ~ "different")))
# Error in mutate_impl(.data, dots) : 
#   Evaluation error: object 'value' not found.

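The reason for this is operator precedence. From the help page for !!:

"The !! operator unquotes its argument. It gets evaluated immediately in the surrounding context."

In the example above, the context for !!substitute(.) is the formula, which is itself inside case_when. This causes the expression to be replaced immediately with value, which is defined inside case_when and has no meaning in your data frame.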
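The "evaluated immediately" part can be seen in a small experiment that is independent of dplyr. This is a sketch using rlang::expr(); the variable names are made up for the illustration.

```r
library(rlang)

x <- 1

# !!x is replaced by the current value of x while the expression is built;
# the bare x stays a symbol and is looked up only when the expression is evaluated
e <- expr(c(!!x, x))

x <- 100
result <- eval(e)
```

Here `result` contains 1 followed by 100: the unquoted part was fixed at the moment the expression was created, while the plain symbol picked up the later value.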

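You want to keep the expression next to its environment, which is what quosures are for. By replacing substitute with rlang::enquo, you capture the expression that produced . together with its defining environment (your data frame). To keep things tidy, let's move the gsub manipulation into a separate function: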

x2y <- function(.x)
{
  ## Capture the expression and its environment
  qq <- enquo(.x)

  ## Retrieve the expression and deparse it
  txt <- rlang::get_expr(qq) %>% rlang::expr_deparse()

  ## Replace x with y, as before
  txty <- gsub("x", "y", txt)

  ## Put the new expression back into the quosure
  rlang::set_expr( qq, sym(txty) )
}

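You can now use the new x2y function directly in your code. With quosures there is no need to unquote, because the expressions already carry their environments with them. You can simply evaluate them with rlang::eval_tidy: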

df %>% mutate_at(vars(one.x, two.x),
                 funs(suffix = case_when(
                   . == rlang::eval_tidy(x2y(.)) ~ "same",
                   TRUE ~ "different" )))
# # A tibble: 4 x 6
#   one.x one.y two.x two.y one.x_suffix two.x_suffix
#   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
# 1     1     2     5     6 different    different   
# 2     2     2     6     7 same         different   
# 3     3     4     7     7 different    same        
# 4     4     3     8     9 different    different
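The individual quosure steps inside x2y can also be walked through one at a time, outside of any function. This is a hedged sketch rather than the answer's code: it starts from quo() instead of enquo(), and it uses an anchored pattern as an assumption of the sketch.

```r
library(rlang)

df <- data.frame(one.x = c(1, 2, 3, 4), one.y = c(2, 2, 4, 3))

# A quosure bundles an expression with the environment it came from
q_x <- quo(one.x)

# Pull out the expression and turn the symbol into a string
name_x <- as_string(get_expr(q_x))

# Anchored pattern (an assumption of this sketch): only a trailing ".x" is replaced
name_y <- sub("\\.x$", ".y", name_x)

# Put the new symbol back into the quosure and evaluate it against the data
q_y <- set_expr(q_x, sym(name_y))
values_y <- eval_tidy(q_y, data = df)
```

The unanchored gsub("x", "y", ...) from the answer works for the column names in the question; with a name such as six.x it would also change the first x, which is why the sketch anchors the pattern.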

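EDIT to address the question in your comment: merging all of your code into a single line is almost always a Bad Idea™, and I strongly advise against it. However, since this question is about NSE, I think it is important to understand why simply taking the contents of x2y and pasting them inside case_when leads to problems.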

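enquo(), like substitute(), looks in the calling environment of the function and replaces the argument with the expression that was supplied to that function. substitute() moves up only one environment (finding value inside case_when when you unquote), while enquo() keeps moving up as long as the functions in the calling stack correctly handle {{3}}. (And most dplyr / tidyverse functions do.) So, when you call enquo(.x) inside x2y, it moves up through the expressions supplied to each function on the calling stack, eventually finding one.x.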
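The difference between the two can be sketched with a pair of two-level wrappers. The function names below are hypothetical, and the forwarding with !!enquo() is one way (assumed here) for the outer layer to pass its argument along.

```r
library(rlang)

# substitute() sees only the expression passed to its own function
inner_sub <- function(x) substitute(x)
outer_sub <- function(y) inner_sub(y)
sub_result <- outer_sub(one.x)              # the symbol `y`, not `one.x`

# enquo() can reach the original expression when each layer forwards
# its argument with unquoting (here via !!enquo())
inner_enq <- function(x) enquo(x)
outer_enq <- function(y) inner_enq(!!enquo(y))
enq_result <- get_expr(outer_enq(one.x))    # the symbol `one.x`
```

Neither call evaluates one.x, so the sketch runs without any data frame; it only shows which expression each capture mechanism ends up holding.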

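When you call enquo() inside mutate_at, it is now at the same level as one.x, so it too replaces the argument (one.x in this case) with the expression that defines it (the vector c(1,2,3,4) in this case). This is not what you want. Instead of moving up levels, you now want to stay at the same level as one.x. To do so, use rlang::quo() in place of rlang::enquo():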

library( rlang )   ## To maintain at least a little bit of sanity

df %>% 
 mutate_at(vars(one.x, two.x),
   funs(suffix = case_when(
    . == eval_tidy(set_expr(quo(.), 
                            sym(gsub("x","y", expr_deparse(get_expr(quo(.)))))
                       )
            ) ~ "same",
    TRUE ~ "different" )))
# Now works as expected

Error when using NSE (in dplyr): object 'value' not found