Question

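I am working on a function that should give me the top observations of a column. The example below is only part of my full function. My ultimate goal is to run the function in a loop. I noticed something strange: why does print(df_col_indicator) change the result when "df_col_indicator" is defined externally rather than within the function call? With print(df_col_indicator) included, my function actually does exactly what I want.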

library(dplyr)
library(tidyverse)

remove(list = ls())


dataframe_test <- data.frame(
  county_name = c("a", "b","c", "d","e", "f", "g", "h"),
  column_test1 = c(100,100,100,100,100,100,50,50),
  column_test2 = c(40,90,50,40,40,100,13,14),
  column_test3 = c(100,90,50,40,30,40,100,50),
  month = c("2020-09-01", "2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-08-01","2020-08-01"))


choose_top_5 <- function(df, df_col_indicator, df_col_month, char_month, numb_top, df_col_county) {
  
  ### this here changes output of my function
  #print(df_col_indicator) # changes output of my function depending on included or excluded
  
  ### enquo / ensym / deparse
  df_col_indicator_ensym <- ensym(df_col_indicator)
  
  df_col_month_ensym <- ensym(df_col_month)
  
  
  ### filter month and top 5 observations
  df_top <- df %>%
    filter(!!df_col_month_ensym == char_month) %>%
    slice_max(!!df_col_indicator_ensym, n = numb_top) %>%
    select(!!df_col_county, !!df_col_month_ensym, !!df_col_indicator_ensym)
  
  
  
  return(df_top)
  
  
}




### define "df_col_indicator" within the function
a = choose_top_5(df = dataframe_test, df_col_indicator = "column_test3",
                 df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
                 df_col_county = "county_name")

a


### define "df_col_indicator" externally
external = "column_test3"

b = choose_top_5(df = dataframe_test, df_col_indicator = external,
                 df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
                 df_col_county = "county_name")
b



### goal is to run function over loop
external <- c("column_test1","column_test2","column_test3")

my_list <- list()

for (i in external) {
  
  my_list[[i]] <- choose_top_5(df = dataframe_test, df_col_indicator = i,
                               df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
                               df_col_county = "county_name")
}

my_list

Answer 1

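Your example is quite long-winded. Let's boil it down to a minimal reproducible example with two very similar functions. Both take a single argument, print the passed variable to the console, and then return the result of calling ensym on that same variable.

The only difference between the two is the order in which print and ensym are called.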

library(rlang)

test_ensym1 <- function(x)
{
  result <- ensym(x)
  print(x)
  return(result)
}

test_ensym2 <- function(x)
{
  print(x)
  result <- ensym(x)
  return(result)
}

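We might expect these two functions to do exactly the same thing, and indeed, when we pass a character string to them directly, they both give the same result: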

test_ensym1("hello")
#> [1] "hello"
#> hello

test_ensym2("hello")
#> [1] "hello"
#> hello

But look at what happens when we pass the string in through an external variable:

y <- "hello"

test_ensym1(y)
#> [1] "hello"
#> y

test_ensym2(y)
#> [1] "hello"
#> hello

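Both functions still print "hello" as expected, but they return different results. When we call ensym first, the function returns the symbol y, whereas when we call print first, it returns the symbol hello.

The reason is that when you call a function in R, the symbols passed as arguments are not evaluated immediately. Instead, they are wrapped in promise objects and evaluated as needed within the body of the function. It is this lazy evaluation that makes some of the tidy-evaluation tricks possible.

The difference between the two functions above is that calling print(x) forces the evaluation of x. Before that point, x is an unevaluated promise. After that point, it behaves like any other variable you would work with interactively in the console, so when you call ensym, you are calling it on the evaluated variable rather than on an unevaluated promise.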
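To see this lazy evaluation in isolation, here is a minimal base R sketch. The names lazy_demo and force_demo are made up for illustration and do not appear in the question.

```r
# Hypothetical helper: never uses its argument, so the promise is never evaluated.
lazy_demo <- function(x) {
  "x was never touched"
}

# Hypothetical helper: force() evaluates the promise right away.
force_demo <- function(x) {
  force(x)
  "x was evaluated"
}

# No error is raised: stop() sits inside a promise that is never evaluated.
lazy_result <- lazy_demo(stop("boom"))

# Here the error surfaces, because force(x) evaluates the argument.
forced_result <- tryCatch(
  force_demo(stop("boom")),
  error = function(e) conditionMessage(e)
)
```

The first call returns normally even though its argument would fail if evaluated, which is the same mechanism that lets ensym see the expression y instead of its value.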

ensym, on the other hand, does not evaluate x, so if ensym is called first, it returns the unevaluated symbol that was passed to the function.
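The distinction between the expression behind an argument and its value can be inspected with base R alone. The sketch below uses substitute(), not ensym, so it only illustrates the idea of a promise; show_promise is a hypothetical name.

```r
# Hypothetical helper: returns both the expression behind the promise
# and the value obtained by evaluating it.
show_promise <- function(x) {
  expr <- substitute(x)  # the code the caller wrote, without evaluating it
  value <- x             # first real use of x forces the promise
  list(expression = expr, value = value)
}

y <- "hello"
out <- show_promise(y)

# out$expression is the symbol y; out$value is the string "hello".
is.symbol(out$expression)
out$value
```

Which of the two a capturing function hands back depends on whether it looks at the expression or at the evaluated value, and that is the difference observed between test_ensym1 and test_ensym2.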

So the simplest way to fix your problem is to move the ensym call after the print call.

Answer 2

Alternatively, you can replace ensym with as.symbol.

Consider a function like this

f <- function(x) ensym(x)
myvar <- "some string"

You will find that

> f("some string")
`some string`

> f(myvar)
myvar

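This is because ensym only looks at the expression supplied as the argument, without evaluating it. It tries to convert whatever it finds into a symbol and returns just that (note that you get an error if what it finds is neither a string nor a variable name). So in your first example, ensym returns column_test3; in your second, it returns external.

As far as I can tell, what you want is to get the value that df_col_indicator represents and then convert that value into a symbol. This means you have to evaluate df_col_indicator first and then do the conversion. as.symbol does what you need.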

g <- function(x) as.symbol(x)
myvar <- "some string"

Some tests

> g("some string")
`some string`

> g(myvar)
`some string`
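Applied to the loop from the question, the same idea might look like the sketch below. It uses base R only; to_column_symbol is a hypothetical helper, and the column names are the ones from dataframe_test.

```r
# Hypothetical helper: as.symbol() is an ordinary function, so col_name is
# evaluated first and the symbol is built from its value.
to_column_symbol <- function(col_name) {
  as.symbol(col_name)
}

external <- c("column_test1", "column_test2", "column_test3")

# Build one symbol per column name inside a loop, as in the question.
symbols <- list()
for (i in external) {
  symbols[[i]] <- to_column_symbol(i)
}

# Each element is the column's symbol, not the loop variable i.
symbols[["column_test3"]]
```

Inside choose_top_5, a symbol built this way can then be unquoted with !! in filter, slice_max and select exactly as the ensym results are in the original code.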

Why does print() change the output of my function?
