Question

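I have a data frame that I want to filter in one of two ways, by either column "this" or column "that". I would like to be able to refer to the column name as a variable. How (in dplyr, if that makes a difference) do I refer to a column name by a variable?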

library(dplyr)
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
df
#   this that
# 1    1    1
# 2    2    1
# 3    2    2
df %>% filter(this == 1)
#   this that
# 1    1    1

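But say I want to use the variable column to hold either "this" or "that", and filter on whichever column it names. Both as.symbol and get work in other contexts, but not here: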

column <- "this"
df %>% filter(as.symbol(column) == 1)
# [1] this that
# <0 rows> (or 0-length row.names)
df %>% filter(get(column) == 1)
# Error in get("this") : object 'this' not found

How do I turn the value of column into a column name?
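A quick base-R check helps explain why the as.symbol() attempt returns zero rows instead of an error. as.symbol(column) only builds the symbol this; it does not look up the column's values. R's comparison operators deparse a symbol to a character string, so the condition becomes "this" == "1", a single FALSE. This assumes filter() evaluated as.symbol(column) as an ordinary value, which is consistent with the empty result shown above.

```r
column <- "this"

# as.symbol() just builds the symbol `this`; it does not look up a column
col_sym <- as.symbol(column)
print(col_sym)

# Comparison operators deparse a symbol to a string, so this is "this" == "1"
cond <- col_sym == 1
print(cond)  # FALSE, so every row is dropped
```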

Answer 1

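I would avoid using get() altogether. It seems like it would be quite dangerous in this situation, especially if you're programming. You could use either an unevaluated call or a pasted-together string, but you need to use filter_() instead of filter().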

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

Option 1 - using an unevaluated call:

You could hard-code y as 1, but here I show it as y to illustrate how easily you can change the expression's values.

expr <- lazyeval::interp(quote(x == y), x = as.name(column), y = 1)
## or 
## expr <- substitute(x == y, list(x = as.name(column), y = 1))
df %>% filter_(expr)
#   this that
# 1    1    1
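To see what the unevaluated call actually contains, here is a base-R sketch (no dplyr needed) that builds the same call with substitute() and evaluates it against the data frame's columns. It only illustrates the idea; it is not how filter_() is implemented internally.

```r
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# Build the call `this == 1` from the string stored in `column`
expr <- substitute(x == y, list(x = as.name(column), y = 1))
print(expr)  # this == 1

# Evaluate the call with the data frame's columns as the variables
keep <- eval(expr, df)
print(keep)  # TRUE FALSE FALSE
df[keep, , drop = FALSE]
```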

Option 2 - using paste() (and obviously easier):

df %>% filter_(paste(column, "==", 1))
#   this that
# 1    1    1
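filter_() has since been deprecated. If you still want to start from a pasted string, one option with tidy-evaluation dplyr is to parse the string into an expression with rlang::parse_expr() and unquote it into filter() with !!. A minimal sketch, assuming dplyr 0.7 or later (which depends on rlang):

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# Paste the condition as text, then parse it into an expression
cond_expr <- rlang::parse_expr(paste(column, "==", 1))

# Unquote the parsed expression into filter() with !!
out <- df %>% filter(!!cond_expr)
out
```

Building code from strings stays fragile (quoting, odd column names), so a symbol-based approach is usually easier to get right.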

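The main thing to note about both options is that we need to use filter_() instead of filter(). In fact, from what I've read, if you're programming with dplyr you should always use the *_() functions.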

I used this post as a helpful reference: character string as function argument r, and I'm using dplyr version 0.3.0.2.

Answer 2

From the current dplyr help file (emphasis mine):

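dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.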

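What unquoting means exactly can be learned in the vignette Programming with dplyr. It is achieved by the function UQ() or, as syntactic sugar, by !!. Now there are situations - like yours - where only the former works correctly, because !! can collide with the single !.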

Applied to your example:

library(dplyr)
df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(UQ(as.name(column)) == 1)
#   this that
# 1    1    1

But not:

df %>% filter(!!as.name(column) == 1)
# [1] this that
# <0 rows> (or 0-length row.names)

The syntactic sugar !! works again if you add some extra parentheses (thanks to Martijn vd Voort for the suggestion):

df %>% filter((!!as.name(column)) == 1)
#   this that
# 1    1    1

Or if you simply swap the two comparison operands (thanks to carand for the hint):

df %>% filter(1 == !!as.name(column))
#   this that
# 1    1    1
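The same unquoting can be wrapped in a reusable function. This is a sketch with a hypothetical helper name, filter_col_eq (not part of dplyr). It uses rlang::sym() in place of as.name() (both turn a string into a symbol) and keeps the extra parentheses so that !! applies only to the column.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep the rows where the column
# named by the string `col` equals `value`.
filter_col_eq <- function(data, col, value) {
  # rlang::sym() turns the string into a symbol; the parentheses keep !!
  # from being applied to the whole comparison.
  filter(data, (!!rlang::sym(col)) == value)
}

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
filter_col_eq(df, "this", 1)
filter_col_eq(df, "that", 1)
```

One design caveat: value is looked up in the data first, so if the data frame had a column literally called value, that column would be used instead of the argument.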

Answer 3

Regarding Richard's solution, I just want to add that if your column is a character column, you can use shQuote to filter by character values.

For example, you can use

df %>% filter_(paste(column, "==", shQuote("a")))

If you have multiple filters, you can specify collapse = "&" in paste().

df %>% filter_(paste(c("column1","column2"), "==", shQuote(c("a","b")), collapse = "&"))
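To check what string this builds and that it behaves as intended, here is a base-R sketch on a hypothetical data frame df_chr with two character columns. The string is parsed back into an R call and evaluated against the columns, roughly what a string-based filter has to do.

```r
# Hypothetical data with two character columns
df_chr <- data.frame(column1 = c("a", "a", "b"),
                     column2 = c("b", "c", "b"),
                     stringsAsFactors = FALSE)

# Build one condition per column and join them with "&"
cond <- paste(c("column1", "column2"), "==", shQuote(c("a", "b")), collapse = " & ")
print(cond)

# Parse the string back into an R call and evaluate it against the columns
keep <- eval(parse(text = cond)[[1]], df_chr)
df_chr[keep, , drop = FALSE]
```

Note that shQuote() applies shell quoting rules (double quotes by default on Windows), so values that themselves contain quote characters may not produce valid R code.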

Answer 4

The newest way to do this is my.data.frame %>% filter(.data[[myName]] == 1), where myName is a variable (in the calling environment) that holds the column name.
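Applied to the question's data, a minimal sketch (assuming dplyr 0.7 or later, where the .data pronoun is available) looks like this:

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
myName <- "that"

# .data[[myName]] looks the column up by name inside the data frame,
# so it never falls back to a variable in the calling environment
out <- df %>% filter(.data[[myName]] == 1)
out
```

If myName names a column that does not exist, .data raises an error rather than silently matching something else.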

Answer 5

Or use filter_at:

library(dplyr)
df %>% 
   filter_at(vars(column), any_vars(. == 1))
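filter_at() is superseded as of dplyr 1.0.0. Assuming dplyr 1.0.4 or later, where if_any() is available, a roughly equivalent sketch selects the column by its string name with all_of():

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# all_of() selects columns from a character vector of names;
# if_any() keeps a row when the predicate holds for any selected column
out <- df %>% filter(if_any(all_of(column), ~ .x == 1))
out
```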

Answer 6

Same as Salim B explained above, but with a slight change:

df %>% filter(1 == !!as.name(column))

i.e. just reverse the condition, because otherwise !! behaves like

!!(as.name(column)==1)
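The precedence issue can be seen directly in base R's parser, independent of dplyr. This sketch only inspects how R parses the expressions; whether your rlang version reinterprets them differently is worth checking separately.

```r
# How base R parses `!!x == 1`: unary ! binds more loosely than ==,
# so the comparison ends up inside both negations
e <- quote(!!x == 1)
e[[1]]        # `!`
e[[2]][[1]]   # `!`
e[[2]][[2]]   # x == 1

# Parentheses or swapped operands put == at the top of the call instead
quote((!!x) == 1)[[1]]   # `==`
quote(1 == !!x)[[1]]     # `==`
```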

Answer 7

Here is another solution for recent dplyr versions:

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(.[[column]] == 1)

#  this that
#1    1    1
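Here . is the magrittr placeholder for the whole piped data frame, so .[[column]] is plain column extraction by name. The same idea works in base R without dplyr or a pipe, as sketched below. One caveat to check on your version: because . is the entire input, on grouped data .[[column]] may not be split by group, whereas .data[[column]] is.

```r
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# df[[column]] extracts the column whose name is stored in `column`;
# the logical vector then selects rows, keeping the data frame shape
out <- df[df[[column]] == 1, , drop = FALSE]
out
```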

Filter data by character column name (in dplyr)
