Question

Suppose dt is a data.table object with columns A, B and C.

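I want to loop over the columns, keep only the rows whose value in that column passes a filter, and then apply a function to that column in those rows: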

for(col in c("A", "B", "C")){
  dt[col %in% some_filter[[col]], col := some_function(col), with=FALSE]
}

where some_filter is a list holding the valid values for each column, e.g. some_filter[["A"]] = c("just", "an", "example"), and so on.

However, because col is referenced in these four places, data.table seems to mix up column names and variable names, and the code fails.

There is a workaround using a temporary variable, but how can this be done in a single line?

Here is the code that does not work:

library(data.table)
library(dplyr)
dt <- data.table(A=1:10, B=11:20, C=21:30)
f <- list()
f[["A"]] <- 3:5
f[["B"]] <- 14:18
f[["C"]] <- 28:29
for(col in colnames(dt)){
  dt[col %in% f[[col]], col := col * 2, with=F] # Double up some rows
}
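The failure can be traced to what col holds. Inside dt[...], col is not a column name, so data.table resolves it in the calling scope, where it is the loop variable: a plain character string. A minimal base-R sketch (no data.table needed; f is the filter list from the question, and col is fixed to "A" for illustration) of what the i and j expressions in the loop then compute:

```r
# Sketch: what the loop variable `col` holds in the question's loop
f <- list(A = 3:5, B = 14:18, C = 28:29)
col <- "A"

# `col` is the string "A", so this asks whether "A" is among 3:5
in_filter <- col %in% f[[col]]
print(in_filter)

# Doubling the string itself is an error, not an operation on column A
err <- tryCatch(col * 2, error = function(e) conditionMessage(e))
print(err)
```

So the row filter never looks at the values of column A, and the arithmetic is attempted on the column name rather than on the column.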

Answer 1

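We can use get() to access a column from a character variable that holds its name. Also, wrapping the LHS of := in () is preferred over with = FALSE.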

for(col in colnames(dt)){
  dt[get(col) %in% f[[col]], (col) := get(col) * 2L] # Double up some rows
}

#      A  B  C
#  1:  1 11 21
#  2:  2 12 22
#  3:  6 13 23
#  4:  8 28 24
#  5: 10 30 25
#  6:  6 32 26
#  7:  7 34 27
#  8:  8 36 56
#  9:  9 19 58
# 10: 10 20 30
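A small sketch of why the parentheses matter, on a hypothetical five-row table (the table and the threshold 3 are illustrative, not from the question): get(col) looks up the column whose name is stored in col, the parenthesised (col) on the LHS is evaluated to "A", and a bare col on the LHS is taken literally as a new column name.

```r
library(data.table)

dt  <- data.table(A = 1:5)
col <- "A"

# get(col) fetches the column whose name is stored in `col`, and the
# parenthesised LHS (col) is evaluated, so column A is updated in rows 3..5
dt[get(col) >= 3L, (col) := get(col) * 10L]

# Without parentheses, := takes the symbol literally and adds a column named "col"
dt[, col := 0L]

print(dt)
```

The bare-symbol assignment comes last on purpose: once a column called col exists, col inside dt[...] refers to that column instead of the variable, which is the same kind of name clash the question runs into.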

Answer 2

Another option is to use set():

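for(nm1 in names(dt)) {
  # indices of the rows whose value in column nm1 appears in f[[nm1]]
  i1 <- which(dt[[nm1]] %in% f[[nm1]])
  # set() updates those cells by reference; it is a low-overhead, loopable form of :=
  set(dt, i = i1, j = nm1, value = dt[[nm1]][i1] * 2L)
}
dt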
#      A  B  C
#  1:  1 11 21
#  2:  2 12 22
#  3:  6 13 23
#  4:  8 28 24
#  5: 10 30 25
#  6:  6 32 26
#  7:  7 34 27
#  8:  8 36 56
#  9:  9 19 58
# 10: 10 20 30
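Both answers still loop over the columns. If a single [ ] call is wanted, one possible variant (a sketch, not taken from either answer) uses .SDcols and Map() to pair each column with its filter. It assumes that every name in f is a column of dt and that the installed data.table version is recent enough to provide fifelse().

```r
library(data.table)

dt <- data.table(A = 1:10, B = 11:20, C = 21:30)
f  <- list(A = 3:5, B = 14:18, C = 28:29)

# One [ ] call: for every column named in f, double the values listed in f[[col]].
# Map() pairs each column of .SD with its filter; fifelse() keeps other values unchanged.
dt[, (names(f)) := Map(function(x, keep) fifelse(x %in% keep, x * 2L, x),
                       .SD, f[names(.SD)]),
   .SDcols = names(f)]

print(dt)
```

This produces the same table as the two loops; whether it reads better than the one-line loop body in Answer 1 is largely a matter of taste.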

R data.table: confusing variable names in a one-line statement
