Question

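This is a small challenge within a larger project, so I'll try to keep it minimal.

I'm trying to conditionally add columns to a data.table and then conditionally process them.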

library(data.table)

x <- TRUE
y <- data.table(a = 1:10, b = c(rep(1,5), rep(2,5)))

y[  # filter some rows
  a != 1
][  # conditionally add two calculated columns
  ,
  if(x){
    `:=` (
      c = a*b,
      d = 1/b
    )
  }
][  # process columns and group
  ,
  list(
    a = sum(a),
    b = sum(b),
    if(x) c = sum(c)  # only add c if it's created above
  ),
  by = if(x) list(b, d) else list(b)  # only group by d if it's created above
]

This is the output (the error refers to the second set of []):

Error in eval(expr, envir, enclos) : object 'd' not found
In addition: Warning message:
In deconstruct_and_eval(m, envir, enclos) :
  Caught and removed `{` wrapped around := in j. := and `:=`(...) are 
                defined for use in j, once only and in particular ways. See help(":=").

Of course, the error is a symptom of the warning. How can I do this?

As @Michal points out, putting the if() statement outside the data.table call is an option:

if(x) {
  y[
   ...
  ]
} else {
  y[
   ...
  ]
}

I was hoping there is a way to do this without repeating the entire code, to keep things simple.

Answer 1

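I can't think of a way to do this inside the j-expression, because of how := gets evaluated there (it really only works when it is at the root of the expression tree), but you can put the condition in the i-expression as a workaround: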

x = FALSE
y[a != 1][x, `:=`(c = a * b, d = 1/b)][]
#    a b
#1:  2 1
#2:  3 1
#3:  4 1
#4:  5 1
#5:  6 2
#6:  7 2
#7:  8 2
#8:  9 2
#9: 10 2

x = TRUE
y[a != 1][x, `:=`(c = a * b, d = 1/b)][]
#    a b  c   d
#1:  2 1  2 1.0
#2:  3 1  3 1.0
#3:  4 1  4 1.0
#4:  5 1  5 1.0
#5:  6 2 12 0.5
#6:  7 2 14 0.5
#7:  8 2 16 0.5
#8:  9 2 18 0.5
#9: 10 2 20 0.5

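Because c(1) is the same as c(1, NULL), and if() without an else returns NULL when its condition is FALSE, c() can be used to build a complete vector or list when you don't know in advance how many elements it will have.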
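A minimal sketch of that base R behaviour, independent of data.table. The names without_b and with_b are only illustrative.

```r
# if () without an else yields NULL when the condition is FALSE,
# and c() drops NULL arguments, so the optional element simply vanishes.
same_vector <- identical(c(1), c(1, NULL))

x <- FALSE
without_b <- c(list(a = 1), if (x) list(b = 2))

x <- TRUE
with_b <- c(list(a = 1), if (x) list(b = 2))

print(same_vector)       # TRUE
print(names(without_b))  # "a"
print(names(with_b))     # "a" "b"
```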

Conditionally adding columns in j:

y[
  ,
  c(
    list(
      a = sum(a), 
      b = sum(b)
    ), 
    if(x) list(c = sum(c))
  )
]
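For a version that can be run as is, here is a small sketch using the data from the question. It assumes column c is created beforehand with a plain if() around the whole call (rather than inside j), and res is just an illustrative name.

```r
library(data.table)

x <- TRUE
y <- data.table(a = 1:10, b = c(rep(1, 5), rep(2, 5)))

# Create c only when x is TRUE; the if() wraps the whole call, so := stays at the root of j.
if (x) y[, c := a * b]

# c() glues the fixed part of the result to the optional part (NULL when x is FALSE).
res <- y[, c(list(a = sum(a), b = sum(b)), if (x) list(c = sum(c)))]
print(res)
```

With x set to FALSE the same j expression returns only a and b, because the if() contributes NULL.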

And conditionally adding columns in by:

y[
  ,
  ...,
  by = c("b", if(x) "d")
]

by cannot take a vector of lists, but it does accept a character vector of column names.
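Putting the pieces together, the pipeline from the question could be sketched roughly as follows. summarise_y is a hypothetical helper, not part of data.table, and the output columns are renamed to sum_a and sum_c (an assumption made here) so they do not clash with the grouping column b.

```r
library(data.table)

# Hypothetical helper: filter, optionally add c and d, then aggregate.
summarise_y <- function(y, x) {
  out <- y[a != 1]                          # row filtering returns a new data.table
  if (x) out[, `:=`(c = a * b, d = 1 / b)]  # if() outside the call, := at the root of j
  out[,
      c(list(sum_a = sum(a)), if (x) list(sum_c = sum(c))),
      by = c("b", if (x) "d")]              # character vector; "d" only when x is TRUE
}

y <- data.table(a = 1:10, b = c(rep(1, 5), rep(2, 5)))
print(summarise_y(y, TRUE))   # grouped by b and d, with sum_a and sum_c
print(summarise_y(y, FALSE))  # grouped by b only, with sum_a
```

Only the := step sits behind a separate if(); the filtering and aggregation code is written once, which is what the question was after.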
