Substituting into the `j` element of data.table [, j, by]

Time: 2014-05-05 00:40:34

Tags: r data.table

I have:

DT = data.table(ID=rep(1:2,each = 2), Index=rep(1:2,times = 2), Close=3:6, Open=7:10)

My algorithm determined earlier that DT keeps its time information in a column named Index, so it stores the following mapping:

time.col <- "Index"

Now the algorithm wants to perform a computation equivalent to:

DT[, list(Index, Value=cumsum(Close)),by=ID]
   ID Index Value
1:  1     1     3
2:  1     2     7
3:  2     1     5
4:  2     2    11

How can I rewrite that line so that it uses the time.col variable instead of the hard-coded column name?

Neither of the following works:

DT[, list(time.col, Value=cumsum(Close)),by=ID]
DT[, list(substitute(time.col), Value=cumsum(Close)),by=ID]

2 Answers:

Answer 0: (score: 2)

You can build the whole j expression for DT as text and parse it:

e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))

DT[, eval(e),by=ID]
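For reference, here is a minimal end-to-end sketch of that approach on the toy table from the question. It assumes the data.table package is installed; the name res is only illustrative. Because data.table evaluates the parsed expression as if it had been typed into j, the result should match the hard-coded version.

```r
library(data.table)

# Toy data from the question
DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# Build the text "list(Index,Value=cumsum(Close))" and parse it into an expression
e <- parse(text = paste0("list(", time.col, ",", "Value=cumsum(Close))"))

# Evaluate the parsed expression in j, grouped by ID
res <- DT[, eval(e), by = ID]
print(res)
```

One thing to keep in mind: since the expression is assembled from text, time.col has to hold a syntactically valid column name for the parse to succeed.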

Edit

Alternatively, if you store "Index" as a name, you can evaluate time.col in the environment of .SD:

time.col <- as.name("Index")

DT[,list(eval(time.col,envir=.SD), Value=cumsum(Close)),by=ID]
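A small self-contained sketch of this variant follows, again on the toy table from the question. The column produced by eval() has no symbol to take its name from, so the sketch renames it by position afterwards; that renaming step and the name res are my additions, not part of the answer.

```r
library(data.table)

# Toy data from the question
DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)

# Store the column as a name (symbol) rather than a string
time.col <- as.name("Index")

# Evaluate the symbol inside .SD, the per-group subset of DT
res <- DT[, list(eval(time.col, envir = .SD), Value = cumsum(Close)), by = ID]

# Rename the second column (the evaluated one) to the stored column name
setnames(res, 2L, as.character(time.col))
print(res)
```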

A very similar question: In R data.table, how do I pass variable parameters to an expression?

Also, this question helps in understanding the mysteries of non-standard evaluation in data.table: eval and quote in data.table

Answer 1: (score: 2)

It turns out that the fastest of the eval-based solutions above is

e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))
DT[, eval(e),by=ID]

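However, the := solution is faster still when the new column is simply left in DT (fun2.() below). See also Arun's note about copying.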
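As a rough illustration of the := idea on the toy table from the question: := adds the column by reference, so the sketch below works on a copy() to leave the original DT untouched. Using copy() here is my assumption for illustration and costs a full copy of the table; the names out and the toy data are not from the benchmark.

```r
library(data.table)

# Toy data from the question
DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# Add Value by reference on a copy, then select columns by name;
# with = FALSE makes j treat the character vector as column names
out <- copy(DT)[, Value := cumsum(Close), by = ID][
  , c("ID", time.col, "Value"), with = FALSE]

print(out)
print(names(DT))  # DT itself has no Value column
```

Here time.col stays a plain string, so no parse() or eval() is needed at all.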

Dataset

dim(DT); object.size(DT); DT

[1] 1354402       8
81291568 bytes
               Instrument       Date Open High  Low Close Volume Adjusted Close
      1:    GOOG/AMEX_ABI 1981-03-11   NA   NA 6.56  6.75 217200             NA
      2:    GOOG/AMEX_ABI 1981-03-12   NA   NA 6.66  6.88 616400             NA
      3:    GOOG/AMEX_ABI 1981-03-13   NA   NA 6.81  6.84 462000             NA
      4:    GOOG/AMEX_ABI 1981-03-16   NA   NA 6.81  7.00 306400             NA
      5:    GOOG/AMEX_ABI 1981-03-17   NA   NA 6.88  6.88 925600             NA
     ---                                                                       
1354398: YAHOO/TSX_AMM_TO 2014-04-24 1.56 1.58 1.56  1.58   2700           1.58
1354399: YAHOO/TSX_AMM_TO 2014-04-25 1.60 1.62 1.59  1.62  11000           1.62
1354400: YAHOO/TSX_AMM_TO 2014-04-28 1.59 1.61 1.54  1.54   7200           1.54
1354401: YAHOO/TSX_AMM_TO 2014-04-29 1.58 1.60 1.58  1.59    500           1.59
1354402: YAHOO/TSX_AMM_TO 2014-04-30 1.55 1.55 1.50  1.52  36800           1.52

Benchmark

time.col <- "Date"
fun <- function(){
  out <- DT[, list(get(time.col), Value=cumsum(Close)),by=Instrument]
  setnames(out, "V1", time.col)
}

fun2 <- function() {
  DT[, Value := cumsum(Close), by=Instrument]
  out <- DT[,c("Instrument",time.col, "Value"), with=FALSE]
  DT[, Value:=NULL] # cleanup
  out
}

fun2. <- function() {
  DT[, Value := cumsum(Close), by=Instrument]
#   out <- DT[,c("Instrument",time.col, "Value"), with=FALSE]
#   DT[, Value:=NULL] # cleanup
#   out
}

fun3 <- function() {
  DT[,list( eval(as.name(time.col),envir=.SD), Value=cumsum(Close)),by=Instrument]
}

fun4 <- function() {
  e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))
  DT[, eval(e),by=Instrument]
}

Results

library(rbenchmark)
benchmark(fun(),
          fun2(),
          fun3(),
          fun4(),
          replications=200)

     test replications elapsed relative user.self sys.self user.child sys.child
1   fun()          200    5.40    1.327      5.29     0.11         NA        NA
2  fun2()          200    5.18    1.273      4.72     0.45         NA        NA
3 fun2.()          200    2.70    1.000      2.70     0.00         NA        NA
3  fun3()          200    4.12    1.012      3.90     0.22         NA        NA
4  fun4()          200    4.07    1.000      3.91     0.16         NA        NA