I have:
DT = data.table(ID=rep(1:2,each = 2), Index=rep(1:2,times = 2), Close=3:6, Open=7:10)
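Earlier on, my algorithm determined that DT stores its time information in a column named Index, so the algorithm stores the following mapping: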
time.col <- "Index"
Now the algorithm wants to perform a calculation equivalent to:
DT[, list(Index, Value=cumsum(Close)),by=ID]
ID Index Value
1: 1 1 3
2: 1 2 7
3: 2 1 5
4: 2 2 11
How do I rewrite that line so that it plugs in the time.col variable?
Neither of the following works:
DT[, list(time.col, Value=cumsum(Close)),by=ID]
DT[, list(substitute(time.col), Value=cumsum(Close)),by=ID]
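To see why the first attempt fails, here is a minimal sketch on the toy table from the question (it assumes the data.table package is installed). Inside j, time.col is looked up as an ordinary R variable, so data.table receives the character string "Index" rather than the Index column, and that length-one string is recycled within each group:

```r
library(data.table)

# Toy table from the question
DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# time.col is evaluated as a plain variable, so the second column holds
# the string "Index" (recycled within each group), not the Index values
res <- DT[, list(time.col, Value = cumsum(Close)), by = ID]
print(res)
```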
Answer 0 (score: 2)
You can build the whole j expression for DT as text and parse it:
e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))
DT[, eval(e),by=ID]
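The same j can also be assembled without pasting and parsing a string, by building the call directly from language objects with base R's call() and as.name(). This is a swapped-in technique, not the answer's own; the sketch assumes data.table is installed and reuses the toy DT from the question:

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# Build list(Index, Value = cumsum(Close)) as a call object; as.name()
# turns the stored string into a symbol, so no text parsing is needed
j <- call("list", as.name(time.col), Value = quote(cumsum(Close)))
print(j)

# data.table evaluates eval(j) in the calling scope and uses the result as j
res <- DT[, eval(j), by = ID]
print(res)
```

Because the symbol Index appears literally in the built call, data.table names the result column Index, just as in the hand-written version.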
EDIT
Or, if you store "Index" as a name, you can evaluate time.col within .SD:
time.col <- as.name("Index")
DT[,list(eval(time.col,envir=.SD), Value=cumsum(Close)),by=ID]
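A self-contained run of the .SD variant on the question's toy table, as a sketch assuming data.table is installed. Because the first list() element is an eval() call rather than a bare column name, data.table does not name that column Index, so the sketch renames the second column by position afterwards:

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- as.name("Index")

# eval() looks up the symbol Index inside .SD, the per-group subset of DT
res <- DT[, list(eval(time.col, envir = .SD), Value = cumsum(Close)), by = ID]

# The unnamed eval() column gets a default name; rename it by position
setnames(res, 2L, as.character(time.col))
print(res)
```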
A very similar question: In R data.table, how do I pass variable parameters to an expression?
Also, this question helps in understanding the mysteries of non-standard evaluation in data.table:
eval and quote in data.table
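Along the same non-standard-evaluation lines, the question's substitute() attempt can be made to work by substituting into the whole DT[...] call instead of into a single argument, and then evaluating the result. This is an alternative to the answers here, shown as a sketch on the toy table; it assumes data.table is installed and that time.col is turned into a symbol with as.name(). The placeholder tc is a hypothetical name chosen for illustration:

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# Replace the placeholder tc with the symbol Index throughout the call,
# producing DT[, list(Index, Value = cumsum(Close)), by = ID]
cl <- substitute(DT[, list(tc, Value = cumsum(Close)), by = ID],
                 list(tc = as.name(time.col)))
print(cl)

# Evaluate the finished call as if it had been typed by hand
res <- eval(cl)
print(res)
```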
Answer 1 (score: 2)
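It turns out that the fastest of the eval solutions above is
e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))
DT[, eval(e),by=ID]
However, the := solution is faster still, as long as the new column is kept in DT rather than copied out and removed again (compare fun2() with fun2.() in the benchmark). See also Arun's note on copying.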
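A scaled-down sketch of the := route (the fun2() pattern from the benchmark) on the toy table from the question, assuming data.table is installed: Value is added by reference, the needed columns are copied out with with = FALSE, and the helper column is dropped again:

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 2), Index = rep(1:2, times = 2),
                 Close = 3:6, Open = 7:10)
time.col <- "Index"

# Add Value by reference (DT itself is not copied)
DT[, Value := cumsum(Close), by = ID]

# with = FALSE lets columns be selected by a character vector, so time.col
# can be used directly; this step copies the three selected columns
out <- DT[, c("ID", time.col, "Value"), with = FALSE]

# Drop the helper column so DT is left as it was
DT[, Value := NULL]
print(out)
```

Skipping the last two steps (as fun2.() does) avoids the copy, which is where the extra speed in the benchmark comes from.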
Dataset
dim(DT); object.size(DT); DT
[1] 1354402 8
81291568 bytes
Instrument Date Open High Low Close Volume Adjusted Close
1: GOOG/AMEX_ABI 1981-03-11 NA NA 6.56 6.75 217200 NA
2: GOOG/AMEX_ABI 1981-03-12 NA NA 6.66 6.88 616400 NA
3: GOOG/AMEX_ABI 1981-03-13 NA NA 6.81 6.84 462000 NA
4: GOOG/AMEX_ABI 1981-03-16 NA NA 6.81 7.00 306400 NA
5: GOOG/AMEX_ABI 1981-03-17 NA NA 6.88 6.88 925600 NA
---
1354398: YAHOO/TSX_AMM_TO 2014-04-24 1.56 1.58 1.56 1.58 2700 1.58
1354399: YAHOO/TSX_AMM_TO 2014-04-25 1.60 1.62 1.59 1.62 11000 1.62
1354400: YAHOO/TSX_AMM_TO 2014-04-28 1.59 1.61 1.54 1.54 7200 1.54
1354401: YAHOO/TSX_AMM_TO 2014-04-29 1.58 1.60 1.58 1.59 500 1.59
1354402: YAHOO/TSX_AMM_TO 2014-04-30 1.55 1.55 1.50 1.52 36800 1.52
Benchmark
time.col <- "Date"
# get(): the result column comes back as V1 and has to be renamed
fun <- function(){
  out <- DT[, list(get(time.col), Value=cumsum(Close)),by=Instrument]
  setnames(out, "V1", time.col)
  out
}
# := by reference, then copy out the needed columns and drop the helper column
fun2 <- function() {
  DT[, Value := cumsum(Close), by=Instrument]
  out <- DT[,c("Instrument",time.col, "Value"), with=FALSE]
  DT[, Value:=NULL] # cleanup
  out
}
# := only: Value stays in DT, with no copy and no cleanup
fun2. <- function() {
  DT[, Value := cumsum(Close), by=Instrument]
  # out <- DT[,c("Instrument",time.col, "Value"), with=FALSE]
  # DT[, Value:=NULL] # cleanup
  # out
}
# eval() of a name inside .SD
fun3 <- function() {
  DT[,list( eval(as.name(time.col),envir=.SD), Value=cumsum(Close)),by=Instrument]
}
# parse() the whole j expression from text
fun4 <- function() {
  e <- parse(text = paste0("list(", time.col,",", "Value=cumsum(Close))"))
  DT[, eval(e),by=Instrument]
}
Results
library(rbenchmark)
benchmark(fun(),
fun2(),
fun3(),
fun4(),
replications=200)
test replications elapsed relative user.self sys.self user.child sys.child
1 fun() 200 5.40 1.327 5.29 0.11 NA NA
2 fun2() 200 5.18 1.273 4.72 0.45 NA NA
3 fun2.() 200 2.70 1.000 2.70 0.00 NA NA
3 fun3() 200 4.12 1.012 3.90 0.22 NA NA
4 fun4() 200 4.07 1.000 3.91 0.16 NA NA