I'm still new to R programming and I need to optimize part of my code. I explain how it works below.
myfunc <- function(dt){
  indexes = which(dt$time == CURRENT)
  for(i in indexes){
    # columns foo, bar & baz are used to build the row name
    # and the column name
    linename = paste(dt$foo[i], "_", dt$bar[i], sep="")
    colname = dt$baz[i]
    # related_var holds the name of another global variable,
    # and value is the corresponding entry
    # related_var[linename, colname]
    dt$value[i] = get(dt$related_var[i])[[linename, colname]]
  }
  return(dt)
}
The calling code below is not my actual code, just a simplified version of it:
CURRENT = 0
MAX = 1000
for(i in 1:MAX){
  doSomeStuffOnGlobalVars()
  # get the data from the global variables for this CURRENT
  dt = myfunc(dt)
  CURRENT = CURRENT + 1
}
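This function is called for every value of CURRENT (1, 2, 3, 4, 5, ... 1000). For each row of dt where dt$time == CURRENT, we want to update $value. The catch is that the variable "varname" changes at every value of CURRENT.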
dt: a data.table ordered by time, of the form
foo  bar  baz     time  related_var  value
1    1    "toto"  1     "varname"    NA
1    2    "toto"  1     "varname"    NA
2    1    "tata"  1     "varname"    NA
2    8    "toto"  1     "varname"    NA
...
related_var: contains the name of a global data.frame whose
    colnames are defined by baz in dt
    rownames are defined by a combination of foo and bar (foo_bar) in dt
Example of the "varname" variable:
       toto  tata
1_1    1.6   2
1_2    42    1337
...    ...   ...
10_10  3.14  1.61
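I have already made some changes (I previously used eval(parse(...)), with either a data.frame or a data.table), but it is still slow: about 5 s for a dt of about 5000 rows. I'm open to any ideas, whether R-specific or purely algorithmic.
N.B. Tell me if this is too cryptic.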
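Edit: I found that the slow part is dt$value[i] = get(dt$related_var[i])[[linename, colname]]. If I do a simple assignment such as justAvar = get(dt$related_var[i])[[linename, colname]] instead, it is much faster. So my question is now: "How does R handle indexing? If I want to reach index = 15, does R walk through all 14 preceding elements?"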
Answer 0: (score: 0)
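First, I would precompute linename once for the whole table, since I suspect the loop ends up computing it for almost every row anyway; this uses data.table's assign-by-reference magic (:=). Second, inline and simplify the function. Finally, use the data.table [i, j, by] approach: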
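dt <- ...
# build the row names once, by reference, for the whole table
dt[, linename := paste(foo, "_", bar, sep="")]
CURRENT <- 0
MAX <- 1000
for(i in 1:MAX) {
  doSomeStuffOnGlobalVars()
  # get the data from the global variables for this CURRENT;
  # grouping by related_var passes a single name to get(), and the
  # two-column index pairs each linename with its baz
  # (assumes the looked-up data.frames are all-numeric)
  dt[time == CURRENT,
     value := as.matrix(get(related_var))[cbind(linename, baz)],
     by = related_var]
  CURRENT <- CURRENT + 1
}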
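To make the idea concrete, here is a minimal, self-contained sketch on toy data. The values in varname and dt are made up for illustration (loosely based on the example in the question), and I assume the lookup data.frame is all-numeric and that value is a numeric column. Instead of calling get() inside j, this variant loops over the distinct names in related_var, which is a bit more verbose but easy to follow:

```r
library(data.table)

# Hypothetical lookup table, stored under the global name "varname".
varname <- data.frame(
  toto = c(1.6, 42, 3.14),
  tata = c(2, 1337, 1.61),
  row.names = c("1_1", "1_2", "10_10")
)

# Toy dt; value starts as a numeric NA so that := does not have to coerce.
dt <- data.table(
  foo = c(1, 1, 10),
  bar = c(1, 2, 10),
  baz = c("toto", "tata", "toto"),
  time = c(1, 1, 2),
  related_var = "varname",
  value = NA_real_
)

# Build the row names once, by reference.
dt[, linename := paste(foo, bar, sep = "_")]

CURRENT <- 1
# One vectorised lookup per distinct lookup table used at this time step.
for (v in unique(dt[time == CURRENT, related_var])) {
  m <- as.matrix(get(v))  # numeric matrix that keeps the dimnames
  # A two-column character matrix picks one (row name, column name) cell per row.
  dt[time == CURRENT & related_var == v, value := m[cbind(linename, baz)]]
}
print(dt)
```

Only the rows with time == CURRENT are touched; the row with time == 2 keeps its NA until the loop reaches that value.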
Update
Useful reading: http://www.r-bloggers.com/strategies-to-speedup-r-code/
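On the indexing question from the edit: R vectors are stored contiguously, so reaching element 15 does not mean walking through the 14 before it. The cost of dt$value[i] = ... more likely comes from the replacement call copying the column (or more) on each iteration. If you do need a row-by-row loop, data.table's set() assigns by reference and avoids that overhead. A small sketch with made-up data:

```r
library(data.table)

# Toy table; the column names are illustrative only.
dt <- data.table(id = 1:5, value = NA_real_)

# Assign single cells by reference inside a loop.
for (i in c(2L, 4L)) {
  set(dt, i = i, j = "value", value = i * 10)
}
print(dt)
```

This keeps the loop structure from the question but replaces the slow dt$value[i] = ... assignment; the vectorised := form is still preferable when it applies.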
Update II
You can also set a key on dt before the loop:
setkey(dt, time)
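As a rough illustration of what the key buys you (toy data, not the asker's table): setkey() sorts dt by time, after which rows for a given time can be selected with a keyed lookup, which uses binary search rather than scanning the whole column.

```r
library(data.table)

# Toy table, deliberately unsorted.
dt <- data.table(time = c(3, 1, 2, 1), value = NA_real_)

setkey(dt, time)  # sorts dt by time (by reference) and marks time as the key

CURRENT <- 1
# Keyed lookup: select the rows whose key equals CURRENT and update them.
dt[.(CURRENT), value := 0]
print(dt)
```

Whether this gives a noticeable gain at about 5000 rows is something to measure rather than assume.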