Question

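Below is a simplified version of a piece of code I am working on (many additional calculations are omitted to avoid confusion). It is just a modified form of the cumsum function. I don't want to reinvent the wheel: does this function already exist? If not, which approach would give the best speed?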

#Set up the data   
set.seed(1)   
junk <- rnorm(1000000)   
junk1 <- rnorm(1000000)   
cumval <- numeric(1000000)   

#Initialize the accumulator   
cumval[1] <- 1   

#Perform the modified cumsum
system.time({   
for (i in 2:1000000) cumval[i] <- junk[i] + (junk1[i] * cumval[i-1])       
})   

#Plot the result
plot(cumval, type="l")

Answer 1

It is faster, but it does not give the correct result. Run this:

set.seed(1)

N <- 10

junk  <- rnorm(N)

junk1 <- rnorm(N)

cumval <- numeric(N)
cumval.1 <- numeric(N)
cumval[1] <- 1

for( i in 2:N ) cumval[i] <- junk[i] + junk1[i]*cumval[i-1]
cumval

cumval.1 <- cumsum( junk[-1] + (junk1[-1] * cumval.1[-N]) ) 

cumval.1

You will see that cumval and cumval.1 are not even the same length.
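
The length is not the only problem. Here is a small sketch of why the vectorized attempt cannot reproduce the loop: the right-hand side is evaluated once, while cumval.1 still holds only zeros, so the junk1 term vanishes and the call reduces to a plain cumsum of junk[-1]. The name attempt is introduced here purely for illustration.

```r
set.seed(1)
N <- 10
junk  <- rnorm(N)
junk1 <- rnorm(N)
cumval.1 <- numeric(N)   # all zeros when the right-hand side is evaluated

# The vectorized attempt (attempt is an illustrative name)
attempt <- cumsum(junk[-1] + (junk1[-1] * cumval.1[-N]))

length(attempt)                        # N - 1 elements, not N
all.equal(attempt, cumsum(junk[-1]))   # the junk1 term contributed nothing
```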

The recurrence relation would need to be rewritten. I don't see a way to convert the recurrence into a non-recursive formula.
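
For what it is worth, the recurrence can at least be written without an explicit loop using base R's Reduce with accumulate = TRUE, which carries the previous value forward one step at a time. This is a sketch of the shape of the computation, not a speed fix: it still makes one R-level function call per element, so there is no reason to expect it to beat the loop. The name cumval.r is made up for this example.

```r
set.seed(1)
N <- 10
junk  <- rnorm(N)
junk1 <- rnorm(N)

# Reference: the explicit loop
cumval <- numeric(N)
cumval[1] <- 1
for (i in 2:N) cumval[i] <- junk[i] + junk1[i] * cumval[i - 1]

# Same recurrence via Reduce: start from 1 and fold over the indices 2..N
cumval.r <- Reduce(function(prev, i) junk[i] + junk1[i] * prev,
                   2:N, init = 1, accumulate = TRUE)

all.equal(cumval, cumval.r)
```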

Answer 2

Consider cumval[5]. Using j[] for junk and jk[] for junk1, and omitting the * symbol, its expansion is:

j[5] + jk[5]j[4] + jk[5]jk[4]j[3] + jk[5]jk[4]jk[3]j[2] + jk[5]jk[4]jk[3]jk[2]

The pattern suggests that this might be (close to?) an expression for the fifth term:

    sum(j[1:5] * c(1, Reduce("*", rev(jk[2:5]), accumulate=TRUE)))
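
The expansion can be checked numerically against the loop. In the sketch below the terms are aligned to match the expansion exactly: j[5] gets coefficient 1, and the initial value cumval[1] = 1 takes the final slot rather than j[1]. The names coefs and terms are made up for this check. Evaluating a sum like this separately for every element would take far more work than the loop, so treat it as a check of the algebra rather than a faster method.

```r
set.seed(1)
N <- 5
j  <- rnorm(N)   # stands for junk
jk <- rnorm(N)   # stands for junk1

# Reference value from the loop
cumval <- numeric(N)
cumval[1] <- 1
for (i in 2:N) cumval[i] <- j[i] + jk[i] * cumval[i - 1]

# Coefficients 1, jk[5], jk[5]jk[4], jk[5]jk[4]jk[3], jk[5]jk[4]jk[3]jk[2]
coefs <- c(1, Reduce("*", rev(jk[2:5]), accumulate = TRUE))

# Terms j[5], j[4], j[3], j[2], then the initial value cumval[1] = 1
terms <- c(rev(j[2:5]), 1)

all.equal(sum(terms * coefs), cumval[5])
```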

Answer 3

This algorithm is a great fit for the compiler package!

#Set up the data   
set.seed(1)   
junk <- rnorm(1000000)   
junk1 <- rnorm(1000000)

# The original code
f <- function(junk, junk1) {
  cumval <- numeric(1000000)
  cumval[1] <- 1
  for (i in 2:1000000) cumval[i] <- junk[i] + (junk1[i] * cumval[i-1])
  cumval
}
system.time( f(junk, junk1) ) # 4.11 secs

# Now try compiling it...
library(compiler)
g <- cmpfun(f)
system.time( g(junk, junk1) ) # 0.98 secs

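...so it would be interesting to know whether this algorithm counts as "typical". If it does, the compiler may turn out to be well suited to cases like this one.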
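
Timings like those above depend on the R version. From R 3.4.0 onward the byte-code JIT is enabled by default, so a function containing a loop is normally compiled automatically after its first call or two, and an explicit cmpfun may then make little difference. The sketch below assumes only base R and the bundled compiler package. It checks that the compiled function returns the same values as the original, using a length-agnostic rewrite of f (the names recur_loop and recur_cmp are made up here).

```r
library(compiler)

# Length-agnostic version of the loop; assumes length(junk) >= 2
# (recur_loop is an illustrative name)
recur_loop <- function(junk, junk1) {
  n <- length(junk)
  cumval <- numeric(n)
  cumval[1] <- 1
  for (i in 2:n) cumval[i] <- junk[i] + junk1[i] * cumval[i - 1]
  cumval
}
recur_cmp <- cmpfun(recur_loop)

set.seed(1)
junk  <- rnorm(1000)
junk1 <- rnorm(1000)

# Byte-compiling should change the speed, not the results
all.equal(recur_loop(junk, junk1), recur_cmp(junk, junk1))
```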
