Speeding up R

Time: 2015-07-06 09:50:22

Tags: r runtime

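Basically, I'm helping someone write code for their research, but my usual time-saving strategies haven't cut the algorithm's runtime down to something reasonable. I'm hoping someone knows a better way to make a function run fast, based on an example I wrote up to avoid including information about the research itself.

The objects in the example are smaller than the ones she is using (but they can easily be made bigger). In the actual algorithm this small piece takes about 3 minutes, but in the full case it may take 8-10 minutes, and it needs to be run 1000-10000 times. That is why I need to cut the runtime down seriously.

How I'm currently doing it (hopefully there are enough comments to make my thought process obvious):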

example<-array(rnorm(100000), dim=c(5, 25, 40, 20))

# 5 * 5 * 12 = 300 values; array() silently drops any extras beyond prod(dim)
observation <- array(rnorm(300), dim=c(5, 5, 12))

calc.err<-function(value, observation){
  #'This creates the squared error for each observation, and each point in the
  #'example array, across the five values in the first dimension of each

  sqError<-(value-observation)^2

  #'the apply function here sums up the squared error for each observation and
  #'point.  This is the value returned

  return(apply(sqError, c(2,3), function(x) sum(x)))
}

run<-apply(example, c(2,3,4), function(x) calc.err(x, observation))

#'It isn't returned in the right format (small problem) but reformatting is fast
format<-array(run, dim=c(5, 12, 25, 40, 20))

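Will clarify if necessary.

Edit: The data.table package looks really useful. I'll have to learn the package, but preliminary results seem much faster. I think I was using arrays because the code she gave me had the objects formatted that way. It didn't occur to me to change it.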

1 answer:

Answer 0 (score: 0)

Here are a couple of simple refactorings, along with timings:

calc.err2 <- function(value, observation){
  #'This creates the squared error for each observation, and each point in the
  #'example array, across the five values in the first dimension of each

  sqError<-(value-observation)^2

  #' getting rid of the anonymous function

  apply(sqError, c(2,3), sum)
}

calc.err3 <- function(value, observation){
  #'This creates the squared error for each observation, and each point in the
  #'example array, across the five values in the first dimension of each

  sqError<-(value-observation)^2

  #' replacing with colSums

  colSums(sqError)
}
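
Before timing the variants, it may be worth confirming that they return the same values. The following is a minimal sketch, assuming a small made-up `value` vector and `observation` array with the same layout as in the question; the names `via_anonymous`, `via_sum` and `via_colSums` are hypothetical and only used for this check.

```r
# Hypothetical small inputs; the shapes mirror the question's arrays.
set.seed(42)
value <- rnorm(5)
observation <- array(rnorm(5 * 5 * 12), dim = c(5, 5, 12))

# value is recycled along the first dimension of observation.
sqError <- (value - observation)^2

via_anonymous <- apply(sqError, c(2, 3), function(x) sum(x))  # as in calc.err
via_sum       <- apply(sqError, c(2, 3), sum)                 # as in calc.err2
via_colSums   <- colSums(sqError)                             # as in calc.err3

# All three should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(via_anonymous, via_sum)))
stopifnot(isTRUE(all.equal(via_sum, via_colSums)))
```

On a 3-d array, `colSums` with its default `dims = 1` sums over the first dimension only, which is why it can stand in for `apply(sqError, c(2,3), sum)` here.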


R>library(microbenchmark)
R>microbenchmark(times=8, apply(example, 2:4, calc.err, observation),
+   apply(example, 2:4, calc.err2, observation),
+   apply(example, 2:4, calc.err3, observation)
+ )
Unit: milliseconds
                                        expr         min          lq
  apply(example, 2:4, calc.err, observation) 2284.350162 2321.875878
 apply(example, 2:4, calc.err2, observation) 2194.316755 2257.007572
 apply(example, 2:4, calc.err3, observation)  645.004808  652.567611
         mean       median           uq         max neval
 2349.7524509 2336.6661645 2393.3452420 2409.894876     8
 2301.7896566 2298.9346090 2362.5479790 2383.020177     8
  681.3176878  667.9070175  720.7049605  723.177516     8

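colSums is considerably faster than the corresponding apply: by median, calc.err3 takes about 668 ms against about 2337 ms for the original, roughly 3.5 times faster. Removing the anonymous function alone (calc.err2, about 2299 ms) barely helps.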
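
Going one step beyond the answer, the outer apply over example might also be removable. This is a sketch of a different technique, not something the answer proposes or times: it expands the square, sum((e - o)^2) = sum(e^2) + sum(o^2) - 2 * sum(e * o), so that all sums over the first dimension become `colSums` calls plus one matrix product. The small array sizes and the names `E`, `O`, `sq_err` and `vectorized` are assumptions made for illustration, and no speedup is claimed here.

```r
# Hypothetical small inputs with the same layout as the question's arrays.
set.seed(1)
example <- array(rnorm(5 * 4 * 3 * 2), dim = c(5, 4, 3, 2))
observation <- array(rnorm(5 * 5 * 12), dim = c(5, 5, 12))

# Reference result: the answer's colSums version driven by apply().
calc.err3 <- function(value, observation) colSums((value - observation)^2)
reference <- apply(example, 2:4, calc.err3, observation)

# Flatten everything except the first dimension into matrix columns.
E <- matrix(example, nrow = dim(example)[1])          # 5 x 24
O <- matrix(observation, nrow = dim(observation)[1])  # 5 x 60

# Entry [q, p] is sum over i of (E[i, p] - O[i, q])^2, via the expanded square.
sq_err <- outer(colSums(O^2), colSums(E^2), "+") - 2 * crossprod(O, E)

# Restore the layout that the question reshapes its result to.
vectorized <- array(sq_err, dim = c(dim(observation)[-1], dim(example)[-1]))

# Same values as the apply() route, up to floating-point tolerance.
stopifnot(isTRUE(all.equal(vectorized, array(reference, dim = dim(vectorized)))))
```

Because the expanded form subtracts large intermediate terms, results can differ from the direct computation in the last few digits, so compare with `all.equal` rather than `identical`. Any gain should be measured with microbenchmark on the real data sizes before relying on it.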