Question

In R, I have the following example function, which runs a for loop n times:

nFun <- function(n){
    #inputs - n - number of results required
    #reserve n spaces for results
    r_num_successes <- numeric(n)

    #start looping n times
    for(i in seq_len(n)){

        #set first uniform "random" deviate equal to 0.05 and number of successes to 0
        current_unif <- 0.05
        num_successes <- 0

        #start while loop that updates current_unif - it runs as long as 
        #current_unif is less than 0.95, increments num_successes each loop
        while(current_unif < 0.95){

            #set current_unif to a uniform random deviate between the
            #existing current_unif and 1
            current_unif <- runif(1,current_unif)
            num_successes <- num_successes + 1
        }

        #set the i-th element of the results vector to that final num_successes
        #generated by the while loop
        r_num_successes[i] <- num_successes
    }

    #output the mean of all the successes
    return(mean(r_num_successes))
}

This starts to get slow as n grows. Is there a good way to optimize it?

Answer 1

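There is nothing you can do in pure R to significantly improve the speed. Byte-compiling will give you a small improvement, but you need to move to compiled code to get any significant speed gain.

UPDATE: Here's an Rcpp solution, just for Dirk :)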

> nCode <- '
+   int N = as<int>(n);
+   std::vector<double> rns;
+ 
+   RNGScope scope;  // Initialize Random number generator
+ 
+   for(int i=0; i<N; i++) {
+     double current_unif = 0.05;
+     double num_successes = 0;
+     while(current_unif < 0.95) {
+       current_unif = ::Rf_runif(current_unif, 1.0);
+       num_successes++;
+     }
+     rns.push_back(num_successes);
+   }
+ 
+   double mean = std::accumulate(rns.begin(), rns.end(), 0.0) / rns.size();
+   return wrap(mean);  // Return to R
+ '
>
> library(inline)
> nFunRcpp <- cxxfunction(signature(n="int"), nCode, plugin="Rcpp")
> library(compiler)
> nFunCmp <- cmpfun(nFun)
> system.time(nFun(1e5))
   user  system elapsed 
  3.100   0.000   3.098 
> system.time(nFunCmp(1e5))
   user  system elapsed 
  2.120   0.000   2.114 
> system.time(nFunRcpp(1e5))
   user  system elapsed 
  0.010   0.000   0.016

Answer 2

For completeness, here is what I suggested to @JoshuaUlrich:

R> res <- benchmark(nFun(1e5L), nFunCmp(1e5L), nFunRcpp(1e5L), nFun2Rcpp(1e5L),
+                  columns = c("test", "replications", "elapsed", "relative"),
+                  replications=10,
+                  order="relative")
R> print(res)
               test replications elapsed  relative
4 nFun2Rcpp(100000)           10   0.117   1.00000
3  nFunRcpp(100000)           10   0.122   1.04274
2   nFunCmp(100000)           10  13.845 118.33333
1      nFun(100000)           10  23.212 198.39316
R>

nFun2Rcpp adds just one line:

rns.reserve(N);

and changes the assignment to

rns[i] = num_successes;

instead of using .push_back(), which makes memory allocation more efficient.

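Edit: It turns out this was inaccurate and only reflected the randomness of the algorithm. If I add a set.seed() call before each run, the timings of the two C++ versions are identical. There is no measurable gain here.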
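
A side note on the rns.reserve(N) / rns[i] change: reserve() only raises the vector's capacity and leaves rns.size() at 0, so rns[i] indexes past the vector's size and the final division by rns.size() no longer sees the stored values. Below is a minimal standalone sketch of the pre-allocated variant that sizes the vector up front instead. It is only an illustration under stated assumptions: std::mt19937 stands in for R's RNG so that it compiles without Rcpp, and the function name mean_successes and its seed parameter are made up for this example.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical standalone version of the simulation. std::mt19937 is an
// assumption standing in for R's RNG, so the draws will not match the
// Rcpp code value for value.
double mean_successes(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);

    // Construct the vector with n elements (not just capacity for n),
    // so that writing to rns[i] is valid and rns.size() == n.
    std::vector<double> rns(n);

    for (std::size_t i = 0; i < n; ++i) {
        double current_unif = 0.05;
        double num_successes = 0;
        while (current_unif < 0.95) {
            // uniform deviate between the existing current_unif and 1
            std::uniform_real_distribution<double> unif(current_unif, 1.0);
            current_unif = unif(gen);
            ++num_successes;
        }
        rns[i] = num_successes;
    }

    // Mean of all the successes, as in the R function.
    return std::accumulate(rns.begin(), rns.end(), 0.0) / rns.size();
}
```

Calling mean_successes(100000, 42u) plays the role of nFun(1e5). In the Rcpp snippet the same idea would be std::vector<double> rns(N); in place of rns.reserve(N);, and as the edit above notes, the allocation strategy should not be expected to change the timings measurably.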
