Speeding up a loop (apply didn't help)

Asked: 2018-02-26 16:41:53

Tags: r

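I need to speed up a simulation I am running, and I have found that one particular component of one of my functions is the main reason it is so slow.

The job of this part of the function is to demonstrate how increasing the number of random draws from a distribution (n) increases the precision of the estimated mean of that sample.

The procedure is as follows:

  • Sample n random draws from a normal distribution with fixed parameters mu and sigma, where n ranges from 1 to 500. (In this example I simply set mu = 500 and sigma = 100.)
  • At each n, compute the mean of all sampled values.
  • Repeat this procedure 1,000 times.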
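For reference, the precision this procedure is meant to illustrate can also be computed directly: the standard error of the mean of n independent draws is sigma / sqrt(n). The snippet below is only a small sketch of that formula using sigma = 100 from the example; the particular values of n are chosen for illustration.

```r
# Theoretical standard error of the sample mean: sigma / sqrt(n)
sigma <- 100
n <- c(1, 10, 100, 500)   # illustrative sample sizes, not from the question
se <- sigma / sqrt(n)

# The standard error shrinks as n grows (100 at n = 1, 10 at n = 100)
print(round(se, 2))
```

The simulated means at each n should scatter around 500 with roughly this spread.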

I currently have this in a nested loop, which I know is not efficient. Here is the code:

# parameters:
# n_repetition = how many times to repeat the whole procedure
# max_n = maximum number of draws to explore

set.seed(42)
n_repetition <- 1000
max_n <- 500

# function to generate n random draws, and find their mean
r_norm <- function(n, mean, sd){
  temp <- rnorm(n, mean, sd)
  return(mean(temp))
}

# generate empty container for the simulated data
sim_results <- matrix(0, nrow = n_repetition, ncol = max_n)

for(i in 1:n_repetition){
  for(j in 1:max_n){
    sim_results[i, j] <- r_norm(j, mean = 500, sd = 100)
  }
}

This is slow: about 9.80 seconds on my machine. So I tried an approach from the "apply family". It turned out to be slow as well:

sim_results <- matrix(1:max_n, nrow = max_n, ncol = n_repetition)
sim_results <- apply(sim_results, 1:2, r_norm, mean = 500, sd = 100)

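I don't know what to do. I assumed the slowdown in R would be the loops, but I removed them using "apply" and it is just as slow.

I can't even think of how to speed this up, so any help is much appreciated.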

1 answer:

Answer 0 (score: 3)

Based on my comment above. The existing nested for loop generates a new set of random numbers for every value of n within each repetition. An improvement is to generate one set of random numbers per repetition and use the cummean function (from the dplyr package; it is not part of base R) to get the running mean.
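To see why this helps, it is enough to count the random numbers each approach has to generate. The following is a rough back-of-the-envelope sketch; the variable names draws_nested and draws_cumulative are made up for illustration.

```r
n_repetition <- 1000
max_n <- 500

# Nested loop: for each repetition, draw 1 + 2 + ... + max_n numbers
draws_nested <- n_repetition * sum(seq_len(max_n))

# Cumulative-mean approach: draw max_n numbers once per repetition
draws_cumulative <- n_repetition * max_n

print(draws_nested)                     # 125,250,000 draws
print(draws_cumulative)                 # 500,000 draws
print(draws_nested / draws_cumulative)  # 250.5 times fewer draws
```

The nested version also calls rnorm 500,000 times, against 1,000 calls in the cumulative version. One caveat: with running means, the values for different n within a repetition share the same draws and are therefore correlated, whereas the original code draws a fresh sample for every n.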

The code below compares the original code with the improvement. The original code took about 13 seconds; the improvement took about 1 second.

library(dplyr)  # provides cummean()
print(Sys.time())
set.seed(42)
n_repetition <- 1000
max_n <- 500

sim_results <- matrix(0, nrow = n_repetition, ncol = max_n)

for(i in 1:n_repetition){
  for(j in 1:max_n){
    sim_results[i, j] <- mean(rnorm(j, mean = 500, sd = 100))
  }
}

print(Sys.time())
sim_results2 <- matrix(0, nrow = n_repetition, ncol = max_n)
set.seed(42)
for(i in 1:n_repetition){
  sim_results2[i, ] <- cummean(rnorm(max_n, mean = 500, sd = 100))
}
print(Sys.time())
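If you would rather avoid the dplyr dependency and the remaining explicit loop, the same idea can be sketched in base R by drawing everything in one call and taking cumulative sums column by column. This is only a sketch under the same parameters; the names draws, cum_means and sim_results3 are introduced here for illustration.

```r
set.seed(42)
n_repetition <- 1000
max_n <- 500

# One column per repetition, one row per draw within that repetition
draws <- matrix(rnorm(n_repetition * max_n, mean = 500, sd = 100),
                nrow = max_n, ncol = n_repetition)

# Cumulative sum down each column, divided by the row index (1, 2, ..., max_n).
# The divisor is recycled down each column, so row k is divided by k.
cum_means <- apply(draws, 2, cumsum) / seq_len(max_n)

# Transpose to match the layout used above: repetitions in rows, n in columns
sim_results3 <- t(cum_means)
print(dim(sim_results3))
```

Here sim_results3[i, j] is the mean of the first j draws of repetition i, which is what cummean computes row by row in the loop above.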