Question

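I need to speed up a simulation I am running, and I have found that one particular component of one of my functions is the main reason it is so slow.

This part of the function demonstrates how increasing the number of random draws (n) from a distribution increases the precision of the mean estimated from that sample.

The procedure is as follows:

Sample n random draws from a normal distribution with fixed parameters mu and sigma, where n ranges from 1 to 500. (In this example, I simply set mu = 500 and sigma = 100.)
For each n, calculate the mean of all sampled values.
Repeat this process 1,000 times.

I currently have this in a nested loop, which I know is not efficient. Here is the code: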

# parameters:
# n_repetition = how many times to repeat the whole procedure
# max_n = maximum number of draws to explore

set.seed(42)
n_repetition <- 1000
max_n <- 500

# function to generate n random draws, and find their mean
r_norm <- function(n, mean, sd){
  temp <- rnorm(n, mean, sd)
  return(mean(temp))
}

# generate empty container for the simulated data
sim_results <- matrix(0, nrow = n_repetition, ncol = max_n)

for(i in 1:n_repetition){
  for(j in 1:max_n){
    sim_results[i, j] <- r_norm(j, mean = 500, sd = 100)
  }
}

This is slow; it takes about 9.80 seconds on my machine. So I tried an "apply family" approach instead. It turned out to be just as slow:

sim_results <- matrix(1:max_n, nrow = max_n, ncol = n_repetition)
sim_results <- apply(sim_results, 1:2, r_norm, mean = 500, sd = 100)

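I am not sure what to do. I assumed the slowdown in R came from the loop, but I removed it using "apply" and it is just as slow.

I cannot even think of how to speed this up, so any help is greatly appreciated.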

Answer 1

Based on my comment above: the existing nested for loop generates a new set of random numbers for every value of n within each repetition. An improvement is to generate one set of random numbers per repetition and take its running mean with the cummean function (from the dplyr package; it is not part of base R).
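To make the running-mean idea concrete, here is a minimal base R sketch. It assumes a small illustrative vector x (not part of the original code) and computes the same kind of running mean with cumsum, so element k is the mean of the first k values:

```r
# illustrative input only; any numeric vector works
x <- c(2, 4, 6, 8)

# running mean: cumulative sum divided by the number of values seen so far
running_mean <- cumsum(x) / seq_along(x)

print(running_mean)
```

Each element of running_mean equals mean(x[1:k]) for the corresponding k, which is the quantity the nested loop was recomputing from scratch with fresh draws.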

The code below shows the comparison between the original code and the improvement. The original code took about 13 sec, the improvement ~1 sec.

library(dplyr)  # provides cummean()

print(Sys.time())
set.seed(42)
n_repetition <- 1000
max_n <- 500

# original approach: a fresh sample of size j for every cell
sim_results <- matrix(0, nrow = n_repetition, ncol = max_n)

for(i in 1:n_repetition){
  for(j in 1:max_n){
    sim_results[i, j] <- mean(rnorm(j, mean = 500, sd = 100))
  }
}

print(Sys.time())
# improved approach: one sample of size max_n per repetition, then its running mean
sim_results2 <- matrix(0, nrow = n_repetition, ncol = max_n)
set.seed(42)
for(i in 1:n_repetition){
  sim_results2[i, ] <- cummean(rnorm(max_n, mean = 500, sd = 100))
}
print(Sys.time())
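If you would rather avoid the explicit loop and the extra package, the same idea can probably be written in base R alone. The following is only a sketch under the same parameters; the names draws, sim_results3 and elapsed are illustrative, and the timing will depend on your machine:

```r
set.seed(42)
n_repetition <- 1000
max_n <- 500

elapsed <- system.time({
  # one column per repetition, each holding max_n draws
  draws <- matrix(rnorm(n_repetition * max_n, mean = 500, sd = 100),
                  nrow = max_n, ncol = n_repetition)
  # running mean down each column; row k is divided by k via recycling
  running <- apply(draws, 2, cumsum) / seq_len(max_n)
  # transpose so that rows are repetitions and columns are n, as before
  sim_results3 <- t(running)
})

print(elapsed)
```

Note one difference from the original nested loop: within a repetition, the mean for n + 1 reuses the first n draws, so the columns of a row are correlated. Each individual entry is still the mean of n independent draws, but if you need independent samples for every n, this shortcut changes the design of the simulation.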

Speeding up a loop (apply did not help)