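I need to speed up a simulation I am running, and I have identified one component of one of my functions as the main reason it is so slow.
The job of this part of the function is to demonstrate how increasing the number of random draws from a distribution (n) increases the precision of the mean estimated from that sample.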
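The procedure is as follows: for each n from 1 to max_n, draw n values from a normal distribution and record their mean, then repeat the whole thing n_repetition times.
I currently have this in a nested loop, which I know is not efficient. Here is the code: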
# parameters:
# n_repetition = how many times to repeat the whole procedure
# max_n = maximum number of draws to explore
set.seed(42)
n_repetition <- 1000
max_n <- 500
# function to generate n random draws, and find their mean
r_norm <- function(n, mean, sd){
  temp <- rnorm(n, mean, sd)
  return(mean(temp))
}
# generate empty container for the simulated data
sim_results <- matrix(0, nrow = n_repetition, ncol = max_n)
for(i in 1:n_repetition){
  for(j in 1:max_n){
    sim_results[i, j] <- r_norm(j, mean = 500, sd = 100)
  }
}
This is slow: about 9.80 seconds on my machine. So I tried the "apply family" approach instead. It turned out to be just as slow:
sim_results <- matrix(1:max_n, nrow = max_n, ncol = n_repetition)
sim_results <- apply(sim_results, 1:2, r_norm, mean = 500, sd = 100)
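I don't know what to do. I assumed the slowdown in R came from the loops, but I removed them with "apply" and it is just as slow.
I can't even think of how to speed this up, so any help is much appreciated.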
Answer 0 (score: 3)
Based on my comment above: the existing nested for loop generates a new set of random numbers for every value of n within each repetition. An improvement is to generate one set of random numbers per repetition and take its running mean with the cummean function from the dplyr package (it is not part of base R).
The code below compares the original code with the improvement. The original took about 13 sec; the improvement took about 1 sec.
library(dplyr)  # provides cummean()

# original: a fresh sample of size j for every cell
print(Sys.time())
set.seed(42)
n_repetition <- 1000
max_n <- 500
sim_results <- matrix(0, nrow = n_repetition, ncol = max_n)
for(i in 1:n_repetition){
  for(j in 1:max_n){
    sim_results[i, j] <- mean(rnorm(j, mean = 500, sd = 100))
  }
}
print(Sys.time())
# improvement: one sample of size max_n per repetition, then its running mean
sim_results2 <- matrix(0, nrow = n_repetition, ncol = max_n)
set.seed(42)
for(i in 1:n_repetition){
  sim_results2[i, ] <- cummean(rnorm(max_n, mean = 500, sd = 100))
}
print(Sys.time())
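If loading a package just for cummean is not wanted, the same idea can be sketched in base R: a running mean is the cumulative sum divided by the number of values seen so far. The version below is only a sketch under my own assumptions. The names draws, cum_mean, sim_results3, observed_se and expected_se are made up for illustration, and I have not timed it against the loop above. It also draws all the random numbers in a single rnorm call and checks that the spread of the estimates shrinks roughly like sd / sqrt(n), which is the point the simulation is meant to demonstrate.

```r
set.seed(42)
n_repetition <- 1000
max_n <- 500

# Draw every value in one call: one row per repetition, one column per draw.
draws <- matrix(rnorm(n_repetition * max_n, mean = 500, sd = 100),
                nrow = n_repetition, ncol = max_n, byrow = TRUE)

# Running mean of a vector: cumulative sum divided by the count so far.
# (cum_mean is a hypothetical helper, not a base R function.)
cum_mean <- function(x) cumsum(x) / seq_along(x)

# apply() over rows returns one column per input row, so transpose back
# to get n_repetition rows and max_n columns.
sim_results3 <- t(apply(draws, 1, cum_mean))

# Spread of the mean estimate across repetitions, for each n,
# next to the theoretical standard error sd / sqrt(n).
observed_se <- apply(sim_results3, 2, sd)
expected_se <- 100 / sqrt(seq_len(max_n))
```

One caveat with any cumulative-mean approach: within a repetition, the mean for n draws reuses the first n - 1 draws, so the estimates along a row are correlated. In the original nested loop every cell came from an independent sample. For showing how precision improves with n that is usually fine, but it is worth knowing if later steps assume independence across n.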