Question

This:

using Distributions  # provides Normal and rand(d, n)

function draw1(n)
    return rand(Normal(0,1), Int(n))
end

is faster than this:

function draw2(n)
    result = zeros(Float64, Int(n))
    for i=1:Int(n)
        result[i] =  rand(Normal(0,1))
    end
    return result 
end

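I'm just curious why this is, and whether the explicit-loop version can be sped up (I tried @inbounds and @simd and got no speedup). Is it the initial allocation done by zeros()? I timed that separately at about 0.25 seconds, which doesn't fully explain the difference (and besides, doesn't the first version also preallocate an array?).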

Example:

@time x = draw1(1e08)
  1.169986 seconds (6 allocations: 762.940 MiB, 4.53% gc time)
@time y = draw2(1e08)
  1.824750 seconds (6 allocations: 762.940 MiB, 3.05% gc time)

Answer 1

Try this implementation:

function draw3(n)
    d = Normal(0,1)                           # create the distribution once
    result = Vector{Float64}(undef, Int(n))   # uninitialised; `undef` is required on Julia >= 0.7
    @inbounds for i=1:Int(n)
        result[i] =  rand(d)
    end
    return result 
end

What is different:

uses @inbounds
creates Normal(0,1) only once
does not initialize result
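Two of the three differences listed above (no zero-initialisation, no bounds checks) can be seen side by side in a small stdlib-only sketch. It assumes `randn()` from Julia's `Random` standard library as a stand-in for `rand(Normal(0,1))`, since both draw from a standard normal; the third difference (creating the distribution object once) has no counterpart here because `randn` takes no distribution argument. The names `fill_zeros` and `fill_undef` are hypothetical, and no timings are claimed for them.

```julia
using Random

# In the style of draw2: zero-initialise first, then overwrite every element.
function fill_zeros(n::Integer)
    result = zeros(Float64, n)          # writes n zeros that are immediately overwritten
    for i in 1:n
        result[i] = randn()             # stand-in for rand(Normal(0,1))
    end
    return result
end

# In the style of draw3: skip the initial fill and the bounds checks.
function fill_undef(n::Integer)
    result = Vector{Float64}(undef, n)  # memory is allocated but not initialised
    @inbounds for i in 1:n              # every index in 1:n is valid, so checks can be dropped
        result[i] = randn()
    end
    return result
end
```

Both return a `Vector{Float64}` of length `n`; the only work `fill_undef` avoids is the zero-fill pass and the per-element index check.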

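When I test it, it has essentially the same performance as draw1 (although I did not test it at the 10e8 vector size because I ran out of memory; it would be nice if you could run @benchmark like this at that size):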

julia> using BenchmarkTools                        

julia> @benchmark draw1(10e5)                      
BenchmarkTools.Trial:                              
  memory estimate:  7.63 MiB                       
  allocs estimate:  2                              
  --------------                                   
  minimum time:     12.296 ms (0.00% GC)           
  median time:      13.012 ms (0.00% GC)           
  mean time:        14.510 ms (8.49% GC)           
  maximum time:     84.253 ms (81.30% GC)          
  --------------                                   
  samples:          345                            
  evals/sample:     1                              

julia> @benchmark draw2(10e5)                      
BenchmarkTools.Trial:                              
  memory estimate:  7.63 MiB                       
  allocs estimate:  2                              
  --------------                                   
  minimum time:     20.374 ms (0.00% GC)           
  median time:      21.622 ms (0.00% GC)           
  mean time:        22.787 ms (5.95% GC)           
  maximum time:     92.265 ms (77.18% GC)          
  --------------                                   
  samples:          220                            
  evals/sample:     1                              

julia> @benchmark draw3(10e5)                      
BenchmarkTools.Trial:                              
  memory estimate:  7.63 MiB                       
  allocs estimate:  2                              
  --------------                                   
  minimum time:     12.415 ms (0.00% GC)           
  median time:      12.956 ms (0.00% GC)           
  mean time:        14.456 ms (8.67% GC)           
  maximum time:     84.342 ms (83.74% GC)          
  --------------                                   
  samples:          346                            
  evals/sample:     1

EDIT: actually, defining the loop in a separate function (exactly as rand does) gives draw4 slightly better performance than draw3:

function g!(d, v)
    @inbounds for i=1:length(v)
        v[i] = rand(d)
    end
end

function draw4(n)
    result = Vector{Float64}(undef, Int(n))   # `undef` is required on Julia >= 0.7
    g!(Normal(0,1), result)                   # the loop runs inside the separate function g!
    return result 
end
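Splitting the loop into `g!` is a common Julia pattern often called a "function barrier": the inner function is compiled as a separate method specialised on the concrete types of its arguments. Whether that is what accounts for the small gain of draw4 over draw3 is not established by the benchmarks above. The same shape can be sketched with the standard library only, assuming an `AbstractRNG` argument plays the role of the distribution `d`; `fill_normal!` and `draw_barrier` are hypothetical names.

```julia
using Random

# Inner kernel: compiled separately for the concrete types of `rng` and `v`.
function fill_normal!(rng::AbstractRNG, v::AbstractVector{Float64})
    @inbounds for i in eachindex(v)   # eachindex yields only valid indices of v
        v[i] = randn(rng)
    end
    return v
end

# Outer function: allocates uninitialised storage and hands it to the kernel.
function draw_barrier(n::Integer; rng::AbstractRNG = Random.default_rng())
    result = Vector{Float64}(undef, n)
    return fill_normal!(rng, result)
end
```

Passing an explicitly seeded generator, for example `draw_barrier(10; rng = MersenneTwister(1))`, makes the draws reproducible.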

Answer 2

The shorter answer is that the built-in implementation is the fastest, which, fortunately, is often the case.

Instead of draw4 above, you can use the built-in:

function draw5(n)
   result = Vector{Float64}(undef, Int(n))   # `undef` is required on Julia >= 0.7
   rand!(Normal(0,1), result)                # fills result in place and returns it
end

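Filling an existing vector with something like rand! always stays within the vector's bounds, so there is nothing for @inbounds to add.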
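For the standard normal specifically, Julia's `Random` standard library already has an in-place filler, `randn!`, so a dependency-free variant of draw5 could look like the sketch below. The names `draw_builtin` and `batch_sums` are hypothetical, and its speed relative to `rand!(Normal(0,1), result)` has not been measured here. The second function shows the other benefit of an in-place filler: one buffer can be refilled for many batches instead of allocating a new vector each time.

```julia
using Random

# Allocate once (uninitialised), then let the library fill it in place.
function draw_builtin(n::Integer; rng::AbstractRNG = Random.default_rng())
    result = Vector{Float64}(undef, n)
    return randn!(rng, result)          # randn! fills `result` and returns it
end

# Reuse a single buffer across batches; only the small `sums` vector is allocated.
function batch_sums(nbatches::Integer, batchsize::Integer;
                    rng::AbstractRNG = Random.default_rng())
    buf = Vector{Float64}(undef, batchsize)
    sums = Vector{Float64}(undef, nbatches)
    for b in 1:nbatches
        randn!(rng, buf)                # overwrite the same memory with fresh draws
        sums[b] = sum(buf)
    end
    return sums
end
```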

Fastest way to draw many times from a distribution
