Question

I have something like this (simplified example):

using BenchmarkTools
function assign()
    e = zeros(100, 90000)
    e2 = ones(100) * 0.16
    e[:, 100:end] .= e2[:]
end
@benchmark assign()

and I need to run it for thousands of time steps. This gives:

BenchmarkTools.Trial: 
  memory estimate:  68.67 MiB
  allocs estimate:  6
  --------------
  minimum time:     16.080 ms (0.00% GC)
  median time:      27.811 ms (0.00% GC)
  mean time:        31.822 ms (12.31% GC)
  maximum time:     43.439 ms (27.66% GC)
  --------------
  samples:          158
  evals/sample:     1

Is there a faster way to do this?

Answer 1

First of all, I will assume that you meant:

function assign1()
    e = zeros(100, 90000)
    e2 = ones(100) * 0.16
    e[:, 100:end] .= e2[:]
    return e  # <- important!
end

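Otherwise, the function returns the result of the broadcast assignment, which covers only columns 100 onward, so you do not get back the first 99 columns of e (!):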

julia> size(assign())
(100, 89901)
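
The same effect can be reproduced on a small matrix. The following is a minimal sketch with made-up function names (`tail_only` and `whole_matrix` are not from the question): when the broadcast assignment is the last expression, the function's value is the assigned-to slice rather than the whole matrix.

```julia
# Hypothetical names, small sizes chosen only for illustration.
function tail_only()
    e = zeros(4, 10)
    e[:, 3:end] .= 1.0   # last expression: its value is the slice e[:, 3:end]
end

function whole_matrix()
    e = zeros(4, 10)
    e[:, 3:end] .= 1.0
    return e             # explicitly return the full matrix
end

size(tail_only())     # (4, 8)
size(whole_matrix())  # (4, 10)
```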

Secondly, don't do this:

e[:, 100:end] .= e2[:]

e2[:] makes a copy of e2 and assigns from that copy, but why? Just assign e2 directly:

e[:, 100:end] .= e2
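
To see that indexing with `[:]` really produces a separate array, here is a small sketch (the variable name `c` is only for illustration). Modifying the copy leaves the original untouched, which shows that a new vector was allocated.

```julia
e2 = fill(0.16, 100)

c = e2[:]      # slicing on the right-hand side allocates a new Vector
c[1] = 99.0    # change the copy only

e2[1]          # still 0.16: e2 was not affected
c !== e2       # true: c is a different object from e2
```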

OK, but let's try a few different versions. Notice that there is no need to make e2 a vector at all; just assign a scalar:

function assign2()
    e = zeros(100, 90000)
    e[:, 100:end] .= 0.16  # Just broadcast a scalar!
    return e
end

function assign3()
    e = fill(0.16, 100, 90000)  # use fill instead of writing all those zeros that you will throw away
    e[:, 1:99] .= 0
    return e
end

function assign4()
    # only write exactly the values you need!
    e = Matrix{Float64}(undef, 100, 90000)
    e[:, 1:99] .= 0
    e[:, 100:end] .= 0.16
    return e
end
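
A matrix created with `undef` holds arbitrary values until every element has been written, so the two column ranges must together cover the whole matrix. As a rough check, the sketch below uses scaled-down, parameterised variants of the fill-based and undef-based versions (the names `fill_then_zero` and `undef_then_write` and their parameters are hypothetical) and compares their results.

```julia
# Hypothetical, scaled-down variants; k is the first column that gets 0.16.
function fill_then_zero(rows, cols, k)
    e = fill(0.16, rows, cols)      # write 0.16 everywhere
    e[:, 1:k-1] .= 0                # then overwrite the leading columns
    return e
end

function undef_then_write(rows, cols, k)
    e = Matrix{Float64}(undef, rows, cols)  # contents are arbitrary until written
    e[:, 1:k-1] .= 0
    e[:, k:end] .= 0.16             # 1:k-1 and k:end together cover every column
    return e
end

fill_then_zero(4, 10, 3) == undef_then_write(4, 10, 3)  # true
```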

Benchmark timings:

julia> @btime assign1();
  14.550 ms (5 allocations: 68.67 MiB)

julia> @btime assign2();
  14.481 ms (2 allocations: 68.66 MiB)

julia> @btime assign3();
  9.636 ms (2 allocations: 68.66 MiB)

julia> @btime assign4();
  10.062 ms (2 allocations: 68.66 MiB)

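Versions 1 and 2 are equally fast. Version 2 makes 2 allocations instead of 5, but the single large allocation of the matrix dominates, so the difference does not show up in the timing.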

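Versions 3 and 4 are faster, though not dramatically so. They avoid some duplicate work, such as writing values into the matrix twice. Version 3 is the fastest here, but not by much, and this changes if the split between the two regions is more balanced, in which case version 4 is faster: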

function assign3_()
    e = fill(0.16, 100, 90000)
    e[:, 1:44999] .= 0
    return e
end

function assign4_()
    e = Matrix{Float64}(undef, 100, 90000)
    e[:, 1:44999] .= 0
    e[:, 45000:end] .= 0.16
    return e
end

julia> @btime assign3_();
  11.576 ms (2 allocations: 68.66 MiB)

julia> @btime assign4_();
  8.658 ms (2 allocations: 68.66 MiB)

The lesson is to avoid doing unnecessary work.
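
Since the question mentions thousands of time steps, one further way to apply the same idea may be to allocate the matrix once and refill it in place on each step. This is only a sketch under the assumption that the matrix can be reused between steps, which the question does not confirm; `assign!` and its parameters are hypothetical names, not part of the versions benchmarked above.

```julia
# Hypothetical in-place helper: writes into an existing matrix instead of
# allocating a new 100 x 90000 array on every call.
function assign!(e::AbstractMatrix, value, first_col::Integer)
    e[:, 1:first_col-1] .= 0        # leading columns
    e[:, first_col:end] .= value    # remaining columns
    return e
end

e = Matrix{Float64}(undef, 100, 90000)  # allocated once, outside the loop

for step in 1:3                         # stands in for the time-step loop
    assign!(e, 0.16, 100)
end
```

Whether this helps in practice depends on how the matrix is used between steps, so it is worth benchmarking with `@btime assign!($e, 0.16, 100)` on the real workload.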
