Question

Consider the following simple Julia code, which operates on four complex matrices:

using LinearAlgebra   # provides I; eye() was removed in Julia 1.0

n = 400

z  = Matrix{ComplexF64}(I, n, n)
id = Matrix{ComplexF64}(I, n, n)
fc = map(x -> rand(ComplexF64), id)
cr = map(x -> rand(ComplexF64), id)

s = 0.1 + 0.1im

@time for j = 1:n
    for i = 1:n
        z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
    end
end

Even though all variables are preallocated, @time reports more than a million memory allocations:

0.072718 seconds (1.12 M allocations: 34.204 MB, 7.22% gc time)

How can I avoid all these allocations (and the GC time)?

Answer 1

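One of the first tips for high-performance Julia code is to avoid global variables. That alone cuts the number of allocations by a factor of 7. If you must use global variables, one way to improve performance is to declare them const. A const global cannot change its type; its value can still be changed, but doing so prints a warning.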
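To see why const matters, here is a minimal sketch that is not part of the original answer (the names xs_nonconst, xs_const, total_* and allocs_* are hypothetical). It runs the same loop over a non-const and a const global and measures allocations with Base's @allocated after a warm-up call. On Julia 1.x we would expect the const version to report 0 bytes and the non-const version to allocate, because the compiler cannot assume anything about the type of a non-const global.

```julia
# Hypothetical names, chosen for illustration only.
xs_nonconst = rand(1000)        # non-const global: its type may change at any time
const xs_const = rand(1000)     # const global: always a Vector{Float64}

# Same loop, reading a non-const global.
function total_nonconst()
    t = 0.0
    for x in xs_nonconst        # element type unknown to the compiler
        t += x
    end
    return t
end

# Same loop, reading a const global.
function total_const()
    t = 0.0
    for x in xs_const           # element type inferred as Float64
        t += x
    end
    return t
end

# Measure from inside small functions so the measurement runs compiled code.
allocs_nonconst() = @allocated total_nonconst()
allocs_const()    = @allocated total_const()

allocs_nonconst(); allocs_const()   # warm-up calls trigger compilation
println("non-const global: ", allocs_nonconst(), " bytes")
println("const global:     ", allocs_const(), " bytes")
```

The same reasoning applies to the globals id, fc, cr and s in the question.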

Consider this modified code, still without a function:

using LinearAlgebra   # provides I; eye() was removed in Julia 1.0

const n = 400

z = Matrix{ComplexF64}(undef, n, n)   # note: z is not declared const
const id = Matrix{ComplexF64}(I, n, n)
const fc = map(x -> rand(ComplexF64), id)
const cr = map(x -> rand(ComplexF64), id)

const s = 0.1 + 0.1im

@time for j = 1:n
    for i = 1:n
        z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
    end
end

@time gives this result:

0.028882 seconds (160.00 k allocations: 4.883 MB)

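Not only is the number of allocations reduced by a factor of 7 (1.12 M to 160 k), execution is also about 2.5 times faster (0.0727 s to 0.0289 s).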
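The 160.00 k allocations that remain are exactly one per loop iteration (400 × 400 = 160,000), and 160,000 × 32 bytes = 5,120,000 bytes ≈ 4.883 MiB, which matches the reported 4.883 MB. That pattern is consistent with each stored value being boxed because z, the only global left non-const, has a type the compiler cannot rely on. A minimal sketch under that assumption, declaring z const as well (not timed here, so no result is claimed):

```julia
using LinearAlgebra   # provides I (identity); eye() no longer exists in Julia 1.x

const n  = 400
const z  = Matrix{ComplexF64}(undef, n, n)   # assumption under test: z is const too
const id = Matrix{ComplexF64}(I, n, n)
const fc = map(x -> rand(ComplexF64), id)
const cr = map(x -> rand(ComplexF64), id)
const s  = 0.1 + 0.1im

# Same loop as in the answer; every global it touches now has a fixed type.
@time for j = 1:n
    for i = 1:n
        z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
    end
end
```

Run it in a fresh session: declaring a name const after it already holds a non-const value may raise an error.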

Now let's apply the second tip for high-performance Julia code: put everything inside a function. Here is the code above written as a function z_mat(n):

using LinearAlgebra   # provides I; eye() was removed in Julia 1.0

function z_mat(n)
    z  = Matrix{ComplexF64}(undef, n, n)
    id = Matrix{ComplexF64}(I, n, n)
    fc = map(x -> rand(ComplexF64), id)
    cr = map(x -> rand(ComplexF64), id)

    s = 1.0 + 1.0im

    # time only the loop; the allocations above are not included here
    @time for j = 1:n
        for i = 1:n
            z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
        end
    end
end

Running it:

z_mat(40)
  0.000273 seconds
@time z_mat(400)
  0.027273 seconds
  0.032443 seconds (429 allocations: 9.779 MB)

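For the whole function this is about 2610 times fewer allocations than the original code (1.12 M vs. 429), and the loop by itself makes zero allocations: the inner @time prints no allocation count. The 9.779 MB that is still allocated comes almost entirely from creating the four 400×400 matrices, since 4 × 400 × 400 × 16 bytes = 10,240,000 bytes ≈ 9.77 MiB.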
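The same idea can be pushed one step further by separating the work from the setup: a function that fills a caller-supplied output matrix in place, so repeated calls allocate nothing. This is a minimal sketch, not the answer's code: fill_z! and measure are hypothetical names, rand(ComplexF64, n, n) is used as a shorter equivalent of the map(...) construction, and @allocated is measured from inside a function after a warm-up call so compilation is not counted.

```julia
using LinearAlgebra   # provides I for the identity matrix

# Hypothetical helper: writes the result into the preallocated z.
# The trailing ! follows the Julia convention for functions that mutate an argument.
function fill_z!(z, id, fc, cr, s)
    for j in axes(z, 2), i in axes(z, 1)   # i varies fastest: column-major order
        z[i, j] = id[i, j] - fc[i, j]^s * cr[i, j]
    end
    return z
end

# Hypothetical helper: reports the bytes allocated by one call of fill_z!.
measure(z, id, fc, cr, s) = @allocated fill_z!(z, id, fc, cr, s)

n  = 400
z  = Matrix{ComplexF64}(undef, n, n)   # allocated once, reused on every call
id = Matrix{ComplexF64}(I, n, n)
fc = rand(ComplexF64, n, n)
cr = rand(ComplexF64, n, n)
s  = 0.1 + 0.1im

measure(z, id, fc, cr, s)              # warm-up: compiles fill_z! and measure
bytes = measure(z, id, fc, cr, s)      # bytes allocated by the loop itself
println("fill_z! allocated ", bytes, " bytes")
```

Because fill_z! receives every array as an argument, their concrete types are known inside the function, so it no longer matters that the calling script uses non-const globals.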
