Unnecessary allocation with Julia's update operators

Date: 2014-11-26 16:57:32

Tags: memory-management garbage-collection profiling julia

Consider the following function:

function mytest(x, b)
    y = zeros(x[:,:,1])
    for i in 1:length(b)
        y += b[i] * x[:,:,i]
    end
    return y
end

When I run it, I get the following:

x = rand(30,30,100000)
b = rand(100000)
@time mytest(x,b)

elapsed time: 0.571765222 seconds (727837732 bytes allocated, 66.49% gc time)

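Why is so much memory allocated, and why is so much time spent in garbage collection? The code should be type-stable, and I expected the += operator not to reallocate. However, it appears to reallocate every time two matrices are added.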

Should I consider this a bug in Julia? More importantly, how can I write this code so that it does not reallocate?

Edit: fixed a typo.

3 answers:

Answer 0 (score: 4)

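@cd98 asked for my triple-nested-loop solution. It solves the allocation problem, although I had assumed it would lag behind an equivalent vectorized version. Here it is: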

function mytest(x, b)
    d1, d2, d3 = size(x)
    y = zeros(eltype(x), d1, d2)
    for i in 1:d3
        for j in 1:d2
            for k in 1:d1
                y[k,j] += b[i] * x[k,j,i]
            end
        end
    end
    return y
end

x = rand(30,30,100000)
b = rand(100000)
@time mytest(x,b)
@time mytest(x,b)

Output:

elapsed time: 0.218220119 seconds (767172 bytes allocated)
elapsed time: 0.197181799 seconds (7400 bytes allocated)
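
For comparison, here is a sketch of how a vectorized style might be kept without per-iteration temporaries. In `y += b[i] * x[:,:,i]`, the slice `x[:,:,i]` copies data, the product allocates a second array, and `y += z` is shorthand for `y = y + z`, which allocates a third and rebinds `y`. The version below assumes Julia 1.x, where `@views` and dotted (fused) broadcasting are available; neither exists in the Julia version that produced the timings above. The name `mytest_broadcast` is made up for illustration.

```julia
# Sketch only: assumes Julia 1.x (`@views` and fused dot-broadcasting).
# `mytest_broadcast` is a hypothetical name, not from the original thread.
function mytest_broadcast(x, b)
    y = zeros(eltype(x), size(x, 1), size(x, 2))
    for i in eachindex(b)
        # `@views` turns the slice into a view instead of a copy;
        # `.+=` and `.*` fuse into a single loop that writes into `y` in place.
        @views y .+= b[i] .* x[:, :, i]
    end
    return y
end
```

Under those assumptions the loop body should not allocate new 30x30 arrays, so the allocation count should be close to that of the explicit loops.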

Answer 1 (score: 4)

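This does not address the (original) allocation problem, but simply wrapping the loops of the nested-loop solution above in @inbounds gave me a 1.8x speedup: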

function mytest_inbounds(x, b)
    d1, d2, d3 = size(x)
    y = zeros(eltype(x), d1, d2)
    @inbounds begin
        for i in 1:d3
            for j in 1:d2
                for k in 1:d1
                    y[k,j] += b[i] * x[k,j,i]
                end
            end
        end
    end
    return y
end

x = rand(30, 30, 100000)
b = rand(100000)
@time mytest(x, b)
@time mytest(x, b)
@time mytest_inbounds(x, b)
@time mytest_inbounds(x, b)

Output:

elapsed time: 0.39144919 seconds (767212 bytes allocated)
elapsed time: 0.353495867 seconds (7400 bytes allocated)
elapsed time: 0.202614643 seconds (396972 bytes allocated)
elapsed time: 0.193425902 seconds (7400 bytes allocated)

In addition, there is a lot of discussion of related issues here:

https://groups.google.com/forum/#!msg/julia-users/aYS_AvKqPCI/DyTiq4lKIAoJ

Answer 2 (score: 3)

You can also use Base.Cartesian: after using Base.Cartesian, you can write

@nloops 3 i x begin
  (@nref 2 y i) += b[i_3] * (@nref 3 x i)
end

which expands to essentially the same loops as in Jim's answer.
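
As a rough sketch of how that snippet might sit inside a complete function (the name `mytest_cartesian` and the surrounding setup are assumptions, not part of the answer): `@nloops 3 i x` generates loops over `i_1`, `i_2`, `i_3` with `i_3` outermost, and `@nref 2 y i` stands for `y[i_1, i_2]`.

```julia
using Base.Cartesian

# Hypothetical wrapper around the macro snippet above, for illustration only.
function mytest_cartesian(x, b)
    y = zeros(eltype(x), size(x, 1), size(x, 2))
    # Generates three nested loops over the dimensions of x; the body is
    # equivalent to y[i_1, i_2] += b[i_3] * x[i_1, i_2, i_3]
    @nloops 3 i x begin
        (@nref 2 y i) += b[i_3] * (@nref 3 x i)
    end
    return y
end
```

Because the innermost generated loop runs over the first index, memory access follows Julia's column-major layout, just as in the hand-written loops.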