Question

Consider the following function:

function mytest(x, b)
    y = zeros(x[:,:,1])
    for i in 1:length(b)
        y += b[i] * x[:,:,i]
    end
    return y
end

When I run it, I get the following:

x = rand(30,30,100000)
b = rand(100000)
@time mytest(x,b)

elapsed time: 0.571765222 seconds (727837732 bytes allocated, 66.49% gc time)

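Why is so much memory allocated, and why is so much time spent in garbage collection? The code should be type-stable, and I would expect the += operator not to reallocate. However, it appears to reallocate every time it adds two matrices.

Should I consider this a bug in Julia? And more importantly, how can I write this code so that it does not reallocate?

EDIT: fixed a typo.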

Answer 1

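@cd98 asked for my three-nested-loop solution, which solves the allocation problem, although I assumed it would underperform an equivalent vectorized version. Here it is: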

function mytest(x, b)
    d1, d2, d3 = size(x)
    y = zeros(eltype(x), d1, d2)
    for i in 1:d3
        for j in 1:d2
            for k in 1:d1
                y[k,j] += b[i] * x[k,j,i]
            end
        end
    end
    return y
end

x = rand(30,30,100000)
b = rand(100000)
@time mytest(x,b)
@time mytest(x,b)

Output:

elapsed time: 0.218220119 seconds (767172 bytes allocated)
elapsed time: 0.197181799 seconds (7400 bytes allocated)
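
For context on why the original version allocates: in Julia, `y += z` is shorthand for `y = y + z`, so every iteration builds a new matrix and rebinds `y` to it, and each slice `x[:,:,i]` makes a copy as well. Below is a minimal sketch of a vectorized alternative that avoids both, assuming Julia 1.0 or later (where dotted operators fuse and `@view` is available). The name `mytest_inplace` is made up for illustration and is not part of the original answer.

```julia
# Hypothetical vectorized variant; assumes Julia 1.0 or later.
function mytest_inplace(x, b)
    d1, d2, d3 = size(x)
    y = zeros(eltype(x), d1, d2)
    for i in 1:d3
        # `.+=` writes into the existing y instead of rebinding it;
        # `@view` avoids copying the slice, and the dotted operators
        # fuse into a single elementwise loop.
        y .+= b[i] .* @view(x[:, :, i])
    end
    return y
end

x = rand(30, 30, 1000)
b = rand(1000)
mytest_inplace(x, b)          # first call also compiles the function
@time mytest_inplace(x, b)    # second call reflects the steady state
```

Whether this beats the explicit loops depends on the Julia version and the array sizes, so it is worth timing both on your own data.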

Answer 2

This does not address the (original) allocation problem, but simply wrapping the loops of the latter solution in @inbounds gave me a 1.8x speedup:

function mytest_inbounds(x, b)
    d1, d2, d3 = size(x)
    y = zeros(eltype(x), d1, d2)
    @inbounds begin
        for i in 1:d3
            for j in 1:d2
                for k in 1:d1
                    y[k,j] += b[i] * x[k,j,i]
                end
            end
        end
    end
    return y
end

x = rand(30, 30, 100000)
b = rand(100000)
@time mytest(x, b)
@time mytest(x, b)
@time mytest_inbounds(x, b)
@time mytest_inbounds(x, b)

Output:

elapsed time: 0.39144919 seconds (767212 bytes allocated)
elapsed time: 0.353495867 seconds (7400 bytes allocated)
elapsed time: 0.202614643 seconds (396972 bytes allocated)
elapsed time: 0.193425902 seconds (7400 bytes allocated)
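
In these timings, the first call to each function reports more allocation than the second, because the first call also includes JIT compilation. The following is a small sketch of how one might separate the two, assuming Julia 1.0 or later. The helper name `measure` and the stand-in workload `sumsq` are hypothetical and only there to keep the snippet self-contained.

```julia
# Hypothetical helper: run f once to compile it, then measure later calls.
function measure(f, args...)
    f(args...)                        # warm-up call, triggers compilation
    bytes = @allocated f(args...)     # bytes allocated by a second call
    secs = @elapsed f(args...)        # wall-clock seconds of a third call
    return (bytes = bytes, seconds = secs)
end

# Small stand-in workload so the snippet runs on its own.
sumsq(v) = sum(abs2, v)

result = measure(sumsq, rand(1000))
println(result)
```

With the definitions from this answer loaded, the same helper could be called as `measure(mytest_inbounds, x, b)`.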

Also, there is a lot of discussion of related issues here:

https://groups.google.com/forum/#!msg/julia-users/aYS_AvKqPCI/DyTiq4lKIAoJ

Answer 3

You can also use Base.Cartesian: after using Base.Cartesian, you can write

@nloops 3 i x begin
  (@nref 2 y i) += b[i_3] * (@nref 3 x i)
end

which expands to essentially the same loops as Jim's answer.
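
The snippet above assumes that `y` already exists. As a rough sketch, it could be wrapped in a complete function like this; the name `mytest_cartesian` is hypothetical, and the comments describe the expected expansion informally rather than the exact generated code.

```julia
using Base.Cartesian

# Hypothetical wrapper around the @nloops snippet; preallocates y first.
function mytest_cartesian(x, b)
    y = zeros(eltype(x), size(x, 1), size(x, 2))
    # Generates three nested loops over i_1, i_2, i_3 (i_1 innermost),
    # each running over the corresponding axis of x.
    @nloops 3 i x begin
        # Roughly: y[i_1, i_2] += b[i_3] * x[i_1, i_2, i_3]
        (@nref 2 y i) += b[i_3] * (@nref 3 x i)
    end
    return y
end

x = rand(30, 30, 1000)
b = rand(1000)
mytest_cartesian(x, b)          # first call also compiles the function
@time mytest_cartesian(x, b)
```

For a fixed number of dimensions like this, the hand-written loops are arguably easier to read; the macros mainly pay off when the dimensionality is a parameter.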

Unnecessary allocation with Julia's update operators