Consider the following function:
function mytest(x, b)
    y = zeros(x[:,:,1])
    for i in 1:length(b)
        y += b[i] * x[:,:,i]
    end
    return y
end
When I run it, I get the following:
x = rand(30,30,100000)
b = rand(100000)
@time mytest(x,b)
elapsed time: 0.571765222 seconds (727837732 bytes allocated, 66.49% gc time)
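Why is so much memory being allocated, and why is so much time spent in garbage collection? The code should be type stable, and I would expect the += operator not to reallocate. However, it seems to reallocate every time it adds two matrices.
Should I consider this a bug in Julia? More importantly, how can I write this code so that it does not reallocate?
EDIT: fixed a typo.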
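Answer 0 (score: 4)
@cd98 asked for my three-nested-loop solution. It solves the allocation problem, although I had assumed it would underperform an equivalent vectorized version. Here it is: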
function mytest(x, b)
    d1, d2, d3 = size(x)
    y = zeros(eltype(x), d1, d2)
    for i in 1:d3
        for j in 1:d2
            for k in 1:d1
                y[k,j] += b[i] * x[k,j,i]
            end
        end
    end
    return y
end
x = rand(30,30,100000)
b = rand(100000)
@time mytest(x,b)
@time mytest(x,b)
Output:
elapsed time: 0.218220119 seconds (767172 bytes allocated)
elapsed time: 0.197181799 seconds (7400 bytes allocated)
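For comparison, here is a minimal sketch of a vectorized version that also avoids the per-iteration allocation. It assumes a newer Julia (1.x) than the one that produced the timings in this thread, and the name mytest_inplace is made up for illustration. The idea is that y += z is shorthand for y = y + z, which builds a new array and rebinds y, and that x[:,:,i] copies the slice. Writing the update with the dotted operators .+= and .* together with view updates y in place without copying the slice.

```julia
# Sketch only: assumes Julia 1.x (fused dot-broadcasting and `view`).
# `mytest_inplace` is a hypothetical name, not part of the original thread.
function mytest_inplace(x, b)
    y = zeros(eltype(x), size(x, 1), size(x, 2))
    for i in eachindex(b)
        # `view` refers to the slice without copying it; `.+=` and `.*`
        # fuse into a single loop that writes directly into `y`.
        y .+= b[i] .* view(x, :, :, i)
    end
    return y
end

x = rand(30, 30, 1000)
b = rand(1000)
mytest_inplace(x, b)        # first call includes compilation
@time mytest_inplace(x, b)  # second call shows the steady-state cost
```

Timings and allocation counts will differ by machine and Julia version, so none are quoted here.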
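Answer 1 (score: 4)
This does not address the (original) allocation problem, but I get about a 1.8x speedup simply by wrapping the loops of the latter solution in @inbounds: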
function mytest_inbounds(x, b)
    d1, d2, d3 = size(x)
    y = zeros(eltype(x), d1, d2)
    @inbounds begin
        for i in 1:d3
            for j in 1:d2
                for k in 1:d1
                    y[k,j] += b[i] * x[k,j,i]
                end
            end
        end
    end
    return y
end
x = rand(30, 30, 100000)
b = rand(100000)
@time mytest(x, b)
@time mytest(x, b)
@time mytest_inbounds(x, b)
@time mytest_inbounds(x, b)
Output:
elapsed time: 0.39144919 seconds (767212 bytes allocated)
elapsed time: 0.353495867 seconds (7400 bytes allocated)
elapsed time: 0.202614643 seconds (396972 bytes allocated)
elapsed time: 0.193425902 seconds (7400 bytes allocated)
Also, there is a lot of discussion of related issues in this thread:
https://groups.google.com/forum/#!msg/julia-users/aYS_AvKqPCI/DyTiq4lKIAoJ
Answer 2 (score: 3)
You can also use Base.Cartesian. After the statement using Base.Cartesian, you can write
@nloops 3 i x begin
    (@nref 2 y i) += b[i_3] * (@nref 3 x i)
end
which expands to essentially the same loops as in Jim's answer.
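The macro snippet above presupposes that y already exists. As a rough sketch, assuming Julia 1.x, it could be wrapped in a complete function like the one below. The name mytest_cartesian and the preallocation line are additions for illustration and are not part of the answer.

```julia
using Base.Cartesian

# Sketch only: `mytest_cartesian` is a hypothetical name. The preallocation
# of `y` is an assumption about how the snippet is meant to be used.
function mytest_cartesian(x, b)
    y = zeros(eltype(x), size(x, 1), size(x, 2))
    # Generates three nested loops over the axes of `x`, with index
    # variables i_1 (innermost), i_2 and i_3 (outermost).
    @nloops 3 i x begin
        # (@nref 2 y i) becomes y[i_1, i_2]; (@nref 3 x i) becomes x[i_1, i_2, i_3]
        (@nref 2 y i) += b[i_3] * (@nref 3 x i)
    end
    return y
end

x = rand(30, 30, 1000)
b = rand(1000)
mytest_cartesian(x, b)        # first call includes compilation
@time mytest_cartesian(x, b)
```

Because i_1 is the innermost loop variable, the loops walk x in column-major order, matching the loop order of the hand-written version.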