I'm trying to use ArrayFire in Julia, and I've found that performance gradually degrades over time:

    using ArrayFire

    srand(1)

    function f()
        r = AFArray(zeros(Float32, 100, 100000))
        a = AFArray(rand(Float32, 100, 100000))
        for d in 1:100:90000
            r[:, d:d+99] = a[:, d:d+99] .* a[:, d:d+99]
        end
        nothing
    end

    function g()
        r = zeros(Float32, 100, 100000)
        a = ones(Float32, 100, 100000)
        for d in 1:100:90000
            r[:, d:d+99] = a[:, d:d+99] .* a[:, d:d+99]
        end
        nothing
    end

    for _ in 1:15
        @time f()
    end

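If you run this code, you'll see that successive calls to `f()` get progressively slower, even though the allocation count and size stay the same from call to call. I tried calling `finalize` on `r` and on `a` inside `f()` to force those arrays out of GPU memory, in case that was the problem, but it made no difference.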
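
For reference, the `finalize` attempt looked roughly like the sketch below. The name `f_with_finalize` is used here only for illustration and is not part of the original code; `finalize` is the standard Base function that immediately runs the finalizers registered for an object.

```julia
using ArrayFire

# Same body as f(), plus explicit finalize calls on both GPU arrays.
# f_with_finalize is a hypothetical name used only for this sketch.
function f_with_finalize()
    r = AFArray(zeros(Float32, 100, 100000))
    a = AFArray(rand(Float32, 100, 100000))
    for d in 1:100:90000
        r[:, d:d+99] = a[:, d:d+99] .* a[:, d:d+99]
    end
    # Run the registered finalizers right away instead of waiting for the GC.
    finalize(r)
    finalize(a)
    nothing
end
```

Swapping this into the timing loop in place of `f()` showed the same slowdown.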

Here is the output of the original timing loop (`@time f()`, 15 calls):

    0.810842 seconds (114.91 k allocations: 80.216 MB, 0.71% gc time)
    0.283941 seconds (79.22 k allocations: 78.561 MB, 3.22% gc time)
    0.267405 seconds (79.22 k allocations: 78.561 MB, 2.31% gc time)
    0.332186 seconds (79.22 k allocations: 78.561 MB, 1.76% gc time)
    0.405174 seconds (79.22 k allocations: 78.561 MB, 1.50% gc time)
    0.433224 seconds (79.22 k allocations: 78.561 MB, 2.11% gc time)
    0.501358 seconds (79.22 k allocations: 78.561 MB, 1.18% gc time)
    0.572704 seconds (79.22 k allocations: 78.561 MB, 1.07% gc time)
    0.650663 seconds (79.22 k allocations: 78.561 MB, 1.10% gc time)
    0.794873 seconds (79.22 k allocations: 78.561 MB, 1.16% gc time)
    0.838882 seconds (79.22 k allocations: 78.561 MB, 1.04% gc time)
    1.281940 seconds (79.22 k allocations: 78.561 MB, 0.61% gc time)
    1.200713 seconds (79.22 k allocations: 78.561 MB, 0.37% gc time)
    1.268786 seconds (79.22 k allocations: 78.561 MB, 0.78% gc time)
    1.396851 seconds (79.22 k allocations: 78.561 MB, 0.66% gc time)