Question

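I have built a minimal working example from my previous question (Julia allocates huge amount of memory for unknown reason) that isolates the problem. It can be tested directly in the REPL. Consider the following code: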

function test1(n)
    s = zero(Float64)
    for i = 1:10^n
        s += sqrt(rand()^2 + rand()^2 + rand()^2)
    end
    return s
end

function test2(n)
    @parallel (+) for i = 1:10^n
        sqrt(rand()^2 + rand()^2 + rand()^2)
    end
end

function test3(n)
    function add(one, two, three)
        one + two + three
    end

    @parallel (+) for i = 1:10^n
        sqrt(add(rand()^2, rand()^2, rand()^2))
    end
end

Then I test the code:

@time test1(8);
@time test1(8);

@time test2(8);
@time test2(8);

@time test3(8);
@time test3(8);

This is the output:

elapsed time: 1.017241708 seconds (183868 bytes allocated)
elapsed time: 1.033503964 seconds (96 bytes allocated)

elapsed time: 1.214897591 seconds (3682220 bytes allocated)
elapsed time: 1.020521156 seconds (2104 bytes allocated)

elapsed time: 15.23876415 seconds (9600679268 bytes allocated, 26.69% gc time)
elapsed time: 15.418865707 seconds (9600002736 bytes allocated, 26.19% gc time)

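Can someone explain:

Why does the first run of each function allocate so much memory?
Why is the memory allocated in test2(8) higher than in test1(8)? They do the same thing.
Most importantly, what is going on in test3(8)? It allocates a huge amount of memory.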

Edit

Julia Version 0.3.1
Commit c03f413* (2014-09-21 21:30 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.3.0)
  CPU: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Answer 1

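On the first run of each function, the allocation is due to compilation: remember that Julia's JIT compiler is largely written in Julia, so any memory consumed during compilation (mostly by type analysis) is included in the count. Once the function has been compiled, this allocation goes away.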
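To keep compilation out of a measurement, one common approach is to call the function once on a small input before timing it. The sketch below reuses test1 from the question; the warm-up argument 1 is an arbitrary choice, and the exact allocation figures will differ by Julia version and machine.

```julia
# Sketch: warm up once so JIT compilation is not part of the measurement.
function test1(n)
    s = zero(Float64)
    for i = 1:10^n
        s += sqrt(rand()^2 + rand()^2 + rand()^2)
    end
    return s
end

test1(1)          # first call: triggers compilation of test1 for an Int argument
@time test1(8)    # later calls: time and allocation of the run itself
```

The warm-up call must use the same argument types as the timed call, since Julia compiles a separate specialization for each combination of argument types.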

For me, test2 and test3 both allocate about 50K bytes on the second run (using julia -p 2).

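Finally, the reason the parallel versions allocate some extra memory has to do with how @parallel works. It essentially has to create a "thunk" from your function and pass it to the other processes. This thunk is not precompiled, because it may depend on the variables you pass in as arguments.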
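The thunk idea can be pictured with a purely conceptual sketch. This is not the real expansion of @parallel: make_thunk and run_chunks are hypothetical names, and the chunks run locally in a plain loop instead of being sent to worker processes. The point is only that the loop body becomes a closure over the caller's argument n, so it cannot be built until n is known.

```julia
# Conceptual sketch only; NOT the actual expansion of @parallel.
# make_thunk and run_chunks are hypothetical names used for illustration.

# Wrap the loop body in a closure that captures `total`, derived from `n`.
function make_thunk(n)
    total = 10^n
    # Chunk k of nchunks covers a contiguous slice of 1:total.
    return (k, nchunks) -> begin
        lo = div((k - 1) * total, nchunks) + 1
        hi = div(k * total, nchunks)
        s = 0.0
        for i = lo:hi
            s += sqrt(rand()^2 + rand()^2 + rand()^2)
        end
        s
    end
end

# Run every chunk locally and reduce with (+); in the real macro each
# chunk would be evaluated on a different process.
function run_chunks(n, nchunks)
    thunk = make_thunk(n)
    return sum([thunk(k, nchunks) for k = 1:nchunks])
end

run_chunks(6, 2)
```

Because a new closure is created on each call to run_chunks, any work tied to building and shipping it is repeated per call, which is consistent with the small residual allocation reported for the second runs above.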

Calling a function inside @parallel results in huge memory allocation