Question

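In one of my applications I have to store elements of different subtypes in an array, and JIT compilation is hitting performance hard. Below is a minimal example.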

abstract A
immutable B <: A end
immutable C <: A end

b = B()
c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);

  0.007756 seconds (6.03 k allocations: 276.426 KB)
  0.007878 seconds (5.01 k allocations: 223.087 KB)
  0.005175 seconds (2.44 k allocations: 128.773 KB)
  0.004276 seconds (2.42 k allocations: 127.546 KB)
  0.004107 seconds (2.45 k allocations: 129.983 KB)
  0.004090 seconds (2.45 k allocations: 129.983 KB)

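As you can see, every time an array is constructed for a different combination of element types, the JIT has to run again.

I also tried [...] instead of T[...], and it appears to be even worse.

Restart the kernel and run the following: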

b = B()
c = C()
@time Base.vect(b, b)
@time Base.vect(b, c)
@time Base.vect(c, c)
@time Base.vect(c, b)
@time Base.vect(b, c, b)
@time Base.vect(b, c, c);

  0.008252 seconds (6.87 k allocations: 312.395 KB)
  0.149397 seconds (229.26 k allocations: 12.251 MB)
  0.006778 seconds (6.86 k allocations: 312.270 KB)
  0.113640 seconds (178.26 k allocations: 9.132 MB, 3.04% gc time)
  0.050561 seconds (99.19 k allocations: 5.194 MB)
  0.031053 seconds (72.50 k allocations: 3.661 MB)

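In my application I face many different subtypes: each element is of type NTuple{N, A}, where N can vary. As a result, the application ends up spending its time in JIT compilation.

What is the best way to work around this? The only approach I can think of is to create a wrapper, say W, and box every element in a W before it goes into the array, so that the compiler only compiles the array function once.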

immutable W
    value::NTuple
end
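
As a rough sketch of how such a wrapper might be used, assuming Julia 1.x syntax (`abstract type` / `struct` instead of `abstract` / `immutable`): the field type `Tuple{Vararg{A}}` is my assumption for "a tuple of any length whose elements are all subtypes of A", and the sample values are made up for illustration. Whether this actually reduces compile time in a given application is not measured here.

```julia
abstract type A end
struct B <: A end
struct C <: A end

# Wrapper with an abstractly typed field (an assumption for this sketch):
# every wrapped element has the same concrete type W, whatever tuple it holds.
struct W
    value::Tuple{Vararg{A}}
end

# All three elements are W, so the array constructor only ever sees
# one combination of argument types.
ws = [W((B(), C())), W((C(),)), W((B(), C(), B()))]
```

The trade-off is that code reading `w.value` no longer knows the tuple's concrete type at compile time, so accesses to it are dynamically dispatched.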

Thanks to @Matt B. After overloading getindex as he suggested:

c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);

  0.008493 seconds (6.43 k allocations: 289.646 KB)
  0.000867 seconds (463 allocations: 19.012 KB)
  0.000005 seconds (5 allocations: 240 bytes)
  0.000003 seconds (5 allocations: 240 bytes)
  0.004035 seconds (2.37 k allocations: 122.535 KB)
  0.000003 seconds (5 allocations: 256 bytes)

Also, I realized that the JIT for tuples is actually very efficient.

@time tuple(1,2)
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);

  0.000004 seconds (149 allocations: 10.183 KB)
  0.000011 seconds (7 allocations: 336 bytes)
  0.000008 seconds (7 allocations: 336 bytes)
  0.000007 seconds (7 allocations: 336 bytes)
  0.000007 seconds (7 allocations: 336 bytes)
  0.000005 seconds (7 allocations: 352 bytes)
  0.000004 seconds (7 allocations: 352 bytes)
  0.000003 seconds (5 allocations: 192 bytes)
  0.000004 seconds (5 allocations: 192 bytes)
  0.000002 seconds (5 allocations: 192 bytes)
  0.000002 seconds (5 allocations: 192 bytes)
  0.000002 seconds (5 allocations: 192 bytes)
  0.000002 seconds (5 allocations: 192 bytes)

Answer 1

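The JIT heuristics here could probably be better tuned in the base library. While Julia by default generates specialized methods for each unique permutation of argument types, there are a few escape hatches you can use to reduce the number of specializations:

Use f(T::Type) instead of f{T}(::Type{T}). Both are well-typed and behave well through inference, but the former will only generate one method for all types.
Use the undocumented all-caps g(::ANY) flag instead of g(::Any). It is semantically identical, but ANY will prevent specialization for that argument.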
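
For readers on newer Julia versions, the two signature forms from the first point can be written as in the sketch below. The function names `name_loose` and `name_param` are hypothetical, and the example only shows that both forms accept the same arguments. It does not demonstrate how many specializations the compiler generates for each.

```julia
# Hypothetical functions, for illustration only.

# Form 1: annotate the argument with the abstract `Type`;
# there is no method type parameter.
name_loose(T::Type) = string(T)

# Form 2: capture the type as a method type parameter T
# (written f{T}(::Type{T}) in the older syntax used in this answer).
name_param(::Type{T}) where {T} = string(T)

name_loose(Int)
name_param(Float64)
```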

In this case, you probably want to specialize on the type but not the values:

  function Base.getindex{T<:A}(::Type{T}, vals::ANY...)
       a = Array(T,length(vals))
       @inbounds for i = 1:length(vals)
           a[i] = vals[i]
       end
       return a
   end
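
The code above uses pre-0.7 syntax. In Julia 0.7 and later, `ANY` was replaced by the `@nospecialize` macro and `Array(T, n)` by `Array{T}(undef, n)`. A minimal sketch of the same idea under those assumptions follows. To avoid interacting with the methods Base already defines for `getindex`, it uses a hypothetical helper named `build_vector` rather than overloading `Base.getindex`. No timings are claimed for it.

```julia
abstract type A end
struct B <: A end
struct C <: A end

# Hypothetical helper (not part of Base): specialize on the element type T,
# but ask the compiler not to specialize on the types of the stored values.
function build_vector(::Type{T}, @nospecialize vals...) where {T<:A}
    a = Vector{T}(undef, length(vals))
    for i in 1:length(vals)
        a[i] = vals[i]
    end
    return a
end

v = build_vector(A, B(), C(), B())
```

With this shape, calls such as `build_vector(A, B(), C())` and `build_vector(A, C(), B(), B())` are intended to share one compiled method body per element type, rather than one per combination of value types.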
