如何改进Julia中嵌套数组的速度?

时间:2017-03-24 14:30:39

标签: arrays nested julia

以下函数nested_arrays(令人惊讶地)生成一个“深度”n的嵌套数组。但是,在使用n23等小值运行时,运行并显示输出需要相当长的时间。

julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)

julia> nested_arrays(1)
1-element Array{Int64,1}:
 1

julia> nested_arrays(2)
1-element Array{Array{Int64,1},1}:
 [1]

julia> nested_arrays(3)
1-element Array{Array{Array{Int64,1},1},1}:
 Array{Int64,1}[[1]]

julia> nested_arrays(10)
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}:
 Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]

有趣的是,当在行尾使用@time宏或;时,结果的计算时间相对较少。相反,REPL中结果的实际显示占用了大部分时间。

这种奇怪的行为没有在例如Python中显示。

In [1]: def nested_lists(n):
   ...:     if n == 1:
   ...:         return [1]
   ...:     return [nested_lists(n - 1)]
   ...: 

In [2]: nested_lists(10)
Out[2]: [[[[[[[[[[1]]]]]]]]]]

In [3]: %time nested_lists(100)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 37.7 µs
Out[3]: [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]

为什么朱莉娅这个功能这么慢? Julia是否重新编译display中不同类型T的{​​{1}}函数?如果是这样,为什么会这样?

可以提高此代码的速度,还是应该在Julia中完成?实际上我对此的主要关注点是,例如,加载一个复杂的嵌套JSON文件,其中只使用Array{T, 1} - 维数组是不可能的。

1 个答案:

答案 0 :(得分:5)

是的,这完全是由于编译时间。您可以通过@time display来查看此内容。第二次显示它很快:

julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)

julia> @time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
 Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
 11.682721 seconds (8.83 M allocations: 371.698 MB, 1.82% gc time)

julia> @time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
 Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
  0.001688 seconds (2.38 k allocations: 102.766 KB)

那为什么这么慢?这里的显示递归遍历所有数组,并将它们嵌套在彼此内部。这是递归调用show有14种不同的类型 - 一个有14个嵌套数组,然后是13个嵌套数组的元素,然后是12个元素......依此类推!每个show方法都可以独立编译。为特定元素类型编译专用方法是Julia如何生成非常高效代码的关键部分。这意味着它能够专门化每个元素上完成的每个操作,而无需任何运行时类型检查或调度。不幸的是,在这种情况下,它会妨碍它。

您可以使用Any[]数组解决此问题;在JSON文件的上下文中,这非常有意义,因为您不知道它是否包含字符串或数组或数字等。这要快得多,因为它只需要编译show方法对于一个Any[]数组,然后递归使用它。

# new session
julia> nested_arrays(n) = n == 1 ? Any[1] : Any[nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)

julia> @time display(nested_arrays(15));
1-element Array{Any,1}:
 Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
  1.571632 seconds (767.12 k allocations: 32.472 MB, 1.04% gc time)

julia> @time display(nested_arrays(15));
1-element Array{Any,1}:
 Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
  0.000606 seconds (839 allocations: 30.859 KB)

julia> @time display(nested_arrays(100));
1-element Array{Any,1}:
 Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
  0.002523 seconds (17.76 k allocations: 579.297 KB)