Question

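Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements to a function. From reading the Julia docs, I suspect the answer revolves around the sub function, but I can't quite see how...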

Working example: I build a large vector of Float64 (x), and then an index to every observation in x.

N = 10000000
x = randn(N)
inds = [1:N]

Now I time the mean function over x and x[inds] (I run mean(randn(2)) first so that compilation does not distort the timings):

@time mean(x)
@time mean(x[inds])

It is an identical calculation, but as expected the timings are:

elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)

So, is there a way around the memory allocation problem for arbitrary choices of inds (and arbitrary choices of array and function)?

Answer 1

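Just use xs = sub(x, 1:N). Note that this is different from xs = sub(x, [1:N]); on julia 0.3 the latter will fail, and on julia 0.4-pre the latter will be considerably slower than the former. On julia 0.4-pre, sub(x, 1:N) is just as fast as view: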

julia> N = 10000000;

julia> x = randn(N);

julia> xs = sub(x, 1:N);

julia> using ArrayViews

julia> xv = view(x, 1:N);

julia> mean(x)
-0.0002491126429772525

julia> mean(xs)
-0.0002491126429772525

julia> mean(xv)
-0.0002491126429772525

julia> @time mean(x);
elapsed time: 0.015345806 seconds (27 kB allocated)

julia> @time mean(xs);
elapsed time: 0.013815785 seconds (96 bytes allocated)

julia> @time mean(xv);
elapsed time: 0.015871052 seconds (96 bytes allocated)

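There are several reasons why sub(x, inds) is slower than sub(x, 1:N):

Each access xs[i] corresponds to x[inds[i]]; we have to look up two memory locations rather than one
If inds is not in order, you get poor cache behavior when accessing the elements of x
It destroys the ability to use SIMD vectorization

In this case, the last of these is probably the most important effect. This is not a limitation of Julia; the same thing would happen if you wrote the equivalent code in C, Fortran, or assembly.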

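Note that sum(sub(x, inds)) is still faster than sum(x[inds]) (until the latter becomes the former, which it should by the time julia 0.4 is officially released). But if you have to do many operations with xs = sub(x, inds), in some circumstances it will be worth your while to make a copy, even though it allocates memory, just so you can take advantage of the optimizations that are possible when values are stored in contiguous memory.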
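The three effects listed above can be made visible with hand-written loops. This is a minimal sketch rather than code from the answer: it assumes a current Julia 1.x release (where `[1:N]` no longer builds an index vector, so `collect(1:N)` is used instead), and the function names `sum_contiguous` and `sum_indexed` are hypothetical helpers introduced only for illustration.

```julia
using Random  # standard library; provides `shuffle`

# Contiguous traversal: one load per element, eligible for SIMD.
function sum_contiguous(x)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Indirect traversal: each step loads inds[i] and then x[inds[i]].
# The caller must guarantee that every entry of inds is a valid index of x.
function sum_indexed(x, inds)
    s = zero(eltype(x))
    @inbounds for i in eachindex(inds)
        s += x[inds[i]]
    end
    return s
end

N = 1_000_000
x = randn(N)
ordered = collect(1:N)         # index vector in increasing order
shuffled = shuffle(ordered)    # same indices in random order (cache-unfriendly)

# Run each function once so compilation is excluded from the timings.
sum_contiguous(x); sum_indexed(x, ordered); sum_indexed(x, shuffled)

@time sum_contiguous(x)
@time sum_indexed(x, ordered)
@time sum_indexed(x, shuffled)
```

None of the three calls copies the elements of x, so any timing differences come from the access pattern alone. Actual numbers depend on the machine and the Julia version.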

Answer 2

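EDIT: Read tholy's answer too to get the full picture!

When indexing with an array of indices, the situation is not great right now on Julia 0.4-pre (early February 2015):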

julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds);  # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N)  # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N)  # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)

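So while we can avoid the memory allocation, we are actually still slower (!).
The issue is indexing by an array, as opposed to a range. You can't use sub for this on 0.3, but you can on 0.4.
If we can index by a range, then we can use ArrayViews.jl on 0.3 and the built-in sub on 0.4. That case is pretty much as good as the original mean.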
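For readers on a current release, the same comparison can be sketched as follows. This is an illustration under stated assumptions, not part of the original answer: it assumes Julia 1.x, where the built-in `view` takes the place of `sub`, `mean` comes from the `Statistics` standard library, and `collect(r)` builds the index vector. The helper names `mean_copy` and `mean_view` are hypothetical.

```julia
using Statistics  # provides `mean` on Julia 1.x

mean_copy(x, inds) = mean(x[inds])         # x[inds] allocates a temporary array
mean_view(x, inds) = mean(view(x, inds))   # view wraps x without copying elements

x = randn(100_000)
r = 1:50_000        # range of indices
v = collect(r)      # the same indices as a Vector{Int}

# Call everything once so compilation is not counted.
mean_copy(x, v); mean_view(x, v); mean_view(x, r)

println("copy, vector index: ", @allocated(mean_copy(x, v)), " bytes")
println("view, vector index: ", @allocated(mean_view(x, v)), " bytes")
println("view, range index:  ", @allocated(mean_view(x, r)), " bytes")
```

The copy should report at least 8 bytes per selected Float64, while both views should report only a small amount, if any. Allocation counts say nothing about speed, so the vector-indexed view can still be slower than the range-indexed one for the reasons given in the other answer.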

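I noticed that when a smaller number of indices is used (instead of the whole range), the gap is much smaller and the memory allocation is low, so sub might be worth it: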

N = 100000000
x = randn(N)
inds = [1:div(N,10)]

@time mean(x)
@time mean(x)
@time mean(x)
@time mean(x[inds])
@time mean(x[inds])
@time mean(x[inds])
xi = sub(x,inds)
@time mean(xi)
@time mean(xi)
@time mean(xi)

gives

elapsed time: 0.092831612 seconds (985 kB allocated)
elapsed time: 0.067694917 seconds (96 bytes allocated)
elapsed time: 0.066209038 seconds (96 bytes allocated)
elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.186084807 seconds (4 MB allocated)
elapsed time: 0.057476269 seconds (96 bytes allocated)
elapsed time: 0.05733602 seconds (96 bytes allocated)

Avoid memory allocation when indexing an array in Julia
