Question

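What is happening in this example?

Looking at the full matrix, the find operation is faster on the non-diagonal matrix. Conversely, building the sparse representation is faster for the diagonal matrix (which seems reasonable). The find operation on the sparse matrices takes almost the same time in every case.

Out of curiosity, can someone tell me what is going on here? Why is it faster to find the nonzero elements of a full matrix than those of a diagonal matrix?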

printf("Diagonal Mat:\n\n")
A = eye(10000);

printf("Full mat: ")
tic
find(A);
toc

printf("Building sparse representation: ")
tic
As = sparse(A);
toc

printf("Sparse mat: ")
tic
find(As);
toc

printf("\n\nNon-Diagonally flagged Mat:\n\n")
A = A | A; # This removes the "Diagonal Matrix" flag from A

printf("Full mat: ")
tic
find(A);
toc

printf("Building sparse representation: ")
tic
As = sparse(A);
toc

printf("Sparse mat: ")
tic
find(As);
toc

printf("\n\nActually Non-Diagonal Mat:\n\n")
A(:,:) = 0;
A(:,1) = 1;
printf("Full mat: ")
tic
find(A);
toc

printf("Building sparse representation: ")
tic
As = sparse(A);
toc

printf("Sparse mat: ")
tic
find(As);
toc

Output:

Diagonal Mat:

Full mat: Elapsed time is 0.204636 seconds.
Building sparse representation: Elapsed time is 5.19753e-05 seconds.
Sparse mat: Elapsed time is 7.60555e-05 seconds.


Non-Diagonally flagged Mat:

Full mat: Elapsed time is 0.0800331 seconds.
Building sparse representation: Elapsed time is 0.0924602 seconds.
Sparse mat: Elapsed time is 7.48634e-05 seconds.


Actually Non-Diagonal Mat:

Full mat: Elapsed time is 0.0799708 seconds.
Building sparse representation: Elapsed time is 0.092248 seconds.
Sparse mat: Elapsed time is 7.70092e-05 seconds.

Answer 1

First, here is a better way to measure this:

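d = eye (10000);          # diagonal matrix
f = full (eye (10000));   # same values stored as a full matrix

for i = 1:10, find (d); endfor    # warm-up runs, not timed
t = cputime ();
for i = 1:100, find (d); endfor
cputime () - t

for i = 1:10, find (f); endfor    # warm-up runs, not timed
t = cputime ();
for i = 1:100, find (f); endfor
cputime () - t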

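That's a good question. Octave has an internal specialization for diagonal matrices in which only the diagonal values are stored. You can see how much less memory it uses: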

octave> d = eye (10000);
octave> f = full (eye (10000));
octave> typeinfo (d)
ans = diagonal matrix
octave> typeinfo (f)
ans = matrix
octave> whos d f
Variables in the current scope:

   Attr Name        Size                     Bytes  Class
   ==== ====        ====                     =====  ===== 
        d       10000x10000                  80000  double
        f       10000x10000              800000000  double

Total is 200000000 elements using 800080000 bytes

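The specialization exists to reduce memory use and improve performance in the common cases where diagonal matrices occur. The problem with such specializations is that they add special cases all over the place, especially when you want to access the data directly, which Octave does very often.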
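To see the specialization at work, the following sketch checks which operations keep the diagonal storage and which fall back to a full array. It uses a small 4x4 matrix so it runs instantly; the variable names s and p are only illustrative.

```octave
d = eye (4);     # stored as a diagonal matrix
s = 2 * d;       # scaling by a scalar keeps the diagonal storage
p = d + 1;       # adding a scalar fills the off-diagonal entries
f = full (d);    # explicit conversion to a full array

printf ("eye (4)      : %s\n", typeinfo (d));
printf ("2 * eye (4)  : %s\n", typeinfo (s));
printf ("eye (4) + 1  : %s\n", typeinfo (p));
printf ("full (eye(4)): %s\n", typeinfo (f));
```

Any operation that cannot keep the result diagonal has to produce a full array, which is where the extra memory and time come from.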

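In the case of find, it has special cases for boolean arrays, integer arrays, permutation matrices, and sparse matrices. There is no special handling for diagonal matrices, so the case for real double-precision arrays is used instead. This means that a diagonal matrix is converted internally to a full array whenever find is called on it.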
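As a rough user-level sketch of what a diagonal special case could look like (this is not how Octave implements find, just an illustration), the nonzero positions can be computed from the diagonal values alone, without ever building the full array. The names v, k, and idx are hypothetical.

```octave
v = [1 0 3 0 5];
d = diag (v);                    # diagonal matrix with some zero entries

k = find (diag (d));             # which diagonal entries are nonzero
idx = (k - 1) * rows (d) + k;    # column-major linear index of element (k,k)

# Compare with the generic path that goes through a full array.
expected = find (full (d));
printf ("same result: %d\n", isequal (idx, expected));
```

This touches only n values instead of n^2, which is the kind of saving a dedicated special case for diagonal matrices would bring.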

Oddly, calling full on the diagonal matrix before calling find still seems to be more efficient, so maybe my reasoning is wrong. I have opened a performance bug report.

Answer 2

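It has to do with how your computer (or, more precisely, your CPU) handles and buffers values on the stack and on the heap. When memory for an array is allocated on the heap, the values are laid out as a "list", one after another. So when you iterate over an array value by value, the CPU moves from one value in that list to the next, which is very fast. But if you jump from value i to value i + n, where n is not 1, the CPU has to find that next value somewhere else in the list. The way values are stored in a programming environment, and the way you iterate over them, therefore affects the speed of the processing that follows.

This is just a short and simple attempt to explain the topic. In reality it is more complicated, because many more factors and different CPU and memory technologies come into play. If you are interested in this kind of thing, I suggest starting here: https://en.wikipedia.org/wiki/System_programming (to get a top-down view of the topic).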
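To make the "list" picture concrete for Octave, here is a small sketch of how a full array is laid out: values are stored column by column, so walking down a column touches neighbouring memory, while walking along a row jumps by the number of rows each step. It only illustrates the layout and does not measure any timing; the variables i, j, and k are illustrative.

```octave
A = reshape (1:6, 2, 3);       # 2x3 matrix, filled column by column
disp (A(:)');                  # values in storage order

i = 2; j = 3;
k = (j - 1) * rows (A) + i;    # linear (storage) position of A(i,j)
printf ("A(%d,%d) = %d, A(%d) = %d\n", i, j, A(i,j), k, A(k));
```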

Speed of finding nonzero elements in (non-)diagonal matrices
