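In my MATLAB program, there are several places where I need to build a matrix whose entries depend on their indices and then use it in matrix-vector operations. I would like to know the most efficient way to do this.
For example, I need to speed up the following: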
N = 1e4;
x = rand(N,1);
% Option 1
tic
I = 1:N;
J = 1:N;
S = zeros(N,N);
for i = 1:N
for j = 1:N
S(i,j) = (i+j)/(abs(i-j)+1);
end
end
a = x'*S*x
fprintf('Option 1 takes %.4f sec\n',toc)
clearvars -except x N
To speed this up, I tried the following options:
% Option 2
tic
I = 1:N;
J = 1:N;
Sx = zeros(N,1);
for i = 1:N
Srow_i = (i+J)./(abs(i-J)+1);
Sx(i)= Srow_i*x;
end
a = x'*Sx
fprintf('Option 2 takes %.4f sec\n',toc)
clearvars -except x N
and
% Option 3
tic
I = 1:N;
J = 1:N;
S = bsxfun(@plus,I',J)./(abs(bsxfun(@minus,I',J))+1);
a = x'*S*x
fprintf('Option 3 takes %.4f sec\n',toc)
clearvars -except x N
and (thanks to one of the answers)
% Option 4
tic
[I , J] = meshgrid(1:N,1:N);
S = (I+J) ./ (abs(I-J) + 1);
a = x' * S * x;
fprintf('Option 4 takes %.4f sec\n',toc)
clearvars -except x N
Option 2 is the most efficient. Is there a faster way to do this?
Update
I also tried Abhinav's option:
% Option 5 using Tony's Trick
tic
i = 1:N;
j = (1:N)';
I = i(ones(N,1),:);
J = j(:,ones(N,1));
S = (I+J)./(abs(I-J)+1);
a = x'*S*x;
fprintf('Option 5 takes %.4f sec\n',toc)
clearvars -except x N
It seems that the most efficient approach depends on the size of N. For different values of N, I get the following output:
N = 100:
Option 1 takes 0.00233 sec
Option 2 takes 0.00276 sec
Option 3 takes 0.00183 sec
Option 4 takes 0.00145 sec
Option 5 takes 0.00185 sec
N = 10000:
Option 1 takes 3.29824 sec
Option 2 takes 0.41597 sec
Option 3 takes 0.72224 sec
Option 4 takes 1.23450 sec
Option 5 takes 1.27717 sec
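So for small N, Option 2 is the slowest, but for larger N it becomes the most efficient. Could this be because of memory? Can someone explain?
Answer 0 (score: 2)
You can use meshgrid to create the indices without any loops: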
N = 1e4;
[I , J] = meshgrid(1:N,1:N);
x = rand(N,1);
S = (I+J) ./ (abs(I-J) + 1);
a = x' * S * x;
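Update
Since @Optimist showed that this code performs worse than Option 2 and Option 3, I decided to improve Option 2 slightly. Each row is now built directly from index ranges, which avoids the abs call: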
N = 1e4;
x = rand(N,1);
Sx = zeros(N,1);
for i = 1:N
Srow_i = (i+1:i+N)./[i:-1:2,1:N-i+1] ;
Sx(i)= Srow_i*x;
end
a = x'*Sx;
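A further variant that is not part of the original answer, offered only as a sketch: the matrix factors as S(i,j) = (i+j)*T(i,j) with T(i,j) = 1/(abs(i-j)+1), and T is a symmetric Toeplitz matrix. Writing w(i) = i*x(i), the quadratic form becomes x'*S*x = w'*T*x + x'*T*w = 2*w'*T*x, and the product T*x is a convolution that can be evaluated with zero-padded FFTs in O(N log N) time and O(N) memory. The names w, t, c, cv and Tx below are illustrative, and whether this actually beats Option 2 on a given machine would have to be measured.

```matlab
% Sketch: evaluate a = x'*S*x without forming S, using its Toeplitz structure.
N = 1e4;
x = rand(N,1);

w = (1:N)' .* x;                     % w(i) = i*x(i)
t = 1 ./ (1:N)';                     % t(k+1) = 1/(k+1) for lag k = 0..N-1
c = [t(end:-1:2); t];                % kernel over lags -(N-1)..(N-1), length 2N-1

L = 3*N - 2;                         % length of the full linear convolution
cv = ifft(fft(c, L) .* fft(x, L));   % zero-padded FFT convolution of c and x
Tx = real(cv(N:2*N-1));              % Tx(i) = sum_j x(j)/(abs(i-j)+1)

a = 2 * (w' * Tx);                   % x'*S*x = 2*w'*T*x because T is symmetric
```

Because the FFT introduces rounding error, the result agrees with the loop-based options only up to floating-point tolerance, so it is worth comparing against Option 2 once before relying on it.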
Answer 1 (score: 1)
You should try Tony's Trick, which is a fast way to do vector stacking/tiling in MATLAB. I have answered a similar question here. Here is the Tony's Trick option.
% Option using Tony's Trick
tic
i = 1:N;
j = (1:N)';
I = i(ones(N,1),:);
J = j(:,ones(N,1));
S = (I+J)./(abs(I-J)+1);
a = x'*S*x
fprintf('Tony''s Trick option takes %.4f sec\n',toc)
Edit 1: I ran a few tests and found the following. Up to N=1000, the Tony's Trick option is slightly faster than Option 2. Beyond that, Option 2 catches up and becomes faster.
Possible reason: as long as the arrays fit in the CPU cache, the fully vectorized code (the Tony's Trick option) is faster. Once the arrays grow (N>1000), they spill out of the caches close to the CPU into slower memory, and my guess is that MATLAB then effectively processes the Tony's Trick code piecemeal, so it no longer enjoys the full benefit of vectorization.
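One way to probe the memory explanation above is to generalize Option 2 from one row at a time to blocks of rows. For N = 1e4, a full N-by-N double matrix holds 1e8 entries, i.e. 800 MB, and the fully vectorized options create several arrays of that size (I, J, S and temporaries), whereas Option 2 only ever touches vectors of length N. The sketch below is an illustration rather than a tuned implementation: the block size B is a hypothetical knob (B = 1 reduces to Option 2, B = N to Option 3), and the names rows and Sblk are made up for this example.

```matlab
% Sketch: Option 2 generalized to blocks of B rows (B is an illustrative knob).
N = 1e4;
x = rand(N,1);
B = 256;                             % rows per block; a 256-by-1e4 block is about 20 MB
J = 1:N;
Sx = zeros(N,1);
for s = 1:B:N
    rows = (s:min(s+B-1, N))';       % row indices of this block (column vector)
    Sblk = bsxfun(@plus, rows, J) ./ (abs(bsxfun(@minus, rows, J)) + 1);
    Sx(rows) = Sblk * x;             % up to B entries of S*x at once
end
a = x' * Sx;
```

Timing this for a range of B values on the target machine would show where, if anywhere, the balance between loop overhead and memory traffic tips. No particular optimum is claimed here.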