我想知道是否有办法在此函数中对嵌套for循环进行矢量化,这填补了2D动态编程表DP的条目。我相信至少内部循环可以被矢量化,因为每行仅取决于前一行。我不知道怎么做。请注意,此函数在大型2D数组(图像)上调用,因此嵌套的for循环确实不会将其剪切掉。
function [cols] = compute_seam(energy)
[r, c, ~] = size(energy);
cols = zeros(r);
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
[~, j] = min(DP(r, :));
j = j - 1;
for i = r : -1 : 1
cols(i) = j;
j = BP(i, j);
end
end
答案 0 :(得分:4)
你认为至少内环是可矢量化的是正确的。这是嵌套循环部分的修改代码 -
rows_DP = size(DP,1); %// rows in DP
%// Get first row linear indices for a group of neighboring three columns,
%// which would be incremented as we move between rows with the row iterator
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
基准代码 -
N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);
disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
toc,clear DP BP x l
disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
toc
结果 -
------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.
因此,通过少量矢量化调整,您可能会在性能方面获得良好的26x speedup
。
为了提高性能,可以在代码中尝试更多优化调整 -
cols = zeros(r)
可以替换为col(r,r) = 0
。
DP = padarray(energy, [0, 1], Inf)
可以替换为
DP(1:size(energy,1),1:size(energy,2)+2)=Inf;
DP(:,2:end-1) = energy;
BP = zeros(r, c)
可以替换为BP(r, c) = 0
。
此处使用的预分配调整受this blog post的启发。