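The code below is correct, but I would like to vectorize it (and possibly move it to the GPU) to make it faster.
How can I convert it to vectorized form?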
RF = 4;
inhibatory = 0;
overlap=3;
act_funct = 'sig';
gap = RF-overlap;
Image1 = rand(30,22);
Image2 = rand(27,19);
size_image2 = size(Image2); % size_image2 is equal to 27x19
Image3 = rand(30,22);
de_act_output = de_activate_Mat(Image1,act_funct); % finding derivative of the matrix. e.g. de_act_output = act_output.*(1-act_output) in case of sigmoid.
for u=1:size(Image1,1)
    for v=1:size(Image1,2)
        sum_val=0;
        iLowMax=max(ceil((u-(RF+inhibatory))/(gap-inhibatory)),1);
        iHighMax=min(floor((u-1)/(gap-inhibatory))+1, size_image2(1));
        jLowMax=max(ceil((v-(RF+inhibatory))/(gap-inhibatory)),1);
        jHighMax = min(floor((v-1)/(gap-inhibatory))+1, size_image2(2));
        sum_sens = sum(sum(Image2(iLowMax:iHighMax,jLowMax:jHighMax)));
        sum_val = sum_sens(:,:) .* Image3(u,v);
        result(u,v) = de_act_output(u,v) .* sum_val;
    end
end
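The helper de_activate_Mat is not shown in the question. For anyone who wants to run the snippet, the following is a minimal inline stand-in for that call. It assumes that 'sig' means the sigmoid derivative described in the code comment, and that the input already holds sigmoid outputs in the range 0 to 1. The error identifier is made up for illustration.

```matlab
% Hypothetical inline replacement for
%   de_act_output = de_activate_Mat(Image1, act_funct);
% Assumption: for 'sig' the input already contains sigmoid outputs,
% so the derivative is y .* (1 - y), as the comment in the question suggests.
act_funct = 'sig';
Image1 = rand(30,22);

switch act_funct
    case 'sig'
        de_act_output = Image1 .* (1 - Image1);
    otherwise
        % Other activation names are not described in the question.
        error('deActivate:unknownActivation', ...
              'Unsupported activation: %s', act_funct);
end
```

Any other activation would need its own case; the question only describes the sigmoid one.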
Answer 0 (score: 1)
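The parallelogram-like block structure iLowMax:iHighMax,jLowMax:jHighMax that you create inside the nested loops does not lend itself to any simple vectorized code. If performance is critical in your case, you could still go for full vectorization, and convolution looks like it would be useful there. The tweaks listed here instead speed up everything around that step by precomputing most of the other quantities, which should give a noticeable speedup. Here is the implementation: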
U = 1:size(Image1,1); % Create arrays of iteration steps
V = 1:size(Image1,2);

% Calculate arrays of low-high row and column indices
% (size(Image2,k) is used so that no separate size_image2 variable is needed)
iLowMax  = max(ceil((U-(RF+inhibatory))/(gap-inhibatory)),1);
iHighMax = min(floor((U-1)/(gap-inhibatory))+1, size(Image2,1));
jLowMax  = max(ceil((V-(RF+inhibatory))/(gap-inhibatory)),1);
jHighMax = min(floor((V-1)/(gap-inhibatory))+1, size(Image2,2));

sens_sums = zeros(size(Image1)); % Pre-allocation
for u=1:size(Image1,1)
    for v=1:size(Image1,2)
        % Sum the block of Image2 that feeds position (u,v)
        sens = Image2(iLowMax(u):iHighMax(u),jLowMax(v):jHighMax(v));
        sens_sums(u,v) = sum(sens(:));
    end
end

% The remaining steps are elementwise, so they need no loop
result = sens_sums.*Image3.*de_act_output;
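The remaining double loop can also be removed. The sketch below is not part of the answer above. It computes the same block sums with a summed-area table (a 2-D cumulative sum of Image2 plus a four-corner lookup), and, for the special case gap-inhibatory == 1 that the question's parameters produce, it also checks the result against conv2, in line with the convolution remark. The names step, S, iLo, jLo, sens_sums_sat, sens_sums_conv and sens_sums_ref are introduced here for illustration only.

```matlab
% Parameters and images as in the question
RF = 4; inhibatory = 0; overlap = 3;
gap = RF - overlap;
step = gap - inhibatory;              % 1 for these parameters
Image1 = rand(30,22);
Image2 = rand(27,19);

U = 1:size(Image1,1);
V = 1:size(Image1,2);
iLowMax  = max(ceil((U-(RF+inhibatory))/step), 1);
iHighMax = min(floor((U-1)/step)+1, size(Image2,1));
jLowMax  = max(ceil((V-(RF+inhibatory))/step), 1);
jHighMax = min(floor((V-1)/step)+1, size(Image2,2));

% Summed-area table with a leading row and column of zeros, so that
% S(a+1,b+1) equals sum(sum(Image2(1:a,1:b)))
S = zeros(size(Image2)+1);
S(2:end,2:end) = cumsum(cumsum(Image2,1),2);

% Clamp the lower bounds so an empty range (low > high) contributes zero
iLo = min(iLowMax, iHighMax+1);
jLo = min(jLowMax, jHighMax+1);

% Four-corner lookup; indexing with a row vector and a column vector of
% indices returns the full numel(U)-by-numel(V) grid in one step
sens_sums_sat = S(iHighMax+1, jHighMax+1) - S(iLo, jHighMax+1) ...
              - S(iHighMax+1, jLo)        + S(iLo, jLo);

% Loop-based reference, same as the answer's loop
sens_sums_ref = zeros(size(Image1));
for u = U
    for v = V
        block = Image2(iLowMax(u):iHighMax(u), jLowMax(v):jHighMax(v));
        sens_sums_ref(u,v) = sum(block(:));
    end
end
err_sat = max(abs(sens_sums_sat(:) - sens_sums_ref(:)));

% Special case: with step == 1 each window is a clipped box of
% RF+inhibatory+1 rows and columns, which matches a cropped 'full' convolution
err_conv = 0;
if step == 1
    C = conv2(Image2, ones(RF+inhibatory+1));
    sens_sums_conv = C(1:size(Image1,1), 1:size(Image1,2));
    err_conv = max(abs(sens_sums_conv(:) - sens_sums_ref(:)));
end
```

With sens_sums_sat in place of sens_sums, the final line result = sens_sums.*Image3.*de_act_output stays unchanged. Since only cumsum, indexing and elementwise arithmetic are involved, this form should also carry over to gpuArray inputs, although that has not been benchmarked here.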