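I am experimenting with GPU code (still as a beginner). This is my second task, and it is somewhat different from my previous one. I want to do two things with the code below: 1) convert it to GPU code, and 2) more specifically, optimize it so that it runs faster. The function is as follows: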
RF = [4,3];
overlap = 1;
inhibatory = 0;
gap = RF-overlap;
ULS = size(err_sen{lay_no}); % upper layer size, suppose 6x6
Curr_LS = size(s.images{lay_no-1}); % lower (current) layer for which we are calculating weights, in this case 19x19
p_Grad = grad.wGrad{lay_no}; % existing 19x19 values which will be updated
cur_sen = err_sen{lay_no}(:,:,:); % upper layer values
tempG_all = zeros(Curr_LS(1), Curr_LS(2), Curr_LS(3)); % temporary array for saving the per-map results
curr_Input = s.images{lay_no-1}(:,:,:,samples_ind); % input source which will be multiplied with the sensitivities of the upper layer (cur_sen)
cur_maps = net.map_struct{lay_no-1}; % specifies which input image was used for calculating each higher-layer image
for Cur_lay_Map = 1:Curr_LS(3) % in each sample we have 13, 11, 9 maps (the count reduces as the layer goes up)
    % cur_maps is a 13x11 matrix in this case: each column has 3 consecutive 1's
    % (1,2,3), the next column another three (2,3,4), and zeros everywhere else
    map_to_read = find(cur_maps(Cur_lay_Map,:));
    tempgrads = zeros(Curr_LS(1), Curr_LS(2));
    for ii = 1:Curr_LS(1) % loop over the lower layer image pixels, e.g. 19x19 in this case
        for jj = 1:Curr_LS(2)
            % calculate which pixels to read in the upper layer of 6x6
            uLowMax = ceil((ii-(RF+inhibatory))/(gap-inhibatory));
            uHighMax = floor((ii-1)/(gap-inhibatory))+1;
            vLowMax = ceil((jj-(RF+inhibatory))/(gap-inhibatory));
            vHighMax = floor((jj-1)/(gap-inhibatory))+1;
            uLow = ceil((ii-RF)/gap);
            uHigh = floor((ii-1)/gap)+1;
            vLow = ceil((jj-RF)/gap);
            vHigh = floor((jj-1)/gap)+1;
            summed_value = 0;
            uLowMax = max(uLowMax,1);
            uHighMax = min(uHighMax, ULS(1));
            vLowMax = max(vLowMax,1);
            vHighMax = min(vHighMax, ULS(2));
            for Up_map_sens = map_to_read % this states which three maps to read from the 11 in each case
                UL_Sen = cur_sen(:,:,Up_map_sens);
                if(inhibatory==0) % the pixel is in the receptive field
                    summed_value = summed_value + sum(sum(UL_Sen(uLowMax:uHighMax,vLowMax:vHighMax)));
                else % the pixel is in the inhibitory field; not used currently because inhibatory is 0
                    for u = uLowMax:uHighMax
                        for v = vLowMax:vHighMax
                            if(u>=uLow && u<=uHigh && v>=vLow && v<=vHigh)
                                summed_value = summed_value + UL_Sen(u,v);
                            else
                                summed_value = summed_value - UL_Sen(u,v);
                            end
                        end
                    end
                end
            end
            cur_lay_nValue = curr_Input(ii,jj,Cur_lay_Map);
            summed_value = summed_value * cur_lay_nValue;
            tempgrads(ii,jj) = summed_value;
        end
    end
    tempG_all(:,:,Cur_lay_Map) = tempgrads(:,:);
end
newGrad(:,:) = p_Grad + sum(tempG_all,3); % p_Grad holds the previous gradient read above
grad.wGrad{lay_no}(:,:) = newGrad(:,:);
clear newGrad;
end % closes the enclosing function or loop (not shown)
I would appreciate your guidance and help with this. I have tried to convert and optimize it myself, but so far without success. Regards
Answer 0 (score: 3)
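A few things to note here -
GPUs generally don't like conditional statements, because they cause divergence: threads that take different branches can no longer run in lockstep. So we have to get rid of those here.
Going through all the code in the question at once would be tiring work, so we have to optimize in small steps. The first step is to vectorize the innermost nested loops so that the work can be done in parallel, which is the style of computation GPUs favor.
This is the code we start with (cur_sensitivites and UpperLayer_Sensitivity correspond to cur_sen and UL_Sen in the question) -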
for Up_map_sens = map_to_read % this states which three maps ...
    UpperLayer_Sensitivity = cur_sensitivites(:,:,Up_map_sens);
    if(inhibatory==0) % this if it is in receptive field
        summed_value = summed_value + ...
            sum(sum(UpperLayer_Sensitivity(uLowMax:uHighMax,vLowMax:vHighMax)));
    else % it is in inhibitory field but as we have 0 so not used currently
        for u = uLowMax : uHighMax
            for v = vLowMax : vHighMax
                if(u>=uLow && u<= uHigh && v>=vLow &&v<=vHigh)
                    summed_value = summed_value + UpperLayer_Sensitivity(u,v);
                else
                    summed_value = summed_value - UpperLayer_Sensitivity(u,v);
                end
            end
        end
    end
end
A vectorized version of the above code could look like this -
%// Get size
[m,n,p] = size(cur_sensitivites);
%// You basically have two subarrays: a bigger one, and a smaller one that is
%// a subset of the bigger one. Get the sum of each.
%// Get the linear indices for the bigger array and sum all of it
ind1 = bsxfun(@plus,[uLowMax:uHighMax]',([vLowMax:vHighMax]-1)*m); %//'
ind2 = bsxfun(@plus,ind1(:),(map_to_read-1)*m*n);
mv = sum(cur_sensitivites(ind2(:)));
%// Clamp the smaller range to the bigger one, as the loop did implicitly;
%// without this, uLow or vLow can fall below 1 near the image border
uIn = max(uLow,uLowMax):min(uHigh,uHighMax);
vIn = max(vLow,vLowMax):min(vHigh,vHighMax);
%// Get the linear indices for the smaller subset array and sum all of it
ind1 = bsxfun(@plus,uIn',(vIn-1)*m); %//'
ind2 = bsxfun(@plus,ind1(:),(map_to_read-1)*m*n);
pv = sum(cur_sensitivites(ind2(:)));
%// Get conditional value and the final output - summed_value
%// (inside minus outside is pv - (mv - pv) = -mv + 2*pv)
cond1 = inhibatory==0;
summed_value = mv.*cond1 + (-mv+2*pv).*(~cond1);
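One way to gain confidence in the vectorized form is to compare it against the original loop on small random data. The following is only a sketch: the array sizes follow the 6x6, 11-map example from the question, while the index bounds and the choice of map_to_read are made up for illustration, with the inner window deliberately chosen to lie inside the outer one.

```matlab
% Hypothetical test data: a 6x6 upper layer with 11 maps (sizes from the question).
rng(0);                                  % fixed seed so the check is repeatable
cur_sensitivites = rand(6, 6, 11);
map_to_read = [2 3 4];                   % example: three consecutive maps
[m, n, ~] = size(cur_sensitivites);

% Made-up index bounds: an outer window and an inner window it contains.
uLowMax = 1; uHighMax = 4; vLowMax = 2; vHighMax = 5;
uLow = 2;    uHigh = 3;    vLow = 3;    vHigh = 4;

for inhibatory = [0 1]
    % Reference: the original loop with its conditionals.
    ref = 0;
    for k = map_to_read
        S = cur_sensitivites(:, :, k);
        if inhibatory == 0
            ref = ref + sum(sum(S(uLowMax:uHighMax, vLowMax:vHighMax)));
        else
            for u = uLowMax:uHighMax
                for v = vLowMax:vHighMax
                    if u >= uLow && u <= uHigh && v >= vLow && v <= vHigh
                        ref = ref + S(u, v);
                    else
                        ref = ref - S(u, v);
                    end
                end
            end
        end
    end

    % Vectorized: linear indices for the outer window, then the inner window.
    ind1 = bsxfun(@plus, (uLowMax:uHighMax)', ((vLowMax:vHighMax) - 1) * m);
    ind2 = bsxfun(@plus, ind1(:), (map_to_read - 1) * m * n);
    mv = sum(cur_sensitivites(ind2(:)));
    ind1 = bsxfun(@plus, (uLow:uHigh)', ((vLow:vHigh) - 1) * m);
    ind2 = bsxfun(@plus, ind1(:), (map_to_read - 1) * m * n);
    pv = sum(cur_sensitivites(ind2(:)));
    cond1 = inhibatory == 0;
    summed_value = mv .* cond1 + (-mv + 2 * pv) .* (~cond1);

    % Both forms should agree up to floating-point rounding.
    assert(abs(summed_value - ref) < 1e-10);
end
```

The check covers both branches by running once with inhibatory set to 0 and once with a nonzero value. It does not exercise border pixels, where the inner window would need clamping to the outer one.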
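So, to do the computation on the GPU, you need to call gpuArray(...) at the start to copy the data to the GPU. You can leave the scalars as they are. Take this as a starting point: now that the three innermost nested loops are gone, along with the dreaded conditional statements, you are left with only a couple of nested loops.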
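As a rough illustration of the gpuArray step, the sketch below copies the sensitivity array to the GPU once, runs the index-and-sum part of the vectorized code there, and brings the scalar result back with gather. It assumes the Parallel Computing Toolbox and a supported GPU; the data, the map_to_read values and the index bounds are made up, and the try/catch fallback to the CPU is an addition for convenience, not part of the answer.

```matlab
% Hypothetical data: a 6x6 upper layer with 11 maps (sizes from the question).
cur_sensitivites = rand(6, 6, 11);
map_to_read = [2 3 4];                                  % example map selection
uLowMax = 1; uHighMax = 4; vLowMax = 2; vHighMax = 5;   % made-up bounds
[m, n, ~] = size(cur_sensitivites);

% Use the GPU only if the toolbox and a device are available.
try
    useGPU = gpuDeviceCount > 0;
catch
    useGPU = false;
end

if useGPU
    data = gpuArray(cur_sensitivites);   % copy the large array to the GPU once
else
    data = cur_sensitivites;             % fall back to the CPU
end

% The index arithmetic involves only small arrays and scalars, so it stays on the host.
ind1 = bsxfun(@plus, (uLowMax:uHighMax)', ((vLowMax:vHighMax) - 1) * m);
ind2 = bsxfun(@plus, ind1(:), (map_to_read - 1) * m * n);
mv = sum(data(ind2(:)));                 % runs on the GPU when data is a gpuArray

if useGPU
    mv = gather(mv);                     % bring the scalar result back to host memory
end
```

Copying the array once, outside the ii/jj loops, matters because host-to-device transfers are comparatively slow; transferring inside the loops would likely cancel any gain from the GPU.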