Question

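I'm experimenting with GPU code (still as a beginner), but this is only my second task and it is somewhat different from my previous one. I want to do two things with the code below: 1) convert it to GPU code, and 2) more specifically, optimize it so that it runs faster. The function is as follows: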

RF =  [4,3]; 
overlap= 1;
inhibatory = 0;
gap = RF-overlap;
ULS = size(err_sen{lay_no}); % suppose 6x6
Curr_LS = size(s.images{lay_no-1}); % lower (current) layer for which we are calculating weights, in this case 19x19
p_Grad = grad.wGrad{lay_no}; % existing 19x19 values which will be updated
cur_sen = err_sen{lay_no}(:,:,:);  % upper layer values
tempG_all = zeros(Curr_LS(1), Curr_LS(2), Curr_LS(3)); % preallocate the per-map gradients that the loop below fills in
curr_Input = s.images{lay_no-1}(:,:,:,samples_ind); % input source which will be multiplied with the other sensitivities of upper layer (cur_sen) 
cur_maps = net.map_struct{lay_no-1}; % this specifies which input image was used for calculating high layer image. 

for Cur_lay_Map = 1: Curr_LS(3)  % in each sample we have 13, 11, 9 maps (which reduces as layer goes up) 
    map_to_read = find(cur_maps(Cur_lay_Map,:));  % cur_maps is a connection matrix, e.g. 13x11: each column has three consecutive 1s (rows 1,2,3; next column 2,3,4; ...) and 0s elsewhere
    tempgrads = zeros(Curr_LS(1), Curr_LS(2));

    for ii=1: Curr_LS(1)  % for lower layer image pixels reading e.g in this case 19x19
        for jj=1: Curr_LS(2)
            uLowMax=ceil((ii-(RF+inhibatory))/(gap-inhibatory)); % calculating which pixels to read in upper layer of 6x6
            uHighMax=floor((ii-1)/(gap-inhibatory))+1;
            vLowMax=ceil((jj-(RF+inhibatory))/(gap-inhibatory));
            vHighMax = floor((jj-1)/(gap-inhibatory))+1;

            uLow=ceil((ii-RF)/gap); 
            uHigh= floor((ii-1)/gap)+1;
            vLow=ceil((jj-RF)/gap); 
            vHigh= floor((jj-1)/gap)+1;
            summed_value=0;
            uLowMax = max(uLowMax,1);
            uHighMax = min(uHighMax, ULS(1));
            vLowMax = max(vLowMax,1);
            vHighMax = min(vHighMax, ULS(2));
            for Up_map_sens = map_to_read % this states which three maps to read from 11 in each case
                UL_Sen = cur_sen(:,:,Up_map_sens);
                if(inhibatory==0) % this if it is in receptive field 
                    summed_value = summed_value + sum(sum(UL_Sen(uLowMax:uHighMax,vLowMax:vHighMax)));
                else % it is in inhibitory field but as we have 0 so not used currently
                    for u = uLowMax : uHighMax
                        for v = vLowMax : vHighMax
                            if(u>=uLow && u<= uHigh && v>=vLow &&v<=vHigh)
                                 summed_value = summed_value + UL_Sen(u,v);
                            else
                                 summed_value = summed_value - UL_Sen(u,v);
                            end
                        end
                    end
                end
            end
            cur_lay_nValue = curr_Input(ii,jj,Cur_lay_Map);
            summed_value = summed_value * cur_lay_nValue;
            tempgrads(ii,jj) = summed_value;
        end
     end
     tempG_all(:,:,Cur_lay_Map) = tempgrads(:,:);
  end
  newGrad(:,:) = p_Grad + sum(tempG_all,3); % p_Grad holds the existing gradient read at the top
  grad.wGrad{lay_no}(:,:) =  newGrad(:,:);
  clear newGrad;
end

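I would appreciate your guidance and help with this. I have tried to convert and optimize the code myself, but so far without success. Regards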

Answer 1

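There are a few things to note here:

GPUs generally don't like conditional statements, because they cause divergence. So we have to get rid of those here.
Going through all of the code in the question at once would be tiring work, so we have to optimize in small steps. The first step is to vectorize the innermost nested loops so that the work can be done in parallel, which is the performance philosophy GPUs favor.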
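To illustrate the first point, here is a minimal sketch of trading a per-element if/else for a sign mask. The matrix `A` and the two index windows are made-up example data, not values from the question.

```matlab
% Hypothetical example data: a 6x6 map and two index windows.
A   = magic(6);
uR  = 2:5;  vR  = 1:4;      % outer window (the one that is looped over)
uIn = 3:4;  vIn = 2:3;      % inner window (added; everything else is subtracted)

% Branching version: one if/else per element.
s_loop = 0;
for u = uR
    for v = vR
        if u >= uIn(1) && u <= uIn(end) && v >= vIn(1) && v <= vIn(end)
            s_loop = s_loop + A(u,v);
        else
            s_loop = s_loop - A(u,v);
        end
    end
end

% Branch-free version: build a +1/-1 sign mask once, then multiply and sum.
[U,V]  = ndgrid(uR, vR);
inside = U >= uIn(1) & U <= uIn(end) & V >= vIn(1) & V <= vIn(end);
signs  = 2*inside - 1;                   % +1 inside the inner window, -1 outside
s_vec  = sum(sum(signs .* A(uR,vR)));
```

Both versions give the same sum. The second one has no data-dependent branch, so every element goes through the same arithmetic.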

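This is the code we start with (the names here apparently follow an earlier revision of the question: `cur_sensitivites` corresponds to `cur_sen` and `UpperLayer_Sensitivity` to `UL_Sen` above):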

for Up_map_sens = map_to_read % this states which three maps ...
    UpperLayer_Sensitivity = cur_sensitivites(:,:,Up_map_sens);
    if(inhibatory==0) % this if it is in receptive field
        summed_value = summed_value + ...
                 sum(sum(UpperLayer_Sensitivity(uLowMax:uHighMax,vLowMax:vHighMax)));
    else % it is in inhibitory field but as we have 0 so not used currently
        for u = uLowMax : uHighMax
            for v = vLowMax : vHighMax
                if(u>=uLow && u<= uHigh && v>=vLow &&v<=vHigh)
                    summed_value = summed_value + UpperLayer_Sensitivity(u,v);
                else
                    summed_value = summed_value - UpperLayer_Sensitivity(u,v);
                end
            end
        end
    end
end

A vectorized version of the above code could look like this:

%// Get size
[m,n,p] = size(cur_sensitivites);

%// You basically have two subarrays, one is bigger and another smaller but
%// a subset of the bigger one. Get the sum of these two.

%// Get the linear indices for the bigger array and finally sum of all it
ind1 = bsxfun(@plus,[uLowMax:uHighMax]',([vLowMax:vHighMax]-1)*m); %//'
ind2 = bsxfun(@plus,ind1(:),(map_to_read-1)*m*n);
mv = sum(cur_sensitivites(ind2(:)));

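%// Get the linear indices for the smaller subset array and finally sum it all.
%// The inner range is clipped to the outer (clamped) range, as in the loop version,
%// so that no index falls outside the array near the borders.
ind1 = bsxfun(@plus,[max(uLow,uLowMax):min(uHigh,uHighMax)]',([max(vLow,vLowMax):min(vHigh,vHighMax)]-1)*m); %//'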
ind2 = bsxfun(@plus,ind1(:),(map_to_read-1)*m*n);
pv = sum(cur_sensitivites(ind2(:)));

%// Get conditional value and the final output - summed_value
cond1 = inhibatory==0;
summed_value = mv.*cond1 + (-mv+2*pv).*(~cond1);
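One way to gain confidence in the vectorized form is to compare it with the nested loops on small test data. The sketch below does that for both `inhibatory = 0` and `inhibatory = 1`. The array contents and window bounds are hypothetical, and they are chosen so that the inner window lies entirely inside the outer one, which the identity `-mv + 2*pv` relies on.

```matlab
% Hypothetical test data (sizes follow the question: 6x6 maps, 11 of them).
cur_sensitivites = reshape(1:6*6*11, 6, 6, 11);        % deterministic values
map_to_read = [2 3 4];
uLowMax = 1; uHighMax = 4; vLowMax = 2; vHighMax = 5;  % outer window
uLow    = 2; uHigh    = 3; vLow    = 3; vHigh    = 4;  % inner window (a subset)
[m,n,~] = size(cur_sensitivites);

for inhibatory = [0 1]
    % Reference result: the original nested loops.
    ref = 0;
    for Up_map_sens = map_to_read
        S = cur_sensitivites(:,:,Up_map_sens);
        if inhibatory == 0
            ref = ref + sum(sum(S(uLowMax:uHighMax, vLowMax:vHighMax)));
        else
            for u = uLowMax:uHighMax
                for v = vLowMax:vHighMax
                    if u>=uLow && u<=uHigh && v>=vLow && v<=vHigh
                        ref = ref + S(u,v);
                    else
                        ref = ref - S(u,v);
                    end
                end
            end
        end
    end

    % Vectorized result: linear indices of both windows across the selected maps.
    ind1 = bsxfun(@plus, (uLowMax:uHighMax)', ((vLowMax:vHighMax)-1)*m);
    ind2 = bsxfun(@plus, ind1(:), (map_to_read-1)*m*n);
    mv   = sum(cur_sensitivites(ind2(:)));             % sum over the outer window
    ind1 = bsxfun(@plus, (uLow:uHigh)', ((vLow:vHigh)-1)*m);
    ind2 = bsxfun(@plus, ind1(:), (map_to_read-1)*m*n);
    pv   = sum(cur_sensitivites(ind2(:)));             % sum over the inner window
    cond1 = inhibatory == 0;
    summed_value = mv.*cond1 + (-mv + 2*pv).*(~cond1);

    assert(summed_value == ref);                       % integer data, exact match
end
```

The `-mv + 2*pv` term comes from "inside minus outside": the outside sum is `mv - pv`, so `pv - (mv - pv) = 2*pv - mv`.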

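Now, to run the computation on the GPU, you need to copy the arrays to the GPU with gpuArray(...) at the start. You can leave the scalars as they are. Treat this as a starting point: the three innermost nested loops are gone along with the dreaded conditional statements, and you are left with only a few more nested loops.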
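A minimal sketch of that transfer step follows. It assumes the Parallel Computing Toolbox (which provides `gpuArray`, `gpuDeviceCount` and `gather`) and falls back to the CPU when no GPU is usable. The data and the index list `idx` are placeholders, not values from the question.

```matlab
% Hypothetical data standing in for the question's arrays.
cur_sensitivites = rand(6, 6, 11);
idx = [1 2 3 7 8 9];                     % placeholder linear indices to sum over

% Use the GPU only if one is actually available.
useGPU = false;
try
    useGPU = gpuDeviceCount > 0;
catch
    % No Parallel Computing Toolbox or no usable GPU: stay on the CPU.
end

if useGPU
    data = gpuArray(cur_sensitivites);   % copy the array to GPU memory once
else
    data = cur_sensitivites;             % CPU fallback so the sketch still runs
end

mv = sum(data(idx));                     % same expression for CPU and GPU arrays

if useGPU
    mv = gather(mv);                     % bring the result back to host memory
end
```

Copying once before the loops and gathering once at the end keeps host-to-device transfers out of the inner iterations, where they would likely dominate the runtime for arrays this small.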

How to convert CPU code to GPU code in an optimized way?
