How do I convert CPU code to GPU code in an optimized way?

Time: 2014-09-18 15:58:01

Tags: matlab image-processing optimization gpu gpuimage

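I am experimenting with GPU code (still as a beginner). This is my second task, and it is somewhat different from my previous one. I want to do two things with the code below: 1) convert it to GPU code, and 2) more specifically, optimize it so that it runs faster. The function is as follows: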

RF =  [4,3]; 
overlap= 1;
inhibatory = 0;
gap = RF-overlap;
ULS = size(err_sen{lay_no}); % upper layer size, e.g. 6x6
Curr_LS = size(s.images{lay_no-1}); % lower (current) layer, for which we are calculating weights; 19x19 in this case
p_Grad = grad.wGrad{lay_no}; % existing 19x19 values which will be updated
cur_sen = err_sen{lay_no}(:,:,:);  % upper layer sensitivities
tempGrad = zeros(Curr_LS(1), Curr_LS(2), Curr_LS(3)); % temporary matrix for saving the data
curr_Input = s.images{lay_no-1}(:,:,:,samples_ind); % input, multiplied with the upper layer sensitivities (cur_sen)
cur_maps = net.map_struct{lay_no-1}; % specifies which input images were used to calculate each higher-layer image

for Cur_lay_Map = 1: Curr_LS(3)  % each sample has 13, 11, 9 maps (fewer as the layer goes up)
    % cur_maps is a connection matrix (13x11 in this case): each column has
    % three consecutive 1's (rows 1,2,3 in the first column, rows 2,3,4 in
    % the next, and so on) and 0's elsewhere
    map_to_read = find(cur_maps(Cur_lay_Map,:));
    tempgrads = zeros(Curr_LS(1), Curr_LS(2));

    for ii=1: Curr_LS(1)  % for lower layer image pixels reading e.g in this case 19x19
        for jj=1: Curr_LS(2)
            uLowMax=ceil((ii-(RF+inhibatory))/(gap-inhibatory)); % calculating which pixels to read in upper layer of 6x6
            uHighMax=floor((ii-1)/(gap-inhibatory))+1;
            vLowMax=ceil((jj-(RF+inhibatory))/(gap-inhibatory));
            vHighMax = floor((jj-1)/(gap-inhibatory))+1;

            uLow=ceil((ii-RF)/gap); 
            uHigh= floor((ii-1)/gap)+1;
            vLow=ceil((jj-RF)/gap); 
            vHigh= floor((jj-1)/gap)+1;
            summed_value=0;
            uLowMax = max(uLowMax,1);
            uHighMax = min(uHighMax, ULS(1));
            vLowMax = max(vLowMax,1);
            vHighMax = min(vHighMax, ULS(2));
            for Up_map_sens = map_to_read % this states which three maps to read from 11 in each case
                UL_Sen = cur_sen(:,:,Up_map_sens);
                if(inhibatory==0) % no inhibitory field: everything read lies in the receptive field
                    summed_value = summed_value + sum(sum(UL_Sen(uLowMax:uHighMax,vLowMax:vHighMax)));
                else % inhibitory field present (not used currently, since inhibatory is 0)
                    for u = uLowMax : uHighMax
                        for v = vLowMax : vHighMax
                            if(u>=uLow && u<= uHigh && v>=vLow &&v<=vHigh)
                                 summed_value = summed_value + UL_Sen(u,v);
                            else
                                 summed_value = summed_value - UL_Sen(u,v);
                            end
                        end
                    end
                end
            end
            cur_lay_nValue = curr_Input(ii,jj,Cur_lay_Map);
            summed_value = summed_value * cur_lay_nValue;
            tempgrads(ii,jj) = summed_value;
        end
    end
    tempG_all(:,:,Cur_lay_Map) = tempgrads(:,:);
end
newGrad(:,:) = p_Grad + sum(tempG_all,3); % add to the existing gradient loaded into p_Grad above
grad.wGrad{lay_no}(:,:) =  newGrad(:,:);
clear newGrad;
end

I would appreciate your guidance and help with this. I have tried to convert and optimize it myself, but so far without success. Regards

1 Answer:

Answer 0 (score: 3)

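There are a few points to note here:

  1. GPUs generally do not like conditional statements, because they cause branch divergence. So we have to get rid of them here.

  2. Going through all of the code in the question at once would be tiring work, so we have to optimize in small steps. The first step is to vectorize the innermost nested loops so that they can run in parallel, which is the performance philosophy GPUs favor.

  3. This is the code we start with (`cur_sensitivites` and `UpperLayer_Sensitivity` here correspond to `cur_sen` and `UL_Sen` in the question):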

    for Up_map_sens = map_to_read % this states which three maps ...
        UpperLayer_Sensitivity = cur_sensitivites(:,:,Up_map_sens);
        if(inhibatory==0) % no inhibitory field: everything read lies in the receptive field
            summed_value = summed_value + ...
                     sum(sum(UpperLayer_Sensitivity(uLowMax:uHighMax,vLowMax:vHighMax)));
        else % inhibitory field present (not used currently, since inhibatory is 0)
            for u = uLowMax : uHighMax
                for v = vLowMax : vHighMax
                    if(u>=uLow && u<= uHigh && v>=vLow &&v<=vHigh)
                        summed_value = summed_value + UpperLayer_Sensitivity(u,v);
                    else
                        summed_value = summed_value - UpperLayer_Sensitivity(u,v);
                    end
                end
            end
        end
    end
    

    A vectorized version of the code above could look like this:

    %// Get size
    [m,n,p] = size(cur_sensitivites);
    
    %// You basically have two subarrays, one is bigger and another smaller but
    %// a subset of the bigger one. Get the sum of these two.
    
    %// Get the linear indices for the bigger array and finally sum it all
    ind1 = bsxfun(@plus,[uLowMax:uHighMax]',([vLowMax:vHighMax]-1)*m); %//'
    ind2 = bsxfun(@plus,ind1(:),(map_to_read-1)*m*n);
    mv = sum(cur_sensitivites(ind2(:)));
    
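    %// Get the linear indices for the smaller subset array and finally sum it all.
    %// The subset is clamped to the bigger array, since the loops only visit (u,v) inside it.
    ind1 = bsxfun(@plus,[max(uLow,uLowMax):min(uHigh,uHighMax)]', ...
        ([max(vLow,vLowMax):min(vHigh,vHighMax)]-1)*m); %//'
    ind2 = bsxfun(@plus,ind1(:),(map_to_read-1)*m*n);
    pv = sum(cur_sensitivites(ind2(:)));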
    
    %// Get conditional value and the final output - summed_value
    cond1 = inhibatory==0;
    summed_value = mv.*cond1 + (-mv+2*pv).*(~cond1);
    

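    So, to do the computation on the GPU, you need to copy the data to the GPU at the start by calling gpuArray(...). You can leave the scalars as they are. Take this as a starting point: with the three innermost nested loops gone, along with the dreaded conditional statements, you are left with only a couple of nested loops.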
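
To sanity-check the vectorized expression before moving anything to the GPU, something like the following minimal sketch may help. The array size, the window bounds and the `useGPU` flag are made-up values for illustration and are not taken from the question. The GPU part assumes the Parallel Computing Toolbox (`gpuArray`, `gather`) and a supported device, so it is switched off by default.

```matlab
% Hypothetical sample data: sizes and index values are assumptions.
m = 6; n = 6; p = 11;
cur_sensitivites = rand(m, n, p);
map_to_read = [2 3 4];         % row vector of upper-layer maps to read
uLowMax = 2; uHighMax = 5;     % bigger (already clamped) window
vLowMax = 1; vHighMax = 4;
uLow = 3; uHigh = 4;           % smaller window, inside the bigger one
vLow = 2; vHigh = 3;
inhibatory = 1;                % nonzero so the signed branch is exercised

% Reference result: the original loops with the conditional.
ref = 0;
for k = map_to_read
    S = cur_sensitivites(:, :, k);
    for u = uLowMax:uHighMax
        for v = vLowMax:vHighMax
            if u >= uLow && u <= uHigh && v >= vLow && v <= vHigh
                ref = ref + S(u, v);
            else
                ref = ref - S(u, v);
            end
        end
    end
end

% Vectorized, branch-free version built from linear indices.
ind1 = bsxfun(@plus, (uLowMax:uHighMax)', ((vLowMax:vHighMax) - 1) * m);
ind2 = bsxfun(@plus, ind1(:), (map_to_read - 1) * m * n);
mv = sum(cur_sensitivites(ind2(:)));      % sum over the bigger window
ind1 = bsxfun(@plus, (uLow:uHigh)', ((vLow:vHigh) - 1) * m);
ind2 = bsxfun(@plus, ind1(:), (map_to_read - 1) * m * n);
pv = sum(cur_sensitivites(ind2(:)));      % sum over the smaller window
cond1 = inhibatory == 0;
summed_value = mv .* cond1 + (-mv + 2 * pv) .* (~cond1);

% Both versions should agree up to floating-point rounding.
assert(abs(summed_value - ref) < 1e-10);

% Optional GPU step: only the large array is copied, scalars stay on the host.
useGPU = false;                % assumption: set to true on a machine with a GPU
if useGPU
    g_sens = gpuArray(cur_sensitivites);      % host -> device copy
    pv_gpu = gather(sum(g_sens(ind2(:))));    % ind2 still holds the smaller window
    assert(abs(pv_gpu - pv) < 1e-10);
end
```

The identity behind the last expression is that adding the inner window and subtracting everything else in the outer window equals `pv - (mv - pv)`, which is `-mv + 2*pv`. Copying the array once with `gpuArray` and calling `gather` only on the final scalar keeps host/device transfers to a minimum.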