How can I vectorize the loops of this function in Octave?

Time: 2018-01-20 18:36:54

Tags: matlab loops parallel-processing vectorization octave

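I would like to vectorize the for loops of this function so that I can then parallelize it in Octave. Can these for loops be vectorized? Thank you very much in advance!

I have added comments to the function's code marking the start and end of each for loop and if-else.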

function [par]=pem_v(tsm,pr)

% tsm and pr are arrays of N by n. % par is an array of N by 8

tss=[27:0.5:32];
tc=[20:0.01:29];
N=size(tsm,1);
% main-loop
for ii=1:N
    % I extract the rows in each loop because each one represents a sample
    sst=tsm(ii,:); sst=sst'; %then I convert each sample to column vectors
    pre=pr(ii,:); pre=pre';
    % main-condition
    if isnan(nanmean(sst))==1;
        par(ii,1:8)=NaN;
    else
        % first sub-loop
        for k=1:length(tss);
            idxx=find(sst>=tss(k)-0.25 & sst<=tss(k)+0.25);
            out(k)=prctile(pre(idxx),90);
        end
        % end first sub-loop
        tp90=tss(find(max(out)==out));
        % second sub-loop
        for j=1:length(tc)
            cond1=find(sst>=tc(j) & sst<=tp90);
            cond2=find(sst>=tp90);
            pem=zeros(length(sst),1);
            A=[sst(cond1),ones(length(cond1),1)];
            B=regress(pre(cond1),A);
            pt90=B(1)*(tp90-tc(j));
            AA=[(sst(cond2)-tp90)];
            BB=regress(pre(cond2)-pt90,AA);
            pem(cond1)=max(0,B(1)*(sst(cond1)-tc(j))); 
            pem(cond2)=max(0,(BB(1)*(sst(cond2)-tp90))+pt90); 
            clear A B AA BB;
            E(j)=sqrt(nansum((pem-pre).^2)/length(pre));
            clear pem;
        end
        % end second sub-loop
        tcc=tc(find(E==min(E)));
        % sub-condition
        if(isempty(tcc)==1);
            par(ii,1:8)=NaN;
        else
            cond1=find(sst>=tcc & sst<=tp90);
            cond2=find(sst>=tp90);
            pem=zeros(length(sst),1);
            A=[sst(cond1),ones(length(cond1),1)];
            B=regress(pre(cond1),A);
            pt90=B(1)*(tp90-tcc);
            AA=[sst(cond2)-tp90];
            BB=regress(pre(cond2)-pt90,AA);
            pem(cond1)=max(0,B(1)*(sst(cond1)-tcc)); 
            pem(cond2)=max(0,(BB(1)*(sst(cond2)-tp90))+pt90);    
            RMSE=sqrt(nansum((pem-pre).^2)/length(pre));
            % outputs
            par(ii,1)=tcc;
            par(ii,2)=tp90;
            par(ii,3)=B(1);
            par(ii,4)=BB(1);
            par(ii,5)=RMSE;
            par(ii,6)=nanmean(sst);
            par(ii,7)=nanmean(pre);
            par(ii,8)=nanmean(pem);
        end
        % end sub-condition
        clear pem pre sst RMSE BB B tp90 tcc
    end
    % end main-condition
end
% end main-loop

1 answer:

Answer 0 (score: 3)

You haven't provided any example input, so I created some that should be similar:

N = 5; n = 800; 
tsm = rand(N,n)*5+27; pr = rand(N,n);

Then, before you even think about vectorizing your code, there are 4 things you should keep in mind...

  1. Avoid computing the same thing on every loop iteration (such as the size of a vector); do it once before the loop instead.
  2. Pre-allocate arrays wherever possible (declare them as zeros / NaN etc.).
  3. Don't use find to convert logical indices into linear indices. There is no need, and it will slow down your code.
  4. Don't use clear repeatedly, especially not several times inside a loop. It is slow! Instead, use pre-allocation to make sure each variable is what you expect on every iteration.

Using the random inputs above, and taking these 4 points into account, the code below is ~65% faster than your code. Note: this is without doing any vectorization at all!

function [par]=pem_v(tsm,pr)
% tsm and pr are arrays of N by n.
% par is an array of N by 8
tss=[27:0.5:32];
tc=[20:0.01:29];
N=size(tsm,1);
% Transpose once here instead of every loop
tsm = tsm';
pr = pr';
% Pre-allocate memory for output 'par'
par = NaN(N, 8);
% Don't compute these every loop, do it before the loop.
% numel simpler than length for vectors, and size is clearer still
ntss = numel(tss);
nsst = size(tsm,1);
ntc = numel(tc);
npr = size(pr, 1);
for ii=1:N
    % Extract the columns in each loop because each one represents a sample
    sst=tsm(:,ii);
    pre=pr(:,ii);
    % main-condition. Previously isnan(nanmean(sst))==1, but that's only true if all(isnan(sst))
    % We don't need to assign par(ii,1:8)=NaN since we initialised par to a matrix of NaNs
    if ~all(isnan(sst));
        % first sub-loop, initialise 'out' first
        out = zeros(1, ntss);
        for k=1:ntss;
            % Don't use FIND on an indexing vector. Use the logical index raw, it's quicker
            idxx = (sst>=tss(k)-0.25 & sst<=tss(k)+0.25);
            % We need a check that some values of idxx are true, otherwise prctile will error.
            if nnz(idxx) > 0
                out(k) = prctile(pre(idxx), 90);
            end
        end
        % Again, no need for FIND, just reduces speed. This is a theme...
        tp90=tss(max(out)==out);
        for jj=1:ntc
            cond1 = (sst>=tc(jj) & sst<=tp90);
            cond2 = (sst>=tp90);
            % Use nnz (number of non-zero) instead of length, since cond1 is now a logical vector of all elements
            A = [sst(cond1),ones(nnz(cond1),1)];
            B = regress(pre(cond1), A);
            pt90 = B(1)*(tp90-tc(jj));
            AA = [(sst(cond2)-tp90)];
            BB = regress(pre(cond2)-pt90,AA);
            pem=zeros(nsst,1);
            pem(cond1) = max(0, B(1)*(sst(cond1)-tc(jj)));
            pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90);
            E(jj) = sqrt(nansum((pem-pre).^2)/npr);
        end
        tcc = tc(E==min(E));
        if ~isempty(tcc);
            cond1 = (sst>=tcc & sst<=tp90);
            cond2 = (sst>=tp90);
            A = [sst(cond1),ones(nnz(cond1),1)];
            B = regress(pre(cond1),A);
            pt90 = B(1)*(tp90-tcc);
            AA = [sst(cond2)-tp90];
            BB = regress(pre(cond2)-pt90,AA);
            pem = zeros(length(sst),1);
            pem(cond1) = max(0, B(1)*(sst(cond1)-tcc));
            pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90);
            RMSE = sqrt(nansum((pem-pre).^2)/npr);
            % Outputs, which we might as well assign all at once!
            par(ii,:)=[tcc, tp90, B(1), BB(1), RMSE, ...
                       nanmean(sst), nanmean(pre), nanmean(pem)];
        end
    end
end
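To see the first three points in isolation, here is a minimal sketch that uses only core Octave functions. The data and the names cnt_find, cnt_logical, sum_find and sum_logical are made up for illustration and are not part of pem_v. It only checks that indexing with the logical mask selects exactly the same elements as indexing via find, so dropping find should not change any results.

```octave
% Hypothetical sample data, shaped like one sample from the random input
sst = rand(800, 1) * 5 + 27;
pre = rand(800, 1);
tss = 27:0.5:32;

% Point 1: compute loop-invariant sizes once, before the loop
ntss = numel(tss);

% Point 2: pre-allocate results instead of growing them inside the loop
cnt_find    = zeros(1, ntss);
cnt_logical = zeros(1, ntss);
sum_find    = zeros(1, ntss);
sum_logical = zeros(1, ntss);

for k = 1:ntss
    mask = (sst >= tss(k) - 0.25 & sst <= tss(k) + 0.25);

    % With find: the logical mask is first converted to linear indices
    idx = find(mask);
    cnt_find(k) = numel(idx);
    sum_find(k) = sum(pre(idx));

    % Point 3: index with the mask directly; nnz counts the selected elements
    cnt_logical(k) = nnz(mask);
    sum_logical(k) = sum(pre(mask));
end

% Both approaches select the same elements in the same order
same_counts = isequal(cnt_find, cnt_logical);
same_sums   = isequal(sum_find, sum_logical);
```

Wrapping each variant in tic/toc on your own data is a simple way to check how much time find actually costs in your case; the speed-up will depend on the input size and the Octave version.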
