How can I make the following loop code faster?

Asked: 2018-04-11 15:47:50

Tags: matlab performance loops optimization vectorization

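I have the following code to calculate the semi-variance (decorrelation) between scattered data points that are separated by a given distance in space (the spatial lag) and in time (which must be < 1, i.e. less than 1 day). I need to calculate the semi-variance between a single data point and all other data points for each spatial lag.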

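However, it is very slow, and if possible I need to do this without loops. The second (inner) loop is relatively fast; it is the first loop, which has to step through every element one by one, that is slow. Any ideas on how to improve this code to make it faster? The data point arrays I actually use are 90 times larger than the one in this example.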

I am using MATLAB R2016a.

Working code:

N = 1E2; % example problem size
D = 1:N; % distance 
t = rand(1,N)+0.2; % time
datapoints = rand(1,N)+20; % temperature

% obtain differences between each element
distance_diff = abs(bsxfun(@minus, D.', D));
time_diff = abs(bsxfun(@minus, t.', t));

D_lag = [5 10 15 20 25]; % spatial lag, need to perform calculation for each lag

% the first loop is the one that slows down my function
for nn = 1:length(datapoints) % for each element
    for lag_n = 1:length(D_lag) % for each distance lag 
        sl = D_lag(lag_n);
        d = distance_diff(nn,:) < sl & time_diff(nn,:) < 1; % select points within spatial lag and 1 day
        strfile(nn,lag_n).variable = datapoints(d); % save selected data points

        % calculate semi-variogram for each nn element and each lag_n
        a = datapoints(nn)-strfile(nn,lag_n).variable; % for each element in data points
        strfile(nn,lag_n).semi_var = 0.5*((nanmean(a)^0.5)^4)/ ...
            (0.457+(0.494/length(strfile(nn,lag_n).variable)));
    end
end

1 Answer:

Answer 0 (score: 3)

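The code can be improved while keeping its current loop form, and it also appears possible to get rid of the loops entirely. Since that requires a deeper understanding of your problem and data structure, I will not show a vectorized version at this time (hopefully I will add one at some point), but I will try to explain how to get there.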

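I will use the implicit expansion syntax in this answer (so no bsxfun). It works from R2016b onward; on R2016a, which the question mentions, the equivalent bsxfun calls are still needed.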
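For readers on R2016a or earlier, the two forms can be compared side by side. This is only a minimal sketch with a made-up four-element D; the names viaBsxfun and viaExpansion are illustrative, and the second assignment assumes R2016b or newer.

```matlab
% Toy input, chosen only for illustration.
D = 1:4;

% Works on old and new releases: explicit singleton expansion.
viaBsxfun = abs(bsxfun(@minus, D.', D));

% Requires R2016b or newer: implicit expansion of a column minus a row.
viaExpansion = abs(D.' - D);

% Both are 4-by-4 matrices of pairwise absolute differences.
disp(isequal(viaBsxfun, viaExpansion))
```

On releases without implicit expansion, the second assignment raises a dimension-mismatch error, so only the bsxfun line applies there.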

Here is what I immediately noticed could be improved in your solution:

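  1. MATLAB warns that strfile grows inside the loop. The simplest way to avoid this is to run both loops in reverse, so that the very first assignment already gives strfile its final size. This: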

    for nn = 1:length(datapoints) % for each element
      for lag_n = 1:length(D_lag) % for each distance lag    
    

    becomes this:

    for nn = numel(datapoints):-1:1 % for each element
      for lag_n = numel(D_lag):-1:1 % for each distance lag    
    

    (I use numel rather than length because it is more expressive.)

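  2. Since you only check whether the time difference is below a single value (1), you definitely do not need to do that in the inner loop. In fact, you can move the comparison out of the loops entirely. Instead of: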

    for nn = ...
      for lag_n = ...
        sl = D_lag(lag_n);    
        d = distance_diff(nn,:) < sl & time_diff(nn,:) < 1;
    

    you can do this:

    td = time_diff < 1; % or even directly: td = abs(t.'-t) < 1
    for nn = ...
      ttd = td(nn,:); % doesn't change each lag_n, no need to recompute
      for lag_n = ...
        d = distance_diff(nn,:) < D_lag(lag_n) & ttd;
    

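    The above is a first step toward vectorization. When vectorizing, you want a "SIMD"-like approach: issue a command once and let it process as much data as possible, ideally the whole dataset at once.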

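  3. The two changes above are the result of thinking carefully about memory allocation and data dependencies, and they cut the run time by about 50%.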
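The two loop-level changes can be checked on a toy problem. The sketch below is an assumption-laden illustration, not the benchmark itself: the sizes, the linspace time vector, and the struct fields mask and same are made up for the demonstration, and bsxfun is used so that it also runs on R2016a.

```matlab
% Toy sizes and inputs (hypothetical, for illustration only).
N = 6;
D = 1:N;
t = linspace(0.2, 1.6, N);
D_lag = [2 4];

distance_diff = abs(bsxfun(@minus, D.', D));
time_diff     = abs(bsxfun(@minus, t.', t));

td = time_diff < 1;            % hoisted: computed once, outside both loops
clear s
for nn = N:-1:1                % reversed, so s(N, numel(D_lag)) is assigned first
    ttd = td(nn,:);            % does not depend on lag_n
    for lag_n = numel(D_lag):-1:1
        d = distance_diff(nn,:) < D_lag(lag_n) & ttd;
        s(nn,lag_n).mask = d;
        % Compare against the original per-iteration expression.
        s(nn,lag_n).same = isequal(d, ...
            distance_diff(nn,:) < D_lag(lag_n) & time_diff(nn,:) < 1);
    end
end

disp(size(s))          % s has its full N-by-numel(D_lag) size
disp(all([s.same]))    % hoisted mask matches the original expression
```

Because the first assignment targets the last element, the struct array is allocated at full size once instead of being regrown on every iteration.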

    Now, as promised, a few pointers on vectorization:

    • If we turn D_lag into a vector along the 3rd dimension, we can compute all required values of d in one go:

      d = distance_diff < permute(D_lag,[1,3,2]) & time_diff < 1; % ...or simply
      d = abs(D.'-D) < permute(D_lag,[1,3,2]) & abs(t.'-t) < 1;
      
    • Instead of length, you can use size(strfile(...),#).
    • Instead of nanmean(x), you would use the multi-argument syntax discussed in this Q&A.
    • I think it makes sense to create the struct output after all computations are done, using:

      strfile = struct('variable', (...), 'semi_var',(...) ); 
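One possible way to follow these pointers is sketched below. It is not the answer's vectorized version (none was posted), only an illustration under stated assumptions: the data contain no NaN values, so a masked sum divided by a count stands in for nanmean; the formula is copied as written in the question; and the helper names lag3, cnt, mean_a, rowIdx and lagIdx are made up here. bsxfun is used so that the sketch also runs on R2016a.

```matlab
% Small synthetic problem (sizes are illustrative only).
N = 50;
D = 1:N;                         % distance
t = rand(1,N) + 0.2;             % time
datapoints = rand(1,N) + 20;     % temperature
D_lag = [5 10 15 20 25];         % spatial lags
L = numel(D_lag);

distance_diff = abs(bsxfun(@minus, D.', D));   % N-by-N
time_diff     = abs(bsxfun(@minus, t.', t));   % N-by-N

% N-by-N-by-L mask: d(nn,k,l) is true when point k lies within lag l of
% point nn and within 1 day of it.
lag3 = permute(D_lag, [1,3,2]);                % 1-by-1-by-L
d = bsxfun(@and, bsxfun(@lt, distance_diff, lag3), time_diff < 1);

% Pairwise differences a(nn,k) = datapoints(nn) - datapoints(k).
a = bsxfun(@minus, datapoints.', datapoints);  % N-by-N

% Mean of the selected differences, taken along dimension 2.
% Assumption: no NaNs in the data. Every point selects itself, so cnt >= 1.
cnt    = sum(d, 2);                            % N-by-1-by-L
mean_a = sum(bsxfun(@times, d, a), 2) ./ cnt;  % N-by-1-by-L

% Same expression as in the question, applied element-wise.
semi_var = 0.5*((mean_a.^0.5).^4) ./ (0.457 + 0.494./cnt);
semi_var = permute(semi_var, [1,3,2]);         % N-by-L

% Build the struct array once, after all computations are done.
% The selections are ragged, so arrayfun (an implicit loop) collects them.
[rowIdx, lagIdx] = ndgrid(1:N, 1:L);
variable = arrayfun(@(r,l) datapoints(d(r,:,l)), rowIdx, lagIdx, ...
    'UniformOutput', false);
strfile = struct('variable', variable, 'semi_var', num2cell(semi_var));
```

The intermediate product is an N-by-N-by-L double array, so for an N around 9000 with five lags it would take on the order of a few gigabytes. Processing one lag at a time may be a more practical middle ground at that size.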
      

    Here is my code, in case anyone wants to benchmark it or build on it:

    function q49779404
    N = 1E4; % example problem size
    D = 1:N; % distance
    t = rand(1,N)+0.2; % time
    datapoints = rand(1,N)+20; % temperature
    
    % obtain differences between each element
    distance_diff = abs(D.'-D); % implicit expansion
    time_diff = abs(t.'-t); % implicit expansion
    
    D_lag = [5 10 15 20 25]; % spatial lag, need to perform calculation for each lag
    out{1} = method1(datapoints, D_lag, distance_diff, time_diff);
    out{2} = method2(datapoints, D_lag, distance_diff, time_diff);
    % out{3} = method3(datapoints, D_lag, distance_diff, time_diff);
    
    function strfile = method1(datapoints, D_lag, distance_diff, time_diff)
    for nn = 1:length(datapoints) % for each element
      for lag_n = 1:length(D_lag) % for each distance lag    
        sl = D_lag(lag_n);    
        d = distance_diff(nn,:) < sl & time_diff(nn,:) < 1; % select points within spatial lag and 1 day        
        strfile(nn,lag_n).variable = datapoints(d); % save selected data points
    
        % calculate semi-variogram for each nn element and each lag_n
        a = datapoints(nn)-strfile(nn,lag_n).variable; % for each element in data points
        strfile(nn,lag_n).semi_var = 0.5*((nanmean(a)^0.5)^4) / ...
          (0.457+(0.494/length(strfile(nn,lag_n).variable)));    
      end
    end
    
    function strfile = method2(datapoints, D_lag, distance_diff, time_diff)
    td = time_diff < 1; 
    for nn = numel(datapoints):-1:1 % for each element
      ttd = td(nn,:); % doesn't change each lag_n, no need to recompute
      for lag_n = numel(D_lag):-1:1 % for each distance lag    
        d = distance_diff(nn,:) < D_lag(lag_n) & ttd; % select points within spatial lag and 1 day        
        strfile(nn,lag_n).variable = datapoints(d); % save selected data points
    
        % calculate semi-variogram for each nn element and each lag_n
        a = datapoints(nn)-strfile(nn,lag_n).variable; % for each element in data points
        strfile(nn,lag_n).semi_var = 0.5*((nanmean(a).^0.5).^4) ./ ...
          (0.457+(0.494./length(strfile(nn,lag_n).variable)));    
      end
    end
    
    function strfile = method3(datapoints, D_lag, distance_diff, time_diff)
    d = distance_diff < permute(D_lag,[1,3,2]) & time_diff < 1;
    strfile = struct('variable',[],'semi_var',[]);