Question

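I have a rainfall dataset with one value every 15 minutes over many years, 820,000 rows in total. The (eventual) aim of my code is to create columns that categorise the data, which can then be used to extract relevant chunks of data for further analysis.

I am a MATLAB beginner and would really appreciate any help!

The first steps of my work are fast enough. However, some steps are very slow.

I have tried pre-allocating arrays and using the smallest intX type (int8 or int16, depending on the situation), but the other steps are so slow that they never finish.

The slow parts are the for loops, but I don't know whether they can be vectorised, split into chunks, or sped up in some other way.

I have a variable "rain" that contains a value for every time step/row. I created a variable called "state" that is 0 if there is no rain and 1 if there is rain. There is also a variable called "begin" that is 1 if the row is the first row of a storm and 0 otherwise.

The first slow loop creates a "spell" variable, which gives each rain storm a number.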

% Generate blank column for spell of size (rain) - preallocate
    spell = zeros(size(st),1,'int16');

% Start row for analysis
    x=1;

% Populate "spell" variable with a storm number in each row of rain, for the storm number it belongs to (storm number calculated by adding up the number of "begin" values up to that point

    for i=1:size(state)
         if(state(x)==1)
             spell(x) =  sum(begin(1:x));
         end
       x=x+1;
    end

The next stage deals with the length of each storm. The first step is fast enough.

 % List of storm numbers

     spellnum = unique(spell);

 % Length of each spell
     spelllength = histc(spell,spellnum);

The final step below (the for loop) is far too slow and just crashes.

 % Generate blank column for length

      length = zeros(size(state),1,'int16');

 % Starting row

      x = 1;

 % For loop to output the total length of the storm for each row of rain within that storm

     for i=1:size(state)

          for j=1:size(state)
                 position = find(spell==x);

                      for k=1:size(state)
                          length(position) = spelllength(x+1);
                      end
          end

       x=x+1;

      end

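Is it possible to make this more efficient?

Apologies if an example already exists; I am not sure what this process would be called! Many thanks in advance.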

Answer 1

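Memory allocation/reallocation tips:

1. Try to create the result directly from an expression (trimming a more general result afterwards if necessary);
2. if that is not possible, pre-allocate as much as you can (when you have an upper bound for the result);
3. if 2. is not possible, try growing cell arrays instead of large matrices (because a matrix needs a contiguous area of memory).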
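
As a rough illustration of the first two tips, here is a minimal sketch. The variable names and the size `n` are made up for the example and do not come from the question.

```matlab
n = 1000;                       % hypothetical number of rows

% First tip: build the result directly from an expression.
squares_direct = (1:n).'.^2;

% Second tip: if a loop is unavoidable, pre-allocate to the known size first.
squares_loop = zeros(n, 1);
for k = 1:n
    squares_loop(k) = k^2;
end

% Both approaches should give the same column vector.
same = isequal(squares_direct, squares_loop);
```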

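Type choice tips:

1. Try to always use double for intermediate results, because it is the basic numeric data type in MATLAB; avoid converting back and forth;
2. use other types for intermediate results only when there is a memory constraint that a smaller type would relieve.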

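Linearization (vectorization) tips:

1. The fastest linearization uses matrix-wise or element-wise basic algebraic operations combined with logical indexing;
2. loops have not been that bad since MATLAB R2008;
3. the worst-performing element-processing functions are arrayfun, cellfun and structfun with anonymous functions, because anonymous function evaluation is the slowest;
4. try not to compute the same thing twice, even if doing so would give you a better linearization.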
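
To make the logical-indexing tip concrete, here is a small sketch on made-up data. The values in `rain` are invented, and `state` is built as a logical mask here, whereas the question stores it as 0/1 numbers.

```matlab
rain  = [0 0.2 1.4 0 0 0.6 0];   % made-up rainfall depths per time step
state = rain > 0;                % logical mask: true where it rains

% Loop version: visit every row and test it individually.
total_loop = 0;
for k = 1:numel(rain)
    if state(k)
        total_loop = total_loop + rain(k);
    end
end

% Vectorized version: the logical mask selects all wet rows at once.
total_vec = sum(rain(state));
wet_rows  = nnz(state);          % number of rainy time steps
```

With a numeric 0/1 `state`, the mask would be written as `state == 1`.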

First block:

% Just calculate the entire cumulative sum over begin, then
% trim the result. Check if the cumsum doesn't overflow.
spell           = cumsum(begin);
spell(state==0) = 0;
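
The overflow remark matters if `begin` is stored in a small integer type such as int8, as the question suggests, because MATLAB integer types saturate at their maximum instead of growing. One way to sidestep this, sketched below on made-up data, is to accumulate in double; whether the extra memory is acceptable depends on your machine.

```matlab
% Made-up example data; in the question these would be the full columns.
state = int8([0 1 1 0 1 1 1 0 0 1]).';
begin = int8([0 1 0 0 1 0 0 0 0 1]).';

% Accumulate in double so the running storm count cannot saturate
% at intmax('int8').
spell = cumsum(double(begin));

% Rows without rain belong to no storm.
spell(state == 0) = 0;
```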

Second block:

% The same, not sure how could you speed this up; changed
% the name of variables to my taste, though.
spell_num    = unique(spell);
spell_length = histc(spell,spell_num);

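Third block:

% Fix the following issues: 
%   - the most-inner "for" does not make sense because it rewrites
%     several times the same thing;
%   - the same looping variable "i" is re-used in three nested loops,
%   - the name of the standard function "length" is obscured by declaring
%     a variable named "length".
% Note: spell_length(x) is the count that belongs to spell_num(x);
% the value 0 (no rain) is skipped so dry rows keep a length of 0.
storm_length = zeros(size(spell));
for x = 1:numel(spell_num)
        if spell_num(x) == 0, continue; end
        storm_selector = (spell==spell_num(x));
        storm_length(storm_selector) = spell_length(x);
end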
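
If the remaining loop is still too slow, it may be possible to drop it altogether. The following is a sketch of a different technique from the loop above, using `accumarray` to count the rows per storm and then indexing with the storm numbers. The data is made up, and `wet`, `idx` and `counts` are names introduced only for this example.

```matlab
% Made-up storm numbers per row (0 = no rain), as produced by the first block.
spell = [0 1 1 0 2 2 2 0 0 3].';

wet = spell > 0;                       % rows that belong to some storm
idx = double(spell(wet));              % storm number of each wet row
idx = idx(:);                          % accumarray expects a column of subscripts

counts = accumarray(idx, 1);           % counts(k) = number of rows in storm k

storm_length      = zeros(size(spell));
storm_length(wet) = counts(idx);       % copy each storm's length to its rows
```

This makes one pass over the data instead of one full comparison per storm, which is where the loop version spends its time.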

Answer 2

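The code I ended up using is a mix of the answers from @CST_Link and @Sifu. Thank you both very much for your help! I don't think Stack Overflow lets me accept two answers, so for clarity, here is the code that everyone helped me create!

The only slow part is the for loop in the third block, but it still runs in a few minutes, which is good enough for me and far better than my own attempt.

First block: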

%% Spell
%spell is cumulative sum of begin

spell = cumsum(begin);

%% Replace all rows of spell with no rain with 0
spell(state==0)=0;

Second block (no change other than better variable names):

%%  Spell number = all values of spell

spell_num = unique(spell);

%% Spell length = how many of each value of spell
spell_length = histc(spell,spell_num);

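Third block:

%% Generate blank column for spell length of size (state)
 spell_length2 = zeros(length(state),1);

%% Fill in the storm length for every row of each storm
% spell_num(1) is 0 (no rain), so storm x is counted in spell_length(x+1);
% loop over the storm numbers only, not over every row.
for x=1:numel(spell_num)-1
    position = find(spell==x);
    spell_length2(position) = spell_length(x+1);
end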
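
Since `spell_length` is already a table of counts, the loop can probably be replaced by a direct lookup. This is only a sketch on made-up data, and it assumes that the storm numbers are exactly 0, 1, ..., N with no gaps (that is, there is at least one dry row and every storm has at least one rainy row); the assertion in the code checks that assumption.

```matlab
spell = [0 1 1 0 2 2 2 0 0 3].';       % made-up storm numbers, 0 = no rain

spell_num    = unique(spell);          % [0; 1; 2; 3]
spell_length = histc(spell, spell_num);

% Assumption: storm numbers are exactly 0, 1, ..., N with no gaps.
assert(isequal(spell_num(:), (0:max(spell)).'));

% spell_length(k+1) holds the row count of storm k, so index it directly.
spell_length2 = spell_length(double(spell) + 1);
spell_length2(spell == 0) = 0;         % dry rows get length 0
```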

Answer 3

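If I am following what you are doing, then the first part is as follows.

I created some data matching your description for testing; please let me know if I have missed anything.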

state=[ 1 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 0];
begin=[ 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0]; 
spell = zeros(length(state),1,'int16');
%Start row for analysis
    x=1;

% Populate "spell" variable with a storm number in each row of rain, for the storm number it belongs to (storm number calculated by adding up the number of "begin" values up to that point

    for i=1:length(state)
         if(state(x)==1)
             spell(x) =  sum(begin(1:x));
         end
       x=x+1;
    end
% can be accomplished by simply using cumsum (no need for extra variables if you are short on memory)

    spell2=cumsum(begin);
    spell3=spell2.*(state==1);

The outputs of spell and spell3 are shown below:

[spell.'; spell3]

 0      0      0      0      0      1      1      1      1      1      0      2      0      0      2      0      3      3      3      3      0
 0      0      0      0      0      1      1      1      1      1      0      2      0      0      2      0      3      3      3      3      0

Answer 4

Why not do it like this?

% For loop to output the total length of the storm for each row of rain within that storm

for x=1:max(spell)  % loop over storm numbers only; spelllength(x+1) is out of range beyond that
    position = find(spell==x);
    length(position) = spelllength(x+1);
end

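I replaced the i iterator with x, which removed 2 lines and some computation.
Then I went on to remove the two nested loops, because they were useless (every pass would output the same thing).
That is already a good start.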

MATLAB: speeding up a loop applied to each of 820,000 elements
