Question

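I'm very new to MATLAB, but for my work I need to import an ENORMOUS data set and organize it in a certain way. I've written code that does this, but it is very inefficient (it's only my third major piece of code, and it takes several hours to run). MATLAB tells me I could preallocate my variables (about fifty times, in fact), but I'm having trouble seeing how to do that, because I can't tell in advance which matrix will receive data on each iteration of the for loop. The code itself probably explains this better than I can (this is only a small part of it, but hopefully it shows my problem):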

for x= 1:length(firstinSeq)
            for y= 1:length(littledataPassed-1)
                if firstinSeq(x,1)== littledataPassed(y,1) && firstinSeq(x,2)== littledataPassed(y,2) 
                        switch firstinSeq(x,3)
                            case 0
                                for z= 0:1000
                                    w= y+z;  
                                    if firstinSeq(x,4)== littledataPassed(w,4) 
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0 
                                            msgLength0= [msgLength0; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength0= [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]];  
                                        break
                                    end
                                end
                            case 1
                                for z= 0:1000
                                    w= y+z; 
                                    if firstinSeq(x,4)== littledataPassed(w,4) %if sequence not the same, terminate
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0
                                            msgLength1= [msgLength1; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength1= [msgLength1; firstinSeq(x,:) [0 0 1 0 0 0]]; 
                                        break        
                                    end
                                end
                            case 2
                                for z= 0:1000
                                    w= y+z;
                                    if firstinSeq(x,4)== littledataPassed(w,4)
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0
                                            msgLength2= [msgLength2; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength2= [msgLength2; firstinSeq(x,:) [0 0 2 0 0 0]];
                                        break
                                    end
                                end
                                for z= 0:1000
                                    w= y+z;
                                    if firstinSeq(x,4)== littledataPassed(w,4)
                                        if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 1
                                            msgLength2= [msgLength2; firstinSeq(x,:) littledataPassed(w,:)];
                                            break
                                        else continue
                                        end
                                    else msgLength2= [msgLength2; firstinSeq(x,:) [0 0 2 0 1 0]];  
                                        break
                                    end
                                end

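Any ideas on how to preallocate these variables (msgLength0, 1, 2, etc.)? They don't receive data for every value in the loop, and I don't know what their final size will be on any given run. My switch currently has 8 cases in total, which makes this program extremely slow.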

Answer 1

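If I read your code correctly, each trip through the innermost loops extends one of the variables msgLengthN? If so, that suggests you might preallocate a single array called msgLengthAll and fill it as you go, making sure each entry carries a value that distinguishes case 0, 1, 2, and so on.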
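A minimal sketch of that idea, using toy stand-ins for the real data. The column counts (4 and 6), the upper bound `maxRows`, and the pairing of row `x` with row `x` are assumptions made only for illustration; the real matching logic from the question would go inside the loop. Because column 3 of `firstinSeq` already holds the case number, no extra tag column is needed here.

```matlab
% Toy data (assumed shapes: firstinSeq has 4 columns, littledataPassed has 6).
firstinSeq       = [1 10 0 7; 2 20 1 8; 3 30 2 9];
littledataPassed = [1 10 0 7 0 1; 2 20 0 8 0 1; 3 30 0 9 0 1];

nCols   = size(firstinSeq, 2) + size(littledataPassed, 2);
maxRows = 2 * size(firstinSeq, 1);      % assumed upper bound on output rows
msgLengthAll = zeros(maxRows, nCols);   % allocate once, before the loop
nUsed = 0;                              % number of rows filled so far

for x = 1:size(firstinSeq, 1)
    nUsed = nUsed + 1;
    % Write into an existing row instead of growing the array.
    msgLengthAll(nUsed, :) = [firstinSeq(x, :), littledataPassed(x, :)];
end

msgLengthAll = msgLengthAll(1:nUsed, :);             % trim the unused rows
% Column 3 is the case number, so each msgLengthN is just a filtered view.
msgLength0 = msgLengthAll(msgLengthAll(:, 3) == 0, :);
```

Splitting by case only at the end keeps the loop body down to a single indexed assignment, whichever case applies.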

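If you don't know in advance how much space to allocate for msgLengthAll, you can either:

Scan the input file once to determine how large it, and the other arrays, need to be. There's no shame in that when dealing with large files, and it will probably save you a lot of time. OR
Indulge in some fancy allocation scheme, where you initially guess how much space msgLengthAll needs and then, when it fills up, allocate more memory. There are various ways to decide how much to allocate at each expansion: a fixed size, or perhaps the amount already allocated (i.e. doubling the allocation at each expansion). Of course, this can get quite complicated.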
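The second option might look roughly like the sketch below, which doubles the buffer whenever it fills up. The row width `nCols`, the starting `capacity`, and the placeholder `newRow` are assumptions; in the real program `newRow` would be something like `[firstinSeq(x,:) littledataPassed(w,:)]`.

```matlab
nCols    = 10;                    % assumed row width (4 + 6 columns)
capacity = 4;                     % initial guess, deliberately small here
msgLengthAll = zeros(capacity, nCols);
nUsed = 0;                        % number of rows filled so far

for k = 1:10                      % stand-in for the real matching loop
    newRow = k * ones(1, nCols);  % placeholder for a real result row
    if nUsed == capacity
        % Buffer is full: double it by appending a block of zeros.
        msgLengthAll = [msgLengthAll; zeros(capacity, nCols)];
        capacity = 2 * capacity;
    end
    nUsed = nUsed + 1;
    msgLengthAll(nUsed, :) = newRow;
end

msgLengthAll = msgLengthAll(1:nUsed, :);   % drop the unused tail
```

With doubling, the array is reallocated only about log2(N) times for N rows, rather than once per appended row.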

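Are you reading the file line by line and updating the variables in memory as you go? Or are you reading the whole file and then sorting it in memory? How big is ENORMOUS? How much RAM do you have?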

Answer 2

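You can vectorize the processing in each switch case by using find to locate the qualifying record within the 1000-element block, instead of stepping through it row by row, and then appending to msgLength0 in a single step. Here is a vectorized version of the case 0 code: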

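% Rows y .. y+1000 correspond to z = 0:1000 in the loop (clamped to the data size)
lastRow = min(y+1000, size(littledataPassed,1));
block = littledataPassed(y:lastRow,:);
indexStop = find(block(:,4) ~= firstinSeq(x,4), 1, 'first');   % first row with a different sequence
if isempty(indexStop)
   indexStop = size(block,1) + 1;   % sequence never changes within the block
end
% First row before the sequence change that meets all three criteria
indexProcess = find(block(1:indexStop-1,6) == 1 & ...
   block(1:indexStop-1,2) == firstinSeq(x,2) & ...
   block(1:indexStop-1,5) == 0, 1, 'first');
if ~isempty(indexProcess)
   msgLength0 = [msgLength0; firstinSeq(x,:) block(indexProcess,:)];
elseif indexStop <= size(block,1)
   msgLength0 = [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]];   % sequence changed before any match
end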

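Vectorizing the outer loops would also reduce execution time. I don't know enough about your data to suggest a specific approach, but using the reshape and/or repmat functions to create arrays that can be operated on in vectorized fashion may be the way to go.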
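As one possible illustration of the repmat idea, the sketch below finds every (x, y) pair that passes the question's outer `if` test without looping. The toy data and the helper names (`f1`, `l1`, `match`, `xIdx`, `yIdx`) are assumptions. Note that the comparison grid has one element per (x, y) pair, so for a truly enormous data set it would probably have to be built in chunks.

```matlab
% Toy data (assumed shapes: 4 and 6 columns; columns 1 and 2 are the keys).
firstinSeq       = [1 10 0 7; 2 20 1 8; 5 50 2 9];
littledataPassed = [2 20 0 8 0 1; 1 10 0 7 0 1; 1 10 0 7 1 1; 4 40 0 6 0 1];

nF = size(firstinSeq, 1);
nL = size(littledataPassed, 1);

% Expand both key columns to nF-by-nL grids so they can be compared at once.
f1 = repmat(firstinSeq(:, 1), 1, nL);
f2 = repmat(firstinSeq(:, 2), 1, nL);
l1 = repmat(littledataPassed(:, 1).', nF, 1);
l2 = repmat(littledataPassed(:, 2).', nF, 1);

% match(x, y) is true where columns 1 and 2 of the two rows agree.
match = (f1 == l1) & (f2 == l2);
[xIdx, yIdx] = find(match);   % all matching (x, y) pairs
```

The per-case processing would then only need to run over the pairs in `xIdx` and `yIdx` rather than over every combination of rows.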

MATLAB preallocation in a complicated loop
