Logical indexing on a large matrix takes too long

Asked: 2016-04-19 16:33:22

Tags: matlab indexing

I have a table with 4 million rows and 15 columns. I apply a mask to it like this:

curData = data(curCodeMask & curEstDateMask,:);

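The mask typically pulls out about 5 records at a time. I loop over 686,000 unique identifiers in the data and apply a series of functions to each.

I just ran the profiler and found that this line accounts for 71.6% of the time spent in the function. That seems odd. Getting through 1,000 identifiers takes 3 minutes, which means I can't get through all 686,000 in a reasonable amount of time.

Any suggestions for speeding this up, or an explanation of why applying this mask is so expensive?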

Edit

test = repmat(cell2table(repmat({'AAA'},1,15)),4000000,1);
test([5500,60000,292404,290014,205802],1) = {'BBB'};
mask = strcmp(test.Var1,{'BBB'});
tic;test(mask,:);toc;
tic;test(find(mask),:);toc;

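The first line takes 0.078991 seconds and the second takes 0.004005 seconds. I am now changing my code to use find. Can anyone explain why logical indexing takes so long?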

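1 Answer:

Answer 0: (score: 0)

It seems that referencing the matrix with find is faster because, when a logical vector is passed to subsref (the function MATLAB calls when you reference part of an array with A(b)), it has to check every single value to see whether it is 0 or 1: if it is 1 it fetches the value, and if it is 0 it does nothing. The number of checks performed is always the same: 4e6 in your example. When you use find, on the other hand, you pass only a small number of indices to subsref (5 in your example), and the number of operations needed to complete the task is much smaller. This can be illustrated with the following code, based on your example: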

N = 4000000; 
test_ = repmat(cell2table(repmat({'AAA'},1,15)),N,1);

for i = 1000:50000:1000000
    test = test_;
    test(randi(N, i,1),1) = {'BBB'};
    mask = strcmp(test.Var1,{'BBB'});
    tic;test(mask,:);toc;
    tic;test(find(mask),:);toc;
    fprintf('\n');
end

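Here we gradually increase the number of 1s in the mask and measure the time taken by logical indexing and by explicit indexing with find. If you run it, you will notice that the time for the second operation grows considerably with each iteration, while the first stays within the same order of magnitude (although it too increases with the number of 1s in the mask array):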

Elapsed time is 0.139837 seconds.
Elapsed time is 0.008691 seconds.

Elapsed time is 0.157868 seconds.
Elapsed time is 0.052939 seconds.

Elapsed time is 0.202240 seconds.
Elapsed time is 0.072856 seconds.

Elapsed time is 0.251927 seconds.
Elapsed time is 0.107802 seconds.

Elapsed time is 0.273101 seconds.
Elapsed time is 0.115091 seconds.

Elapsed time is 0.304513 seconds.
Elapsed time is 0.140899 seconds.

Elapsed time is 0.320670 seconds.
Elapsed time is 0.145321 seconds.

Elapsed time is 0.363253 seconds.
Elapsed time is 0.147760 seconds.

Elapsed time is 0.351677 seconds.
Elapsed time is 0.183472 seconds.

Elapsed time is 0.365312 seconds.
Elapsed time is 0.174972 seconds.

Elapsed time is 0.410181 seconds.
Elapsed time is 0.182656 seconds.

Elapsed time is 0.410353 seconds.
Elapsed time is 0.220691 seconds.

Elapsed time is 0.424916 seconds.
Elapsed time is 0.194585 seconds.

Elapsed time is 0.451175 seconds.
Elapsed time is 0.212605 seconds.

Elapsed time is 0.471104 seconds.
Elapsed time is 0.218952 seconds.

Elapsed time is 0.495794 seconds.
Elapsed time is 0.233784 seconds.

Elapsed time is 0.513798 seconds.
Elapsed time is 0.257506 seconds.

Elapsed time is 0.523019 seconds.
Elapsed time is 0.262233 seconds.

Elapsed time is 0.540470 seconds.
Elapsed time is 0.281143 seconds.

Elapsed time is 0.543509 seconds.
Elapsed time is 0.283295 seconds.
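As a quick sanity check on the two indexing forms timed above, the following minimal sketch confirms that they select the same rows and shows how much shorter the index vector is than the mask. The table T, its variable names Code and Value, and the size N are made up for illustration and are not taken from the original data.

```matlab
% Small table so the example runs quickly; the sizes are illustrative only.
N = 1000;
T = table(repmat({'AAA'}, N, 1), (1:N)', 'VariableNames', {'Code', 'Value'});
T.Code([5, 60, 292]) = {'BBB'};

mask = strcmp(T.Code, 'BBB');   % logical vector with N elements
idx  = find(mask);              % numeric vector holding only the matching row numbers

rowsByMask  = T(mask, :);       % logical indexing
rowsByIndex = T(idx, :);        % explicit numeric indexing

fprintf('mask has %d elements, idx has %d\n', numel(mask), numel(idx));
assert(isequal(rowsByMask, rowsByIndex));   % both forms return the same rows
```

Because the two forms are interchangeable in their result, switching between them only changes the cost of the lookup, not what the loop computes.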

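To sum up, use test(find(mask),:) when you know that the number of 1s in mask, sum(mask), is small compared with the total size of the array, length(mask). How small, you ask? You will have to experiment on your own machine for your particular case. However, judging from the toy example above, it seems that for sufficiently large arrays find outperforms direct logical indexing (at least for the table class).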
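One way to run that experiment is sketched below. It uses timeit instead of tic/toc so that each measurement is repeated and less noisy. The table T, the size N, and the list of densities are assumptions chosen for illustration; adjust them to resemble your own data.

```matlab
% Hypothetical sizes; adjust N and the densities to resemble your data.
N = 1e6;
T = table(repmat({'AAA'}, N, 1), rand(N, 1), 'VariableNames', {'Code', 'Value'});

densities = [1e-5, 1e-3, 1e-1];   % fraction of rows selected by the mask
for d = densities
    % Build a mask with round(d * N) randomly placed true entries.
    mask = false(N, 1);
    mask(randperm(N, round(d * N))) = true;

    tLogical = timeit(@() T(mask, :));         % logical indexing
    tFind    = timeit(@() T(find(mask), :));   % explicit indexing via find

    fprintf('density %.0e: logical %.6f s, find %.6f s\n', d, tLogical, tFind);
end
```

The density at which the two timings meet, if they meet at all, is likely to depend on the MATLAB release, the machine, and the types of the table variables, so treat the numbers as specific to your setup.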