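While trying to decide which indexing method to recommend, I tried to measure their performance. However, the measurements confused me. I ran them several times in different orders, and the results stayed consistent. Here is how I measured performance: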
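for N = [10000 15000 100000 150000]
    x = round(rand(N,1)*5)-2;   % random integers in -2..3, stored as double
    % Evaluate each expression once before timing
    idx1 = x~=0;
    idx2 = abs(x)>0;
    % Method 1: direct comparison with zero
    tic
    for t = 1:5000
        idx1 = x~=0;
    end
    toc
    % Method 2: absolute value greater than zero
    tic
    for t = 1:5000
        idx2 = abs(x)>0;
    end
    toc
end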
These are the results:
Elapsed time is 0.203504 seconds.
Elapsed time is 0.230439 seconds.
Elapsed time is 0.319840 seconds.
Elapsed time is 0.352562 seconds.
Elapsed time is 2.118108 seconds. % This is the strange part
Elapsed time is 0.434818 seconds.
Elapsed time is 0.508882 seconds.
Elapsed time is 0.550144 seconds.
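I checked values around 100000 and the same thing happens there; the strange measurement even shows up at 50000.
So my question is: has anyone else seen this for a certain range of sizes, and what causes it? (Is this a bug?)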
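Before comparing speed, it may be worth confirming that the two expressions select the same elements. The following is a minimal sketch, assuming the same kind of test data as in the loop above; the variable names sameForData, nanNe and nanAbs are introduced here purely for illustration.

```matlab
% Sketch: check that both index expressions agree on the test data.
x = round(rand(10000,1)*5) - 2;      % random integers in -2..3
idx1 = x ~= 0;
idx2 = abs(x) > 0;
sameForData = isequal(idx1, idx2);   % the two masks match for finite values

% The expressions differ for NaN, because every ordering comparison
% with NaN is false while ~= with NaN is true.
nanNe  = NaN ~= 0;                   % true
nanAbs = abs(NaN) > 0;               % false

fprintf('masks equal on test data: %d\n', sameForData);
fprintf('NaN ~= 0: %d, abs(NaN) > 0: %d\n', nanNe, nanAbs);
```

For data that cannot contain NaN, as here, either expression gives the same mask, so the choice comes down to readability and speed.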
Answer 0 (score: 7)
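I think this has to do with the JIT (the results below were obtained with 2011b). Depending on the system, the Matlab version, the size of the variables, and exactly what is inside the loop, using the JIT is not always faster. This is related to the "warm-up" effect: if you run an m-file several times in one session, it sometimes gets faster after the first run, because the accelerator only has to compile parts of the code once.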
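One way to look for such a warm-up effect is to repeat the same timed loop several times in one session and compare the first trial with the later ones. This is only a sketch using tic and toc; the number of trials (nTrials) and the variable names are assumptions made for illustration, not part of the original measurements.

```matlab
% Sketch: repeat the timed loop and compare the first trial with the rest.
N       = 100000;                 % vector size taken from the question
nIter   = 5000;                   % inner iterations, as in the question
nTrials = 5;                      % hypothetical number of repeated trials
x = round(rand(N,1)*5) - 2;

tTrial = zeros(1, nTrials);
for k = 1:nTrials
    tStart = tic;
    for t = 1:nIter
        idx1 = x ~= 0;
    end
    tTrial(k) = toc(tStart);      % elapsed time of trial k in seconds
end

fprintf('first trial : %.4f s\n', tTrial(1));
fprintf('later median: %.4f s\n', median(tTrial(2:end)));
```

If the first trial is clearly slower than the later ones, part of the measured time is one-off overhead rather than steady-state cost.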
JIT on (feature accel on)
Elapsed time is 0.176765 seconds.
Elapsed time is 0.185301 seconds.
Elapsed time is 0.252631 seconds.
Elapsed time is 0.284415 seconds.
Elapsed time is 1.782446 seconds.
Elapsed time is 0.693508 seconds.
Elapsed time is 0.855005 seconds.
Elapsed time is 1.004955 seconds.
JIT off (feature accel off)
Elapsed time is 0.143924 seconds.
Elapsed time is 0.184360 seconds.
Elapsed time is 0.206405 seconds.
Elapsed time is 0.306424 seconds.
Elapsed time is 1.416654 seconds.
Elapsed time is 2.718846 seconds.
Elapsed time is 2.110420 seconds.
Elapsed time is 4.027782 seconds.
ETA: it is interesting to see what happens if integers are used instead of doubles:
JIT on, same code but with x converted using int8
Elapsed time is 0.202201 seconds.
Elapsed time is 0.192103 seconds.
Elapsed time is 0.294974 seconds.
Elapsed time is 0.296191 seconds.
Elapsed time is 2.001245 seconds.
Elapsed time is 2.038713 seconds.
Elapsed time is 0.870500 seconds.
Elapsed time is 0.898301 seconds.
JIT off, using int8
Elapsed time is 0.198611 seconds.
Elapsed time is 0.187589 seconds.
Elapsed time is 0.282775 seconds.
Elapsed time is 0.282938 seconds.
Elapsed time is 1.837561 seconds.
Elapsed time is 1.846766 seconds.
Elapsed time is 2.746034 seconds.
Elapsed time is 2.760067 seconds.
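The exact int8 variant is not shown above. A plausible reconstruction, assuming x was simply wrapped in int8 after being generated, looks like this for a single vector size; the names xDouble and xInt8 are illustrative.

```matlab
% Sketch: the same two timing loops on an int8 copy of x (assumed conversion).
N = 100000;
xDouble = round(rand(N,1)*5) - 2;   % values in -2..3, stored as double
xInt8   = int8(xDouble);            % exact, since the values are small integers

tic
for t = 1:5000
    idx1 = xInt8 ~= 0;
end
toc
tic
for t = 1:5000
    idx2 = abs(xInt8) > 0;
end
toc
```

An int8 vector takes one byte per element instead of eight, so the amount of memory read in each pass differs from the double case.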
Answer 1 (score: 6)
This may be due to some automatic optimization that Matlab applies in its Basic Linear Algebra Subroutines (BLAS).
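Just like yours, my configuration (OSX 10.8.4, R2012a with default settings) takes longer to compute idx1 = x~=0 for x (10e5 elements) than for x (11e5 elements). See the left panel of the figure, where the processing time (y axis) is measured for different vector sizes (x axis). You will see a lower processing time for N > 103000. In this panel I also show the number of cores that were active during the calculation. You will see that there is no drop for the single-core configuration. This means that Matlab does not optimize the execution of ~= when only one core is active (no parallelization is possible). Matlab enables some optimization routines when two conditions are met: multiple cores and a vector of sufficient size.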
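The right panel shows the results when feature('accel','on'/'off') is set to off (doc). Here only one core is active (the 1-core and 4-core curves are identical), so no optimization is possible.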
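Finally, the function I used to activate and deactivate cores is maxNumCompThreads. According to Loren Shure, maxNumCompThreads controls both the JIT and BLAS. Since feature('JIT','on'/'off') played no role in the performance, BLAS is the last remaining option.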
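To try the thread-count effect on a single vector size, something along these lines should work. It is a sketch only: it relies on maxNumCompThreads returning the previous setting when given a new value, and the names nDefault, nPrev, tSingle and tMulti are introduced here for illustration.

```matlab
% Sketch: time the same loop with one thread and with the previous setting.
N = 100000;
x = round(rand(N,1)*5) - 2;

nDefault = maxNumCompThreads;     % current maximum number of threads
nPrev = maxNumCompThreads(1);     % limit to one thread; returns the old value

tStart = tic;
for t = 1:5000
    idx1 = x ~= 0;
end
tSingle = toc(tStart);

maxNumCompThreads(nPrev);         % restore the previous setting

tStart = tic;
for t = 1:5000
    idx1 = x ~= 0;
end
tMulti = toc(tStart);

fprintf('1 thread : %.4f s\n', tSingle);
fprintf('%d thread(s): %.4f s\n', nPrev, tMulti);
```

Restoring the previous value at the end keeps the experiment from changing the behaviour of the rest of the session.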
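I will leave the last word to Loren: "The main message here is that you generally do not need to use this function [maxNumCompThreads]! Why? Because we want MATLAB to do the best job possible for you."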
% Time idx1 = x~=0 for several vector sizes and thread counts,
% with the accelerator on (left panel) and off (right panel).
accel = {'on';'off'};
figure('Color','w');
N = 100000:1000:105000;
for ind_accel = 2:-1:1
    feature('accel', accel{ind_accel});   % undocumented accelerator switch
    tElapsed = zeros(4,length(N));
    for ind_core = 1:4
        maxNumCompThreads(ind_core);      % request ind_core computational threads
        n_core = maxNumCompThreads;       % read back the value actually in effect
        for ii = 1:length(N)
            fprintf('core asked: %d(true:%d) - N:%d\n',ind_core,n_core,N(ii));
            x = round(rand(N(ii),1)*5)-2;
            idx1 = x~=0;                  % evaluate once before timing
            tStart = tic;
            for t = 1:5000
                idx1 = x~=0;
            end
            tElapsed(ind_core,ii) = toc(tStart);
        end
    end
    h2 = subplot(1,2,ind_accel);
    plot(N, tElapsed,'-o','MarkerSize',10);   % one line per thread count
    legend({'1','2','3','4'});                % number of threads requested
    xlabel('Vector size','FontSize',14);
    ylabel('Processing time','FontSize',14);
    set(gca,'FontSize',14,'YLim',[0.2 0.7]);
    title(['accel ' accel{ind_accel}]);
end