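I keep hearing that vectorized code runs faster than loops in MATLAB. However, when I tried to vectorize my MATLAB code, it seemed to run slower.
I used tic and toc to measure the time. I changed only the implementation of a single function in my program. My vectorized version ran in 47.228801 seconds and my for-loop version ran in 16.962089 seconds.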
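Also, in my main program I used a large value for N, N = 1000000, and DataSet has size 1 x 301. I ran each version several times with different data sets of the same size and the same N.
Why is the vectorized version so much slower, and how can I improve the speed further?
The "vectorized" version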
function [RNGSet] = RNGAnal(N,DataSet)
%Creates a random number generated set of numbers to check accuracy overall
% This function will produce random numbers and normalize a new Data set
% that is derived from an old data set by multiply random numbers and
% then dividing by N/2
randData = randint(N,length(DataSet));
tempData = repmat(DataSet,N,1);
RNGSet = randData .* tempData;
RNGSet = sum(RNGSet,1) / (N/2); % sum and normalize by the N
end
The "for-loop" version
function [RNGData] = RNGAnsys(N,Data)
%RNGAnsys This function produces statistical RNG data using a for loop
% This function will produce RNGData that will be used to plot on another
% plot that possesses the actual data
multData = zeros(N,length(Data));
for i = 1:length(Data)
photAbs = randint(N,1); % Create N number of random 0's or 1's
multData(:,i) = Data(i) * photAbs; % multiply each element in the molar data by the random numbers
end
sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
end
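Answer 0 (score: 4)
A first look at the for-loop code shows that photAbs is a binary array, each column of which is scaled by the corresponding element of Data, and this binary property can be used for vectorization. The code below exploits it: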
function RNGData = RNGAnsys_vect1(N,Data)
%// Get the 2D Matrix of random ones and zeros
photAbsAll = randint(N,numel(Data));
%// Take care of multData internally by summing along the columns of the
%// binary 2D matrix and then multiply each element of it with each scalar
%// taken from Data by performing elementwise multiplication
sumData = Data.*sum(photAbsAll,1);
%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))'; %//'
return;
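The rewrite above relies on the fact that scaling a binary column and then summing it gives the same result as summing the column first and scaling once. A minimal sketch of that check follows; the sizes, the seed, and the variable names bits, loopResult and vectResult are illustrative assumptions, and rand(...) < 0.5 stands in for randint so the snippet needs no toolbox.

```matlab
% Sketch: compare "scale each binary column, then sum" against
% "sum each binary column, then scale" on the same random bits.
rng(0);                                      % fixed seed: both paths see identical bits
N = 1000;                                    % small illustrative N (not the question's 1e6)
Data = rand(1, 7);                           % hypothetical 1-by-7 row of data
bits = double(rand(N, numel(Data)) < 0.5);   % N-by-7 matrix of 0s and 1s

% Loop-style result: scale column i by Data(i), then sum down the columns
multData = zeros(N, numel(Data));
for i = 1:numel(Data)
    multData(:, i) = Data(i) * bits(:, i);
end
loopResult = (sum(multData, 1) / (N/2))';

% Vectorized result: sum the bits first, then scale once per column
vectResult = (Data .* sum(bits, 1) ./ (N/2))';

% The two results should agree up to floating-point rounding
maxDiff = max(abs(loopResult - vectResult));
fprintf('max abs difference: %g\n', maxDiff);
```

Note that Data is assumed to be a row vector here, as in the question; a column vector would need a transpose before the elementwise product.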
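Profiling suggests that the bottleneck is the creation of the random binary array. Using the faster random binary array creator suggested in this smart solution, the function above can be optimized further: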
function RNGData = RNGAnsys_vect2(N,Data)
%// Create a random binary array and sum along the columns on the fly to
%// save on any variable space that would be required otherwise.
%// Also perform the elementwise multiplication as discussed before.
sumData = Data.*sum(rand(N,numel(Data))<0.5,1);
%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))'; %//'
return;
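Since only the column sums of the binary matrix are needed, one possible way to reach the question's N = 1000000 without holding an N-by-301 matrix in memory is to accumulate the counts in chunks. This is a sketch under stated assumptions, not part of the answer: the chunk size and the names chunk, counts and done are hypothetical, and Data is a random placeholder for the real data set.

```matlab
% Sketch: accumulate per-column counts of ones chunk by chunk, so the
% full N-by-M binary matrix is never stored at once.
N = 1000000;                     % N from the question
Data = rand(1, 301);             % placeholder for the 1-by-301 data set
chunk = 10000;                   % hypothetical chunk size; tune to available memory

counts = zeros(1, numel(Data));  % running number of ones per column
done = 0;                        % rows processed so far
while done < N
    rows = min(chunk, N - done); % the last chunk may be shorter
    counts = counts + sum(rand(rows, numel(Data)) < 0.5, 1);
    done = done + rows;
end

% Same scaling and normalization as in the vectorized function
RNGData = (Data .* counts ./ (N/2))';
```

Peak memory is then governed by chunk rather than by N; whether this is faster than the single-shot version would have to be measured.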
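With the smart binary random array creator, the original code can be optimized as well, which allows a fair benchmark later between the optimized for-loop code and the vectorized code. The optimized for-loop code is listed here: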
function RNGData = RNGAnsys_opt1(N,Data)
multData = zeros(N,numel(Data));
for i = 1:numel(Data)
%// Create N number of random 0's or 1's using a smart approach
%// Then, multiply each element in the molar data by the random numbers
multData(:,i) = Data(i) * (rand(N,1)<.5); %// parentheses needed: * binds tighter than <
end
sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
return;
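A note on the loop body: in MATLAB, multiplication is evaluated before relational operators, so Data(i) * rand(N,1) < .5 is parsed as (Data(i) * rand(N,1)) < .5 and yields a logical column rather than a scaled binary one. The small check below uses fixed stand-in values (r for rand(N,1), d for Data(i)) that are assumptions chosen for reproducibility.

```matlab
% Sketch: operator precedence of * versus < in the loop body.
r = [0.2; 0.9];                  % stand-in for rand(N,1), fixed for reproducibility
d = 4;                           % stand-in for Data(i)

withoutParens = d * r < .5;      % parsed as (d * r) < .5, a logical column
withParens    = d * (r < .5);    % binary column scaled by d, a double column

disp(class(withoutParens));
disp(class(withParens));
```

With these values, withoutParens is [false; false] and withParens is [4; 0], so only the parenthesized form gives the scaled 0/1 column the function is meant to sum.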
Benchmarking code
N = 15000; %// Kept at this value as it runs out of memory with higher N's.
%// The size of the dataset is more important anyway, as that decides how
%// well the vectorized code does against the for-loop code
DS_arr = [50 100 200 500 800 1500 5000]; %// Dataset sizes
timeall = zeros(2,numel(DS_arr));
for k1 = 1:numel(DS_arr)
DS = DS_arr(k1);
Data = rand(1,DS);
f = @() RNGAnsys_opt1(N,Data);%// Optimized for-loop code
timeall(1,k1) = timeit(f);
clear f
f = @() RNGAnsys_vect2(N,Data);%// Vectorized Code
timeall(2,k1) = timeit(f);
clear f
end
%// Display benchmark results
figure,hold on, grid on
plot(DS_arr,timeall(1,:),'-ro')
plot(DS_arr,timeall(2,:),'-kx')
legend('Optimized for-loop code','Vectorized code')
xlabel('Dataset size ->'),ylabel('Time(sec) ->')
avg_speedup = mean(timeall(1,:)./timeall(2,:))
title(['Average Speedup with vectorized code = ' num2str(avg_speedup) 'x'])
Results
Closing remarks
In my experience with MATLAB so far, neither loops nor vectorization suit every situation; it all depends on the specific case.
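Answer 1 (score: 3)
Try using the MATLAB profiler to find out which line or lines of code take the most time. That way you can see whether the repmat function is what slows you down. Let us know what you find, I'm interested!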
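As a rough illustration of the profiler suggestion, the sketch below profiles a small stand-in for the body of the vectorized function. The reduced N, the random placeholder DataSet, and the use of rand(...) < 0.5 in place of randint are assumptions made so the snippet runs quickly and without a toolbox.

```matlab
% Sketch: profile a small stand-in for the vectorized function body.
N = 2000;                                            % deliberately small so the run is quick
DataSet = rand(1, 301);                              % placeholder data

profile on
randData = double(rand(N, numel(DataSet)) < 0.5);    % stand-in for randint
tempData = repmat(DataSet, N, 1);                    % the suspected slow step
RNGSet = sum(randData .* tempData, 1) / (N/2);
profile off

stats = profile('info');                             % struct holding the collected timings
```

Calling profile viewer afterwards opens the interactive report, where the time spent in repmat can be compared with the other lines. In practice one would wrap the real function call between profile on and profile off instead.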
Answer 2 (score: 0)
randData = randint(N,length(DataSet));
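allocates a 1.2 GB array (4 * 301 * 1000000 bytes). Implicitly, you create up to 4 of these monsters in your program, which leads to continual cache misses.
Your for-loop code can run almost entirely in the processor cache (or does so on larger Xeons).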
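The arithmetic behind that estimate can be written out as follows. The 4-bytes-per-element figure is the answer's; the 8-byte line is an added assumption covering the case where the array is stored as double, MATLAB's default numeric class, which would double the footprint.

```matlab
% Sketch: memory footprint of one N-by-M numeric array.
N = 1000000;                 % N from the question
M = 301;                     % length of DataSet from the question

bytes4 = 4 * N * M;          % 4 bytes per element, as assumed in the answer
bytes8 = 8 * N * M;          % 8 bytes per element, if the array is double (assumption)

gb4 = bytes4 / 1e9;          % 1.204 GB
gb8 = bytes8 / 1e9;          % 2.408 GB
fprintf('4-byte elements: %.3f GB, 8-byte elements: %.3f GB\n', gb4, gb8);
```

Either way, several temporaries of this size (randData, tempData, and their product) far exceed any processor cache.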