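Faster ways to run Matlab code involving large loops

Date: 2018-04-10 16:51:39

Tags: matlab

I have this piece of Matlab code and I would like to make it more efficient if possible. In particular, I want to speed up the two bits (called BIT 1 and BIT 2) inside the loop over n, which can take a long time for large n_m and n_w.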

clear
N=[3 4; 100 200; 300 400; 2000 2000; 100000 100000];
output1=cell(size(N,1),1);
output2=cell(size(N,1),1);
for n=1:size(N,1)
    n_m=N(n,1);
    n_w=N(n,2);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
    %BIT 1
    temp=zeros(n_w+1, n_m);
    for i=1:n_m
        temp(:,i)=(i:n_m:n_m*n_w+i).'; 
    end
    output1{n}=temp(:).'; %1x(n_m*(n_w+1))
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %BIT 2
    temp=zeros(n_m+1,n_w);
    for j=1:n_w
        temp(:,j)=[(j-1)*n_m+1:j*n_m n_m*n_w+n_m+j].';
    end
    output2{n}=temp(:).'; %1x(n_w*(n_m+1))
end

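Do you have any suggestions for something faster?

A brief explanation of BIT 1: for given n_m and n_w, BIT 1 creates a row vector of dimension 1x(n_m*(n_w+1)) that can be split into n_m sub-rows, each of dimension 1x(n_w+1). Sub-row i contains the integers (i:n_m:n_m*n_w+i).

A brief explanation of BIT 2: for given n_m and n_w, BIT 2 creates a row vector of dimension 1x(n_w*(n_m+1)) that can be split into n_w sub-rows, each of dimension 1x(n_m+1). Sub-row j contains the integers [(j-1)*n_m+1:j*n_m, n_m*n_w+n_m+j].

Here I compare the loop version with a reshape alternative: reshape does not help.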
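To make the two layouts concrete, here is a small worked example using the loops from the question. The sizes n_m=3 and n_w=4 are chosen only for illustration; they are not one of the rows of N above.

```matlab
% Worked example with illustrative sizes (assumed, not from the question's N)
n_m = 3;
n_w = 4;

% BIT 1: n_m sub-rows, each of length n_w+1
temp = zeros(n_w+1, n_m);
for i = 1:n_m
    temp(:,i) = (i:n_m:n_m*n_w+i).';
end
output1 = temp(:).';
% output1 = [1 4 7 10 13   2 5 8 11 14   3 6 9 12 15]

% BIT 2: n_w sub-rows, each of length n_m+1
temp = zeros(n_m+1, n_w);
for j = 1:n_w
    temp(:,j) = [(j-1)*n_m+1:j*n_m, n_m*n_w+n_m+j].';
end
output2 = temp(:).';
% output2 = [1 2 3 16   4 5 6 17   7 8 9 18   10 11 12 19]
```

Each sub-row is one column of temp, so the final temp(:).' simply reads the columns one after another.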

clear
N=[3 5; 100 200; 300 400; 2000 2000; 5000 5000; 10000 10000; 20000 20000];
output1=cell(size(N,1),1);
output2=cell(size(N,1),1);
output3=cell(size(N,1),1); %alternative of output1 with reshape
output4=cell(size(N,1),1); %alternative of output2 with reshape
time=zeros(size(N,1),4);
for n=1:size(N,1)
    n_m=N(n,1);
    n_w=N(n,2);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
    %BIT 1 with loop  
    tic
    temp=zeros(n_w+1, n_m);
    for i=1:n_m
        temp(:,i)=(i:n_m:n_m*n_w+i).'; 
    end
    output1{n}=temp(:).'; 
    time(n,1)=toc;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    
    %BIT 1 with reshape
    tic
    tempor=reshape(1:1:n_m*(n_w+1), n_m, n_w+1);
    temp1=tempor.'; 
    output3{n}=temp1(:).';
    time(n,3)=toc;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %BIT 2 with loop
    tic
    temp=zeros(n_m+1,n_w);
    for j=1:n_w
        temp(:,j)=[(j-1)*n_m+1:j*n_m n_m*n_w+n_m+j].';
    end
    output2{n}=temp(:).'; 
    time(n,2)=toc;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
    %BIT 2 with reshape   
    tic
    temp1=tempor(:,1:end-1);
    temp2=n_m*n_w+n_m+1:n_m*n_w+n_m+n_w;
    temp=[temp1; temp2];
    output4{n}=temp(:).';
    time(n,4)=toc;
end

I get

time=
0.0003    0.0006    0.0001    0.0001
0.0005    0.0005    0.0003    0.0002
0.0021    0.0011    0.0029    0.0006
0.0159    0.0189    0.0230    0.0189
0.0915    0.1068    0.1503    0.1260
0.3015    0.3757    0.6035    0.5211
1.1501    1.3801    2.4459    2.0828

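(For the larger sizes the third and fourth columns, i.e. the reshape versions, are slower. I tried going beyond 20000, but reshape runs forever.)

2 answers:

Answer 0: (score: 2)

For smaller matrices, I found repmat to be faster than the for loop.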

function testf(k, N)


n_m=N(1);
n_w=N(2);

switch k
    case 1
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
        %BIT 1
        tempA = ones(n_w+1,1) + (0:n_w).'*n_m;
        tempB = repmat( 0:(n_m-1), n_w+1, 1);
        tempC = tempB(:) + repmat(tempA, n_m, 1);
        output1=tempC(:).'; %1x(n_m*(n_w+1))
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        %BIT 2
        tempC = zeros(n_m+1,n_w);
        tempA = repmat((1:n_m).', 1,n_w);
        tempB = repmat( 0:(n_w-1), n_m, 1)*(n_m);
        tempC(1:end-1, :) = tempA + tempB;
        tempC(end, :) = (1:n_w) + (n_w+1)*n_m;
        output2=tempC(:).'; %1x(n_w*(n_m+1))
    case 2
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
        %BIT 1
        temp=zeros(n_w+1, n_m);
        for i=1:n_m
            temp(:,i)=(i:n_m:n_m*n_w+i).'; 
        end
        output1=temp(:).'; %1x(n_m*(n_w+1))
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        %BIT 2
        temp=zeros(n_m+1,n_w);
        for j=1:n_w
            temp(:,j)=[(j-1)*n_m+1:j*n_m n_m*n_w+n_m+j].';
        end
        output2=temp(:).'; %1x(n_w*(n_m+1))
end

end

Test code

figure
N = [100,150; 150,180; 200,250; 250,300; 300,350; 400,500; 450, 550];
T = zeros(size(N,1),size(N,2),10);
for mm = 1:10
    for nn = 1:size(N,1)
        T(nn,:,mm) = [timeit(@() testf(1, N(nn,:))), timeit(@() testf(2, N(nn,:)))];
    end
end
T = mean(T,3);
plot(T)

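[Figure: time plot]

Run on Matlab R2015b.

Edit: I noticed that even timeit does not measure the run time precisely, so I added a for loop that runs timeit multiple times and averages the results.

Edit: in reply to the comments.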

  

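Interesting! Is ones(n_w+1,1) + (0:n_w).'*n_m really faster than   (1:n_m:n_m*n_w+1).'? Also, repmat( 0:(n_w-1), n_m, 1)*(n_m) should be   slower than repmat( 0:(n_w-1)*n_m, n_m, 1) because many more   multiplications are done. - Cris Luengo

To the first question: yes. I commented out everything after the first tempA in my method, and everything after the first for loop in the OP's code. The results are as follows.

[Figure: time plot comparing a single line of code]

But this is somewhat unfair, because the for loop has only one line inside it while my method has three lines. In any case, my original motivation was to save time by not generating the vectors inside the for loop: I can generate a whole batch of vectors at once.

As for the multiplication, I compared the two strategies. Surprisingly, for small matrices such as 250x300 there is hardly any difference between them. For larger matrices, the time saved on multiplications is far smaller than the cost of storing the results, so the time plot does not really change.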
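A minimal sketch of the two multiplication strategies being compared, for readers who want to reproduce the check. The variable names tempB_a and tempB_b are made up for this illustration. Note that the scale-first variant needs parentheses around the range: the colon operator has lower precedence than *, so 0:(n_w-1)*n_m would instead produce the unit-step range from 0 to (n_w-1)*n_m.

```matlab
% Illustrative sizes taken from the 250x300 case mentioned above
n_m = 250;
n_w = 300;

% Strategy A: replicate first, then scale (n_m*n_w multiplications)
tempB_a = repmat(0:(n_w-1), n_m, 1) * n_m;

% Strategy B: scale the row first, then replicate (n_w multiplications)
tempB_b = repmat((0:(n_w-1)) * n_m, n_m, 1);

% Both strategies give the same n_m-by-n_w matrix
same = isequal(tempB_a, tempB_b);
```

Either line can be wrapped in timeit, as in the test code above, to compare them on a given machine.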

  

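I care a lot about large N (above 500). Is your answer suggesting that   nothing does better than the loop? - user3285148

This is the challenging part. If you really care about the speed of a piece of Matlab code, this is what I can think of. The repmat idea is only faster than the for loop when the block is small enough. So you could cut the large matrix into smaller blocks and apply the repmat style to each small block. Obviously you need a for loop to stitch all the pieces together, but my bet is that this would still be faster. You also have to think about how to trim the large matrix to its actual size efficiently: for example, you have a 1234x5678 matrix and your automatic code generates 100x100 blocks.

One example approach could look like this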
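The block idea could be sketched roughly as follows for BIT 1. This is only an illustration under stated assumptions, not the answerer's code: the block width blk is a hypothetical tuning parameter, the sizes are examples, and bsxfun is swapped in for repmat to do the per-block expansion. The min call handles the last, possibly narrower, block, so no trimming is needed afterwards.

```matlab
% Block-wise construction of BIT 1 (sketch; blk and the sizes are assumptions)
n_m = 1234;
n_w = 567;
blk = 100;                         % hypothetical block width, to be tuned

base = (1:n_m:n_m*n_w+1).';        % first column of BIT 1
temp = zeros(n_w+1, n_m);
for c1 = 1:blk:n_m
    c2 = min(c1+blk-1, n_m);       % last block may be narrower than blk
    % column i of BIT 1 equals base + (i-1); fill columns c1..c2 at once
    temp(:, c1:c2) = bsxfun(@plus, base, (c1-1):(c2-1));
end
output1 = temp(:).';               % 1x(n_m*(n_w+1))
```

Whether this beats the plain loop depends on blk and on the Matlab version, so it should be timed with timeit before being relied on.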

    case 3
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%      
        %BIT 1
        temp=zeros(n_w+1, n_m);
        vec = (1:n_m:(n_m*n_w+1)).';
        for ii=1:n_m
            temp(:,ii) = vec;
            vec = vec + 1;
        end
        output1=temp(:).'; %1x(n_m*(n_w+1))
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        %BIT 2
        temp=zeros(n_m+1,n_w);
        vec = (1:n_m).';
        for jj = 1:n_w
            temp(1:end-1,jj) = vec;
            vec = vec + n_m;
        end
        temp(end,:) = n_m*n_w+n_m + (1:n_w);
        output2=temp(:).'; %1x(n_w*(n_m+1))

and the test code looks like this

figure
N = [100,150; 150,180; 200,250; 250,300; 300,350; 400,500; 450, 550;
    550,650; 700,800; 800,1000];
T = zeros(size(N,1),3,10);
for mm = 1:10
    for nn = 1:size(N,1)
        T(nn,:,mm) = [timeit(@() testf(1, N(nn,:))), ...
            timeit(@() testf(2, N(nn,:))), ...
            timeit(@() testf(3, N(nn,:)))];
    end
end
T = mean(T,3);
plot(T)

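and then the time plot looks like this

[Figure: time plot with case 3]

which shows a time saving of about 20%.

Answer 1: (score: 0)

This is only a partial answer while I think more about the problem. I couldn't write this in a comment...

Note that loops are not necessarily slow.

The slowest bit in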
temp=zeros(n_w+1, n_m);
for i=1:n_m
    temp(:,i)=(i:n_m:n_m*n_w+i).'; 
end
output1{n}=temp(:).'; 

is the indexing. I think you might want to write something like this:

temp = ones(n_w+1, n_m);
temp(:,1) = (1:n_m:n_m*n_w+1).';
temp = cumsum(temp, 2);
output1{n} = temp(:).';

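I don't have MATLAB here, so I can't time it. I don't know whether this is faster.

In BIT 2 you do something similar, but in each iteration you add n_m to the column. So you would need to initialize temp to n_m. I think temp(:)=n_m is the fastest way to achieve that.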
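A possible way to spell out that suggestion for BIT 2 is sketched below. It is an untimed illustration, not code from the answer: the helper variable top and the sizes are assumptions, and the extra last row is filled in separately because it does not follow the cumulative pattern.

```matlab
% cumsum-style construction of BIT 2 (sketch; names and sizes are assumptions)
n_m = 3;
n_w = 4;

top = zeros(n_m, n_w);
top(:) = n_m;                      % each step to the right adds n_m
top(:,1) = (1:n_m).';              % first column holds 1..n_m
top = cumsum(top, 2);              % column j becomes (j-1)*n_m + (1:n_m).'

temp = zeros(n_m+1, n_w);
temp(1:end-1,:) = top;
temp(end,:) = n_m*n_w + n_m + (1:n_w);   % last entry of each sub-row
output2 = temp(:).';               % 1x(n_w*(n_m+1))
```

For these sizes the result matches the loop version, [1 2 3 16 4 5 6 17 7 8 9 18 10 11 12 19]; whether it is faster for large n_m and n_w would still need to be measured.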