Parallel implementation of the Jacobi algorithm takes too much time

Date: 2019-05-21 09:44:55

Tags: matlab matlabpool

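I implemented a parallel version of the Jacobi method for solving linear systems. While running some tests, I noticed that the parallel function takes far longer to execute than the sequential one. This is strange, because the Jacobi method should be faster when executed with a parallel implementation.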

I think I am doing something wrong in my code:

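function [x,niter,resrel] = Parallel_Jacobi(A,b,TOL,MAXITER)
% Jacobi iteration x = B*x0 + C on codistributed arrays inside spmd.
[n, m] = size(A);
D = 1./spdiags(A,0);              % reciprocals of the diagonal entries of A
B = speye(n)-A./spdiags(A,0);     % iteration matrix I - D^{-1}*A
C = D.*b;                         % constant term D^{-1}*b
x0 = sparse(zeros(length(A),1));  % initial guess
spmd
    % Distribute along dimension 1 (rows) with the default partition.
    cod_vett = codistributor1d(1,codistributor1d.unsetPartition,[n,1]);
    cod_mat = codistributor1d(1,codistributor1d.unsetPartition,[n,m]);

    B = codistributed(B,cod_mat);
    C = codistributed(C,cod_vett);
    x = codistributed(B*x0 + C,cod_vett);

    Niter = 1;
    TOLX = TOL;
    while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && Niter < MAXITER)
        if(TOL*norm(x,Inf) > realmin)
            TOLX = norm(x,Inf)*TOL;
        else
            TOLX = realmin;
        end
        x0 = x;
        x = B*x0 + C;
        Niter = Niter+1;
    end
end
niter = Niter{1};   % Niter is a Composite after spmd; take the copy from worker 1
x = gather(x);      % collect the distributed solution on the client
% resrel was never assigned in the original; the relative residual is an assumed definition.
resrel = norm(b-A*x,Inf)/norm(b,Inf);
end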

Here are the tests:

%sequential Jacobi
format long;
A = gallery('poisson',20);
tic;
x= jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 0.009054 seconds.
%parallel Jacobi
format long;
A = gallery('poisson',20);
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 11.484130 seconds.

I timed the parpool function with 1, 2, 3, and 4 workers (I have a quad-core processor); the results are as follows:

%Test
format long;
A = gallery('poisson',20);
delete(gcp('nocreate'));
tic
% Four separate runs that differ only in the pool size:
% parpool(1), parpool(2), parpool(3), and parpool(4).
nWorkers = 4;  % set to 1, 2, 3, or 4 for the corresponding run
parpool(nWorkers);
toc
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;

4 workers: parpool = 13.322899 seconds, function = 23.772271 seconds

3 workers: parpool = 10.911769 seconds, function = 16.402633 seconds

2 workers: parpool = 9.371729 seconds, function = 12.945154 seconds

1 worker: parpool = 8.460357 seconds, function = 7.982958 seconds

The fewer the workers, the better the time. As @Adriaan said, this is most likely overhead.

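Does this mean that, in this case, the sequential function will always be faster than the parallel one? Or is there a better way to implement the parallel version?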

According to this question, parallel performance improves when the number of iterations is large. In my case, this test takes only 32 iterations.
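To put the overhead argument in perspective, one could measure how much arithmetic a single Jacobi update involves for this test problem. The following is only a minimal sketch, assuming the same `gallery('poisson',20)` input as in the tests; the names `d`, `update`, and `tUpdate` are illustrative and do not appear in the original code:

```matlab
% Sketch: time one Jacobi update x = B*x0 + C on the 400-by-400 test problem.
A = gallery('poisson', 20);            % sparse 400-by-400 matrix, as in the tests
b = ones(400, 1);
n = size(A, 1);
d = full(spdiags(A, 0));               % diagonal of A as a full column vector
B = speye(n) - spdiags(1./d, 0, n, n) * A;   % iteration matrix I - D^{-1}*A
C = b ./ d;                            % constant term D^{-1}*b
x0 = zeros(n, 1);                      % initial guess
update = @() B*x0 + C;                 % the arithmetic of a single iteration
tUpdate = timeit(update);              % typical time of one update, in seconds
fprintf('One update: %.3g s, nnz(B) = %d\n', tUpdate, nnz(B));
```

If the time per update turns out to be tiny, as the sequential timings in this post suggest, then 32 iterations give the workers very little computation over which to amortize the cost of distributing the arrays and synchronizing at every `norm` call.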

The sequential implementation of the Jacobi method is this:

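function [x,niter,resrel] = jacobi(A,b,TOL,MAXITER)
% Sequential Jacobi iteration x = B*x0 + C.
n = size(A,1);
D = 1./spdiags(A,0);              % reciprocals of the diagonal entries of A
B = speye(n)-A./spdiags(A,0);     % iteration matrix I - D^{-1}*A
C = D.*b;                         % constant term D^{-1}*b

x0 = sparse(zeros(length(A),1));  % initial guess
x = B*x0 + C;
niter = 1;
TOLX = TOL;

while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && niter < MAXITER)
    if(TOL*norm(x,Inf) > realmin)
        TOLX = norm(x,Inf)*TOL;
    else
        TOLX = realmin;
    end

    x0 = x;
    x = B*x0 + C;

    niter = niter+1;
end
% resrel was never assigned in the original; the relative residual is an assumed definition.
resrel = norm(b-A*x,Inf)/norm(b,Inf);
end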
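For reference, the update that both functions perform can be written out as below. This is the standard Jacobi fixed-point form read off from the code, not an addition to it; note that here $D$ denotes the diagonal part of $A$, whereas the code variable `D` stores the reciprocals of the diagonal entries:

```latex
\[
x^{(k+1)} = B\,x^{(k)} + C,
\qquad B = I - D^{-1}A,
\qquad C = D^{-1}b,
\qquad D = \operatorname{diag}(A).
\]
```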

I timed the code with the timeit function, and these are the results (the inputs are the same as before):

4 workers: 11.693473075964102 seconds

3 workers: 9.221281335264003 seconds

2 workers: 9.150417240778545 seconds

1 worker: 6.047181992020434 seconds

Sequential: 0.002893932969688 seconds

0 answers:

No answers yet