我实现了Jacobi方法的并行版本以解决线性系统的问题。做一些测试,我注意到与执行顺序功能相比,并行执行功能的时间非常高。这很奇怪,因为使用并行实现执行时,Jacobi的方法应该更快。
我认为我在代码中做错了事:
function [x,niter,resrel] = Parallel_Jacobi(A,b,TOL,MAXITER)
[n, m] = size(A);
D = 1./spdiags(A,0);
B = speye(n)-A./spdiags(A,0);
C= D.*b;
x0=sparse(zeros(length(A),1));
spmd
cod_vett=codistributor1d(1,codistributor1d.unsetPartition,[n,1]);
cod_mat=codistributor1d(1,codistributor1d.unsetPartition,[n,m]);
B= codistributed(B,cod_mat);
C= codistributed(C,cod_vett);
x= codistributed(B*x0 + C,cod_vett);
Niter = 1;
TOLX = TOL;
while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && Niter < MAXITER)
if(TOL*norm(x,Inf) > realmin)
TOLX = norm(x,Inf)*TOL;
else
TOLX = realmin;
end
x0 = x;
x = B*x0 + C;
Niter=Niter+1;
end
end
Niter=Niter{1};
x=gather(x);
end
下面有测试
%sequential Jacobi
format long;
A = gallery('poisson',20);
tic;
x= jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 0.009054 seconds.
%parallel Jacobi
format long;
A = gallery('poisson',20);
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;
Elapsed time is 11.484130 seconds.
我用1,2,3和4个工作程序(我有一个四核处理器)为parpool
函数计时,结果如下:
%Test
format long;
A = gallery('poisson',20);
delete(gcp('nocreate'));
tic
%parpool(1/2/3/4) means that i executed 4 tests that differ only for the
%argument in the function: first parpool(1), second parpool(2) and so on.
parpool(1/2/3/4);
toc
tic;
x= Parallel_Jacobi(A,ones(400,1),1e-6,2000000);
toc;
4 workers: parpool=13.322899 seconds, function=23.772271
3 workers: parpool=10.911769 seconds, function=16.402633
2 workers: parpool=9.371729 seconds, function=12.945154
1 worker: parpool=8.460357 seconds, function=7.982958 .
工人越少,时间越好。就像@Adriaan所说的那样,很可能是开销。
这是否意味着在这种情况下,顺序函数总是比并行函数快?还是有更好的方法来实现并行的?
在this question中,当迭代次数多时,并行性能会更好。就我而言,此测试只有32次迭代。
Jacobi方法的顺序实现是这样的:
function [x,niter,resrel] = jacobi(A,b,TOL,MAXITER)
n = size(A,1);
D = 1./spdiags(A,0);
B = speye(n)-A./spdiags(A,0);
C= D.*b;
x0=sparse(zeros(length(A),1));
x = B*x0 + C;
Niter = 1;
TOLX = TOL;
while(norm(x-x0,Inf) > norm(x0,Inf)*TOLX && Niter < MAXITER)
if(TOL*norm(x,Inf) > realmin)
TOLX = norm(x,Inf)*TOL;
else
TOLX = realmin;
end
x0 = x;
x = B*x0 + C;
Niter=Niter+1;
end
end
我使用timeit函数对代码进行计时,结果是这些(输入与先前的输入相同):
4名工人:11.693473075964102
3名工人:9.221281335264003
2名工人:9.150417240778545
1名工人:6.047181992020434
顺序:0.002893932969688