Question

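I'm new to parallel programming, and I want to use a GPU and a CPU for my research on the Schnakenberg model. I have the following code, but I don't know how to start parallelizing it. It runs perfectly in MATLAB, and I want to keep running the program with smaller values on the meshgrid: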

%Solve a Turing model system of equations in 2-D space over time. Apply
%Euler’s Method to a semi-discretized Reaction-Diffusion system.
%clear all
%Grid size
Tf=1000000;
a=-1; % Lower boundary
b=1; % Upper boundary
M=50; % M is the number of spaces between points a and b.
dx=0.04; %(b-a)/M; % dx is delta x
dy=0.04; %(b-a)/M;
x=linspace(a,b,M+1); % M+1 equally spaced x vectors including a and b.
y=linspace(a,b,M+1);
%Time stepping
dt=0.08; %100*(dx^2)/2; % dt is delta t the time step
N=Tf/dt; % N is the number of time steps in the interval [0,Tf]
%Constant Values
D=0.516; % D is the Diffusion coefficient Du/Dv
delta=0.0021; % sizes the domain for particular wavelengths
alpha=0.899; % a is alpha, a coefficient in f and g (-a is gamma)
beta=-0.91; % b is beta, another coefficient in f and g
r1=3.5; % r1 is the cubic term
r2=0; % r2 is the quadratic term
gamma=-alpha; % g is for gamma
%pre-allocation
unp1=zeros(M+3,M+3);
vnp1=zeros(M+3,M+3);
%Initial Conditions
un=-0.5+rand(M+3,M+3); %Begin with a random point between [-0.5,0.5]
vn=-0.5+rand(M+3,M+3);
for n=1:N
    for i=2:M+2
        un(i,1)=un(i,3);     %Zero-flux boundary condition on the left
        un(i,M+3)=un(i,M+1); %Zero-flux boundary condition on the right
        vn(i,1)=vn(i,3);
        vn(i,M+3)=vn(i,M+1);
    end
    for j=2:M+2
        un(1,j)=un(3,j);     %Zero-flux boundary condition on the first ghost row
        un(M+3,j)=un(M+1,j); %Zero-flux boundary condition on the last ghost row
        vn(1,j)=vn(3,j);
        vn(M+3,j)=vn(M+1,j);
    end
    for i=2:M+2
        for j=2:M+2
            %Source functions for u and v
            srcu=alpha*un(i,j)*(1-r1*vn(i,j)^2)+vn(i,j)*(1-r2*un(i,j));
            srcv=beta*vn(i,j)*(1+(alpha*r1/beta)*un(i,j)*vn(i,j))+un(i,j)*(gamma+r2*vn(i,j));
            uxx=(un(i-1,j)-2*un(i,j)+un(i+1,j))/dx^2; %Second difference in i for u
            vxx=(vn(i-1,j)-2*vn(i,j)+vn(i+1,j))/dx^2; %Second difference in i for v
            uyy=(un(i,j-1)-2*un(i,j)+un(i,j+1))/dy^2; %Second difference in j for u
            vyy=(vn(i,j-1)-2*vn(i,j)+vn(i,j+1))/dy^2; %Second difference in j for v
            Lapu=uxx+uyy; %Laplacian of u
            Lapv=vxx+vyy; %Laplacian of v
            unp1(i,j)=un(i,j)+dt*(D*delta*Lapu+srcu);
            vnp1(i,j)=vn(i,j)+dt*(delta*Lapv+srcv);
        end
    end
    un=unp1;
    vn=vnp1;
    % Graphing
    if mod(n,6250)==0
        %subplot(2,1,2)
        hdl = surf(x,y,un(2:M+2,2:M+2));
        set(hdl,'edgecolor','none');
        axis([ -1, 1,-1,1]);
        %caxis([-10,15]);
        view(2);
        colorbar;
        fprintf('Time t = %f\n',n*dt);
        ch = input('Hit enter to continue :','s');
        if (strcmp(ch,'k') == 1)
            keyboard;
        end
    end
end
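The block below reads the double loop as a formula. This is an editorial reading of the code above, not something stated by the poster. The scheme is forward Euler on a five-point finite-difference Laplacian, with ghost cells for the zero-flux boundaries. The step from n to n+1 needs the complete fields at step n, so the time loop is inherently sequential. Only the work inside one step can be parallelized.

```latex
\begin{aligned}
\nabla_h^2 w_{i,j} &= \frac{w_{i-1,j}-2w_{i,j}+w_{i+1,j}}{\Delta x^2}
                    + \frac{w_{i,j-1}-2w_{i,j}+w_{i,j+1}}{\Delta y^2},\\
f(u,v) &= \alpha u\,(1-r_1 v^2) + v\,(1-r_2 u),\\
g(u,v) &= \beta v\left(1+\tfrac{\alpha r_1}{\beta}\,u v\right) + u\,(\gamma + r_2 v),\\
u^{n+1}_{i,j} &= u^{n}_{i,j} + \Delta t\left(D\,\delta\,\nabla_h^2 u^{n}_{i,j} + f(u^{n}_{i,j}, v^{n}_{i,j})\right),\\
v^{n+1}_{i,j} &= v^{n}_{i,j} + \Delta t\left(\delta\,\nabla_h^2 v^{n}_{i,j} + g(u^{n}_{i,j}, v^{n}_{i,j})\right),
\qquad 2 \le i,j \le M+2,\\
&\text{with ghost cells } u_{i,1}=u_{i,3},\; u_{i,M+3}=u_{i,M+1},\; u_{1,j}=u_{3,j},\; u_{M+3,j}=u_{M+1,j} \text{ (same for } v).
\end{aligned}
```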

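So the most important question is this: can someone help me organize the code and send the work to workers on the GPU? I want to reduce execution time and get a framework for Euler's method with long times and very small steps. In other words, how do I split the code up so that I can use parfor and all those statements in parallel programming?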

Answer 1

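You need to vectorize your code first. Even without a GPU, you will see a significant speedup. Once it is properly vectorized, running it on the GPU will be easy.

The outer loop over n carries the time evolution and depends on the previous result, so it cannot be vectorized. But you can eliminate all of the inner for loops: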

Instead of

for i=2:M+2
    un(i,1)=un(i,3); %Boundary conditions on left flux is zero
    un(i,M+3)=un(i,M+1); %Boundary conditions on right
    vn(i,1)=vn(i,3);
    vn(i,M+3)=vn(i,M+1);
end

use:

i=2:M+2;
un(i,1)=un(i,3); %Boundary conditions on left flux is zero
un(i,M+3)=un(i,M+1); %Boundary conditions on right
vn(i,1)=vn(i,3);
vn(i,M+3)=vn(i,M+1);
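The answer only shows the loop over i. The same trick applies to the loop over j, which sets the ghost rows. Below is a minimal check that the fully vectorized boundary update reproduces both original loops. It is a sketch under assumptions: a random test field stands in for un, and the names ref, vec and k are illustrative.

```matlab
% Compare the loop-based zero-flux boundary update with a vectorized one.
% Sketch only: M matches the question, the field is random test data.
M = 50;
un = -0.5 + rand(M+3, M+3);

% Reference: the original loops over rows and over columns
ref = un;
for i = 2:M+2
    ref(i,1)   = ref(i,3);     % ghost column 1 mirrors column 3
    ref(i,M+3) = ref(i,M+1);   % ghost column M+3 mirrors column M+1
end
for j = 2:M+2
    ref(1,j)   = ref(3,j);     % ghost row 1 mirrors row 3
    ref(M+3,j) = ref(M+1,j);   % ghost row M+3 mirrors row M+1
end

% Vectorized: one indexed assignment per ghost line, no loops
k = 2:M+2;
vec = un;
vec(k,1)   = vec(k,3);
vec(k,M+3) = vec(k,M+1);
vec(1,k)   = vec(3,k);
vec(M+3,k) = vec(M+1,k);

fprintf('max difference: %g\n', max(abs(ref(:) - vec(:))));
```

Each ghost line copies only interior values, and the corners are untouched in both versions. The two results should therefore be identical, not just close.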

Similarly, you can replace the double for loop with:

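i=2:M+2;
j=2:M+2;
srcu=alpha*un(i,j).*(1-r1*vn(i,j).^2)+vn(i,j).*(1-r2*un(i,j));
srcv=beta*vn(i,j).*(1+(alpha*r1/beta)*un(i,j).*vn(i,j))+un(i,j).*(gamma+r2*vn(i,j));
uxx=(un(i-1,j)-2*un(i,j)+un(i+1,j))/dx^2; %Second difference in i for u
vxx=(vn(i-1,j)-2*vn(i,j)+vn(i+1,j))/dx^2; %Second difference in i for v
uyy=(un(i,j-1)-2*un(i,j)+un(i,j+1))/dy^2; %Second difference in j for u
vyy=(vn(i,j-1)-2*vn(i,j)+vn(i,j+1))/dy^2; %Second difference in j for v
Lapu=uxx+uyy; %Laplacian of u
Lapv=vxx+vyy; %Laplacian of v
unp1(i,j)=un(i,j)+dt*(D*delta*Lapu+srcu);
vnp1(i,j)=vn(i,j)+dt*(delta*Lapv+srcv);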

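Note that this looks very similar to what you originally wrote, but the 'for' statements are gone. In some places I also replaced '*' with '.*' (and '^' with '.^'). These operators denote element-wise operations on whole matrices. For example, MATLAB can compute

un(i,j).*(1-r1*vn(i,j).^2)

for the entire list of indices in i and j much faster than by stepping through them in an explicit loop.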
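The check below confirms that the vectorized update computes the same step as the double loop. It is a minimal sketch: a single Euler step on random fields, using the question's parameters. The names uL, vL, uV, vV and err are illustrative. Boundary conditions are skipped because both versions read the same arrays.

```matlab
% One explicit Euler step, loop version vs vectorized version.
M = 50; dx = 0.04; dy = 0.04; dt = 0.08;
D = 0.516; delta = 0.0021; alpha = 0.899; beta = -0.91;
r1 = 3.5; r2 = 0; gamma = -alpha;
un = -0.5 + rand(M+3, M+3);
vn = -0.5 + rand(M+3, M+3);

% Loop version (same formulas as the question)
uL = zeros(M+3); vL = zeros(M+3);
for i = 2:M+2
    for j = 2:M+2
        srcu = alpha*un(i,j)*(1-r1*vn(i,j)^2) + vn(i,j)*(1-r2*un(i,j));
        srcv = beta*vn(i,j)*(1+(alpha*r1/beta)*un(i,j)*vn(i,j)) + un(i,j)*(gamma+r2*vn(i,j));
        Lapu = (un(i-1,j)-2*un(i,j)+un(i+1,j))/dx^2 + (un(i,j-1)-2*un(i,j)+un(i,j+1))/dy^2;
        Lapv = (vn(i-1,j)-2*vn(i,j)+vn(i+1,j))/dx^2 + (vn(i,j-1)-2*vn(i,j)+vn(i,j+1))/dy^2;
        uL(i,j) = un(i,j) + dt*(D*delta*Lapu + srcu);
        vL(i,j) = vn(i,j) + dt*(delta*Lapv + srcv);
    end
end

% Vectorized version: index vectors plus element-wise .* and .^
i = 2:M+2; j = 2:M+2;
U = un(i,j); V = vn(i,j);
srcu = alpha*U.*(1-r1*V.^2) + V.*(1-r2*U);
srcv = beta*V.*(1+(alpha*r1/beta)*U.*V) + U.*(gamma+r2*V);
Lapu = (un(i-1,j)-2*U+un(i+1,j))/dx^2 + (un(i,j-1)-2*U+un(i,j+1))/dy^2;
Lapv = (vn(i-1,j)-2*V+vn(i+1,j))/dx^2 + (vn(i,j-1)-2*V+vn(i,j+1))/dy^2;
uV = zeros(M+3); vV = zeros(M+3);
uV(i,j) = U + dt*(D*delta*Lapu + srcu);
vV(i,j) = V + dt*(delta*Lapv + srcv);

% The two results should agree up to floating-point rounding
err = max([abs(uL(:)-uV(:)); abs(vL(:)-vV(:))]);
fprintf('max difference: %g\n', err);
```

Wrapping each half in tic/toc is a simple way to measure the speedup on your own machine.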

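If you implement your function this way, it should already run faster on the CPU. Your main loop now performs only element-wise matrix operations, so porting it to the GPU is trivial. Cast all the variables you use in the loop to the GPU as gpuArrays, and use 'gather' to bring the results back to the CPU.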
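Here is a minimal sketch of that GPU port. It assumes the Parallel Computing Toolbox (gpuArray, gather, gpuDeviceCount) and falls back to ordinary CPU arrays when no GPU is found. Unlike the question, it updates the interior in place instead of going through unp1/vnp1. Nsteps is a hypothetical small step count, not the question's N = Tf/dt. For a grid as small as 53-by-53, GPU overhead may outweigh the gain, so it is worth timing both variants.

```matlab
% Vectorized time loop that runs on the GPU when one is available.
% Assumes the Parallel Computing Toolbox; otherwise stays on the CPU.
M = 50; dx = 0.04; dy = 0.04; dt = 0.08;
D = 0.516; delta = 0.0021; alpha = 0.899; beta = -0.91;
r1 = 3.5; r2 = 0; gamma = -alpha;
Nsteps = 1000;   % hypothetical short run for illustration

un = -0.5 + rand(M+3, M+3);
vn = -0.5 + rand(M+3, M+3);
useGPU = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;
if useGPU
    un = gpuArray(un);   % copy the fields to GPU memory once
    vn = gpuArray(vn);
end

k = 2:M+2;   % interior indices
for n = 1:Nsteps
    % Zero-flux boundary conditions, vectorized
    un(k,1) = un(k,3);  un(k,M+3) = un(k,M+1);
    un(1,k) = un(3,k);  un(M+3,k) = un(M+1,k);
    vn(k,1) = vn(k,3);  vn(k,M+3) = vn(k,M+1);
    vn(1,k) = vn(3,k);  vn(M+3,k) = vn(M+1,k);

    % Everything on the right-hand side uses values from step n
    U = un(k,k); V = vn(k,k);
    srcu = alpha*U.*(1-r1*V.^2) + V.*(1-r2*U);
    srcv = beta*V.*(1+(alpha*r1/beta)*U.*V) + U.*(gamma+r2*V);
    Lapu = (un(k-1,k)-2*U+un(k+1,k))/dx^2 + (un(k,k-1)-2*U+un(k,k+1))/dy^2;
    Lapv = (vn(k-1,k)-2*V+vn(k+1,k))/dx^2 + (vn(k,k-1)-2*V+vn(k,k+1))/dy^2;

    % Euler update of the interior; ghost cells are refreshed next step
    un(k,k) = U + dt*(D*delta*Lapu + srcu);
    vn(k,k) = V + dt*(delta*Lapv + srcv);
end

% Copy back to host memory only when needed (e.g. before calling surf)
if useGPU
    u_host = gather(un);
else
    u_host = un;
end
```

Calling gather only at plotting time (every 6250 steps in the question) keeps the fields on the GPU between plots. This avoids a device-to-host copy on every step.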

Schnakenberg-Turing parallel code in Matlab
