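I am trying to solve some wave-equation problems (related to my PhD) with the finite-difference method. For this I have translated, line by line, a Fortran code (link below): (https://github.com/geodynamics/seismic_cpml/blob/master/seismic_CPML_2D_anisotropic.f90)
Inside the time loop of this code there are four independent main loops, so I can split them into four functions.
Since I have to run this code about a hundred times, it would be nice to speed it up. That is why I am looking at parallelization. See the following example: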
function main()
    # ...some common code...
    for time = 1:N
        fun1()  # I want this function to run in parallel with 2, 3, 4...
        fun2()  # ...this one in parallel with 1, 3, 4
        fun3()  # ...this one in parallel with 1, 2, 4
        fun4()  # ...this one in parallel with 1, 2, 3
    end
    # ...more code here...
    return
end
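Below is a minimal sketch of how the pseudocode above could look with task-based threading (Threads.@spawn, available since Julia 1.3). It assumes that each of the four loops writes only to its own arrays. The names update1! through update4! and run_steps are hypothetical stand-ins, not functions from the Fortran code. @sync makes each time step wait until all four tasks have finished before the next step starts.

```julia
# Hypothetical stand-ins for the four independent loops of one time step.
# Each one modifies only its own array, so the tasks never write shared data.
update1!(u) = (u .+= 1.0; u)
update2!(u) = (u .*= 0.5; u)
update3!(u) = (u .-= 2.0; u)
update4!(u) = (u .= sqrt.(abs.(u)); u)

function run_steps(nsteps, n)
    u1, u2, u3, u4 = ones(n), ones(n), ones(n), ones(n)
    for step in 1:nsteps
        # Launch the four updates as tasks; @sync blocks until all are done.
        @sync begin
            Threads.@spawn update1!(u1)
            Threads.@spawn update2!(u2)
            Threads.@spawn update3!(u3)
            Threads.@spawn update4!(u4)
        end
    end
    return u1, u2, u3, u4
end

u1, u2, u3, u4 = run_steps(3, 5)
```

Whether this pays off depends on how much work each loop does per time step. Spawning and waiting on tasks has a fixed cost, so very cheap loops may run just as fast serially.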
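So:
1) Is it possible to do what I described above?
2) Would this approach speed up my code?
3) Is there a better way to approach this problem?
A minimal working example could look like this: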
function fun1(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t+(0.3)^(t-1);
        end
    end
    return t
end
function fun2(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t;
        end
    end
    return t
end
function fun3(r)
    for i=1:1000
        for j=1:1000
            r = (r + rand())/r;
        end
    end
    return r
end
function main()
    a = 2;
    b = 2.5;
    c = 3.0;
    for i=1:100
        a = fun1(a);
        b = fun2(b);
        c = fun3(c);
    end
    return;
end
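As you can see, none of the three functions above (fun1, fun2 and fun3) depends on any of the others, so they could run in parallel. Can this be done? Would it hurt my computation speed?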
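One way to answer the speed question empirically is to time a serial pass and a threaded pass over the same three functions. The sketch below repeats fun1, fun2 and fun3 from the example so that it is self-contained. main_serial and main_threaded are hypothetical names, and each runs a single pass of the outer loop. It assumes Julia 1.3 or later (so that rand() is safe to call from several threads) and a session started with more than one thread. At best, the threaded pass takes roughly as long as the slowest of the three functions rather than their sum.

```julia
function fun1(t)
    for i = 1:1000, j = 1:1000
        t += (0.5)^t + (0.3)^(t - 1)
    end
    return t
end
function fun2(t)
    for i = 1:1000, j = 1:1000
        t += (0.5)^t
    end
    return t
end
function fun3(r)
    for i = 1:1000, j = 1:1000
        r = (r + rand()) / r
    end
    return r
end

# One serial pass over the three independent functions.
function main_serial()
    a = [2.0, 2.5, 3.0]
    f = [fun1, fun2, fun3]
    for i in 1:3
        a[i] = f[i](a[i])
    end
    return a
end

# Same work, but each index may run on a different thread.
# Every iteration writes only its own slot a[i], so there is no data race.
function main_threaded()
    a = [2.0, 2.5, 3.0]
    f = [fun1, fun2, fun3]
    Threads.@threads for i in 1:3
        a[i] = f[i](a[i])
    end
    return a
end

main_serial(); main_threaded()          # warm up: compile before timing
t_serial = @elapsed main_serial()
t_threaded = @elapsed main_threaded()
println("threads = ", Threads.nthreads(),
        ", serial = ", t_serial, " s, threaded = ", t_threaded, " s")
```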
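Edit:
Hi @BogumiłKamiński, I have rearranged the finite-difference equations so that I can "loop" over the functions' inputs and outputs (as you suggested). If it is not too much trouble, I would like your opinion on how I designed the parallelization of my code:
Key elements:
1) I pack all inputs into 4 tuples: sig_xy_in and sig_xy_cros_in (for the 2 sigma functions) and vel_vx_in and vel_vy_in (for the 2 velocity functions). Then I pack the 4 tuples into 2 vectors so that I can loop over them...
2) I pack the 4 functions into 2 vectors, again so that I can loop over them...
3) I run the first parallel loop, then unpack its output tuples...
4) I run the second parallel loop (for the velocities), then unpack its output tuples...
5) Finally, I pack the output elements into the input tuples and continue the time loop until it finishes.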
# ...code...
l = Threads.SpinLock()
arg_in_sig = [sig_xy_in, sig_xy_cros_in];  # input tuples for the sigma functions
arg_in_vel = [vel_vx_in, vel_vy_in];        # input tuples for the velocity functions
func_sig = [sig_xy, sig_xy_cros];           # vector with the two sigma functions
func_vel = [vel_vx, vel_vy];                # vector with the two velocity functions
for it = 1:NSTEP # time steps
    #------------------------------------------------------------
    # Compute sigma functions
    #------------------------------------------------------------
    Threads.@threads for j in 1:2 # start the two sigma functions in parallel
        Threads.lock(l);
        Threads.unlock(l);
        arg_in_sig[j] = func_sig[j](arg_in_sig[j]);
    end
    # Unpack tuples for sig_xy and sig_xy_cros
    # Unpack tuples for sig_xy
    sigxx    = arg_in_sig[1][1]; # changed by sig_xy
    sigyy    = arg_in_sig[1][2]; # changed by sig_xy
    m_dvx_dx = arg_in_sig[1][3]; # changed by sig_xy
    m_dvy_dy = arg_in_sig[1][4]; # changed by sig_xy
    vx       = arg_in_sig[1][5]; # unchanged by sig_xy
    vy       = arg_in_sig[1][6]; # unchanged by sig_xy
    delx_1   = arg_in_sig[1][7]; # unchanged by sig_xy
    dely_1   = arg_in_sig[1][8]; # unchanged by sig_xy
    # ...more unpacking...
    # Unpack tuples for sig_xy_cros
    sigxy    = arg_in_sig[2][1]; # changed by sig_xy_cros
    m_dvy_dx = arg_in_sig[2][2]; # changed by sig_xy_cros
    m_dvx_dy = arg_in_sig[2][3]; # changed by sig_xy_cros
    vx       = arg_in_sig[2][4]; # unchanged by sig_xy_cros
    vy       = arg_in_sig[2][5]; # unchanged by sig_xy_cros
    # ...more unpacking...
    #--------------------------------------------------------
    # velocity
    #--------------------------------------------------------
    Threads.@threads for j in 1:2 # start the two velocity functions in parallel
        Threads.lock(l)
        Threads.unlock(l)
        arg_in_vel[j] = func_vel[j](arg_in_vel[j])
    end
    # Unpack tuples for vel_vx
    vx          = arg_in_vel[1][1]; # changed by vel_vx
    m_dsigxx_dx = arg_in_vel[1][2]; # changed by vel_vx
    m_dsigxy_dy = arg_in_vel[1][3]; # changed by vel_vx
    sigxx       = arg_in_vel[1][4]; # unchanged by vel_vx
    sigxy       = arg_in_vel[1][5]; # ....
    # Unpack tuples for vel_vy
    vy          = arg_in_vel[2][1]; # changed by vel_vy
    m_dsigxy_dx = arg_in_vel[2][2]; # changed by vel_vy
    m_dsigyy_dy = arg_in_vel[2][3]; # changed by vel_vy
    sigxy       = arg_in_vel[2][4]; # unchanged by vel_vy
    sigyy       = arg_in_vel[2][5]; # unchanged by vel_vy
    # .....
    # ...more unpacking...
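    # assemble new input variables
    sig_xy_in = (sigxx, sigyy,
                 m_dvx_dx, m_dvy_dy,
                 vx, vy, ....);
    sig_xy_cros_in = (sigxy,
                      m_dvy_dx, m_dvx_dy,
                      vx, vy, ....);
    vel_vx_in = (vx, ....
    vel_vy_in = (vy, .....
    # feed the reassembled tuples into the next time step
    # (otherwise the parallel loops keep reading the previous outputs)
    arg_in_sig[1] = sig_xy_in;  arg_in_sig[2] = sig_xy_cros_in;
    arg_in_vel[1] = vel_vx_in;  arg_in_vel[2] = vel_vy_in;
end # time loop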
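A possible simplification of this design can be sketched under the assumption that each update function returns new arrays. Spawn the two updates of each phase as tasks, collect their results with fetch, and pass those results straight into the next phase instead of packing and unpacking vectors of tuples. stress_a, stress_b, vel_x, vel_y and time_loop are hypothetical toy stand-ins for sig_xy, sig_xy_cros, vel_vx and vel_vy, and their arithmetic is illustrative only.

```julia
# Hypothetical stand-ins: the two stress updates only read the velocities,
# and the two velocity updates only read the freshly computed stresses.
stress_a(vx, vy) = 0.5 .* (vx .+ vy)
stress_b(vx, vy) = 0.5 .* (vx .- vy)
vel_x(vx, sa, sb) = vx .+ 0.1 .* sa
vel_y(vy, sa, sb) = vy .+ 0.1 .* sb

function time_loop(nstep, n)
    vx, vy = ones(n), zeros(n)
    sa, sb = zeros(n), zeros(n)
    for it in 1:nstep
        # Phase 1: both stress updates run concurrently.
        ta = Threads.@spawn stress_a(vx, vy)
        tb = Threads.@spawn stress_b(vx, vy)
        sa, sb = fetch(ta), fetch(tb)      # wait for both results
        # Phase 2: both velocity updates run concurrently on the new stresses.
        tx = Threads.@spawn vel_x(vx, sa, sb)
        ty = Threads.@spawn vel_y(vy, sa, sb)
        vx, vy = fetch(tx), fetch(ty)
    end
    return vx, vy, sa, sb
end

vx, vy, sa, sb = time_loop(1, 2)
```

Because the values fetched in one step are exactly what the next step reads, there is no separate repacking stage that could fall out of sync with the loop.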
Answer:
Here is a simple way to run your code in multithreaded mode:
function fun1(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t+(0.3)^(t-1);
        end
    end
    return t
end
function fun2(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t;
        end
    end
    return t
end
function fun3(r)
    for i=1:1000
        for j=1:1000
            r = (r + rand())/r;
        end
    end
    return r
end
function main()
    l = Threads.SpinLock()
    a = [2.0, 2.5, 3.0]
    f = [fun1, fun2, fun3]
    Threads.@threads for i in 1:3
        for j in 1:4
            Threads.lock(l)
            println((thread=Threads.threadid(), iteration=j))
            Threads.unlock(l)
            a[i] = f[i](a[i])
        end
    end
    return a
end
I added the lock only to show how locking can be done. In Julia 1.3 you do not need it, because IO is thread-safe there.
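When something really is shared between threads, for example a vector that several iterations push! into, the do-block form of lock acquires the lock, runs the body, and releases the lock even if the body throws. The sketch below is minimal; logged_run is a hypothetical name. It uses ReentrantLock, the general-purpose lock (thread-safe on Julia 1.3 or later). The SpinLock from the answer would also work for such a short critical section.

```julia
function logged_run()
    l = ReentrantLock()
    lines = String[]          # shared vector: every push! must happen under the lock
    Threads.@threads for i in 1:8
        lock(l) do            # released automatically when the block ends
            push!(lines, "iteration $i on thread $(Threads.threadid())")
        end
    end
    return lines
end

foreach(println, logged_run())
```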
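Also note that before Julia 1.3, rand() shares its state between threads, so it would not be safe to run these functions in parallel if all of them called rand(). Again, in Julia 1.3 this is safe.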
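To run this code, first set the maximum number of threads you want to use before starting Julia, for example on Windows: set JULIA_NUM_THREADS=4 (on Linux use export JULIA_NUM_THREADS=4 instead). Here is an example run of this code (I reduced the number of iterations to keep the output short):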
julia> main()
(thread = 1, iteration = 1)
(thread = 3, iteration = 1)
(thread = 2, iteration = 1)
(thread = 3, iteration = 2)
(thread = 3, iteration = 3)
(thread = 3, iteration = 4)
(thread = 2, iteration = 2)
(thread = 1, iteration = 2)
(thread = 2, iteration = 3)
(thread = 2, iteration = 4)
(thread = 1, iteration = 3)
(thread = 1, iteration = 4)
3-element Array{Float64,1}:
21.40311930108456
21.402807510451463
1.219028489573526
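Because JULIA_NUM_THREADS is only read when Julia starts, it can be worth checking inside the session how many threads are actually available. On Julia 1.5 and later the count can also be given on the command line, as in julia --threads 4. check_threads is a hypothetical helper name.

```julia
# Returns the number of threads this session was started with, and warns
# when it is 1, because Threads.@threads then runs every iteration serially.
function check_threads()
    n = Threads.nthreads()
    if n == 1
        @warn "Only one thread available; set JULIA_NUM_THREADS before starting Julia."
    else
        println("Running with $n threads")
    end
    return n
end

check_threads()
```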
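A word of warning: making code multithreaded in Julia is relatively easy (and even simpler in Julia 1.3), but you still have to be careful, because you must watch out for race conditions.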
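To make the warning concrete, here is a small sketch with three hypothetical functions. In the first, several threads do a read-modify-write on the same variable, so increments can be lost. The other two avoid the race: one with an atomic counter (Threads.Atomic), the other by letting each iteration write only its own slot, which is the same pattern the answer uses with a[i] = f[i](a[i]).

```julia
# Unsafe: total[] += 1 is a read followed by a write, so when more than one
# thread is running, two threads can read the same value and one update is lost.
function racy_count(n)
    total = Ref(0)
    Threads.@threads for i in 1:n
        total[] += 1
    end
    return total[]
end

# Safe: the increment is performed as a single atomic operation.
function atomic_count(n)
    total = Threads.Atomic{Int}(0)
    Threads.@threads for i in 1:n
        Threads.atomic_add!(total, 1)
    end
    return total[]
end

# Also safe: each iteration writes only its own element; combine afterwards.
function slot_count(n)
    hits = zeros(Int, n)
    Threads.@threads for i in 1:n
        hits[i] = 1
    end
    return sum(hits)
end

println((racy = racy_count(10^6), atomic = atomic_count(10^6), slots = slot_count(10^6)))
```

With several threads, racy_count usually returns less than n. The other two always return n.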