Parallelizing two (or more) functions in Julia

Time: 2019-09-03 13:15:53

Tags: julia

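I am trying to solve some wave equation problems (related to my PhD) using the finite difference method. To do so, I translated a Fortran code line by line: https://github.com/geodynamics/seismic_cpml/blob/master/seismic_CPML_2D_anisotropic.f90
Inside this code, within the time loop, there are four independent main loops. In fact, I could split them into four functions. Since I have to run this code about a hundred times, speeding it up would be very helpful. With that in mind, I am looking into parallelization. See the following example: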

function main()

    # ...some common code...
    for time = 1:N
        fun1() # I want this function to run in parallel...
        fun2() # ...this function to run in parallel with 1, 3, 4
        fun3() # ...this function to run in parallel with 1, 2, 4
        fun4() # ...this function to run in parallel with 1, 2, 3
    end
    # ...more code here...
    return
end

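So:

1) Is it possible to do what I described above?

2) Would this approach speed up my code?

3) Is there a better way to approach this problem?

A minimal working example could look like this: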

function fun1(t)
for i=1:1000
    for j=1:1000
        t+=(0.5)^t+(0.3)^(t-1);
    end
end
return t
end
function fun2(t)
for i=1:1000
    for j=1:1000
        t+=(0.5)^t;
    end
end
return t
end
function fun3(r)
for i=1:1000
    for j=1:1000
        r = (r + rand())/r;
    end
end
return r
end
function main()
    a = 2;
    b = 2.5;
    c = 3.0;
    for i=1:100
        a = fun1(a);
        b = fun2(b);
        c = fun3(c);
    end
return;
end

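As you can see, none of the three functions above (fun1, fun2 and fun3) depends on any of the others, so they could run in parallel. Can this be done? Would it improve my computation speed?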

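Edit:

Hello @BogumiłKamiński, I have changed the finite difference equations so as to implement a "loop" over the functions' inputs and outputs (as you suggested). If it is not too much trouble, I would like your opinion on the parallelization design of the code:

Key elements
1) I pack all the inputs into 4 tuples: sig_xy_in and sig_xy_cros_in (for the 2 sigma functions), and vel_vx_in and vel_vy_in (for the 2 velocity functions). I then pack the 4 tuples into 2 vectors for "looping" purposes...
2) I pack the 4 functions into 2 vectors for "looping" purposes...
3) I run the first parallel loop and then unpack its output tuples...
4) I run the second parallel loop (for the velocities) and then unpack its output tuples...
5) Finally, I pack the output elements back into the input tuples and continue the time loop until it finishes.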

...code

  l = Threads.SpinLock()
  arg_in_sig  = [sig_xy_in,sig_xy_cros_in]; # Inputs tuples x sigma funct
  arg_in_vel  = [vel_vx_in,     vel_vy_in]; # Inputs tuples x velocity funct
  func_sig    = [sig_xy   ,   sig_xy_cros]; # Vector with two sigma functions
  func_vel    = [vel_vx   ,        vel_vy]; # Vector with two velocity functions

  for it = 1:NSTEP # time steps
    #------------------------------------------------------------
    # Compute sigma functions 
    #------------------------------------------------------------
    Threads.@threads for j in 1:2 # Start parallel run of the two sigma functions
        Threads.lock(l);
        Threads.unlock(l);
        arg_in_sig[j] = func_sig[j](arg_in_sig[j]);
    end

    # Unpack tuples for sig_xy and sig_xy_cros
    # Unpack tuples for sig_xy
    sigxx    = arg_in_sig[1][1];  # changed by sig_xy
    sigyy    = arg_in_sig[1][2];  # changed by sig_xy
    m_dvx_dx = arg_in_sig[1][3];  # changed by sig_xy
    m_dvy_dy = arg_in_sig[1][4];  # changed by sig_xy
    vx       = arg_in_sig[1][5];  # unchanged by sig_xy
    vy       = arg_in_sig[1][6];  # unchanged by sig_xy
    delx_1   = arg_in_sig[1][7];  # unchanged by sig_xy
    dely_1   = arg_in_sig[1][8];  # unchanged by sig_xy

    ...more unpacking...

    # Unpack tuples for sig_xy_cros
    sigxy    = arg_in_sig[2][1];  # changed by sig_xy_cros
    m_dvy_dx = arg_in_sig[2][2];  # changed by sig_xy_cros
    m_dvx_dy = arg_in_sig[2][3];  # changed by sig_xy_cros
    vx       = arg_in_sig[2][4];  # unchanged by sig_xy_cros
    vy       = arg_in_sig[2][5];  # unchanged by sig_xy_cros

    ...more unpacking....

    #--------------------------------------------------------
    # velocity
    #--------------------------------------------------------
    Threads.@threads for j in 1:2 # Start parallel run of the two velocity functions
       Threads.lock(l)
       Threads.unlock(l)
       arg_in_vel[j] = func_vel[j](arg_in_vel[j])
    end

    # Unpack tuples for vel_vx
    vx          = arg_in_vel[1][1];  # changed by vel_vx
    m_dsigxx_dx = arg_in_vel[1][2];  # changed by vel_vx
    m_dsigxy_dy = arg_in_vel[1][3];  # changed by vel_vx
    sigxx       = arg_in_vel[1][4];  # unchanged by vel_vx
    sigxy       = arg_in_vel[1][5];....

    # Unpack tuples for vel_vy
    vy          = arg_in_vel[2][1];  # changed by vel_vy
    m_dsigxy_dx = arg_in_vel[2][2];  # changed by vel_vy
    m_dsigyy_dy = arg_in_vel[2][3];  # changed by vel_vy
    sigxy       = arg_in_vel[2][4];  # unchanged by vel_vy
    sigyy       = arg_in_vel[2][5];  # unchanged by vel_vy
    .....

    ...more unpacking...

    # assemble new input variables
      sig_xy_in  = (sigxx,sigyy,
              m_dvx_dx,m_dvy_dy,
              vx,vy,....);

      sig_xy_cros_in = (sigxy,
              m_dvy_dx,m_dvx_dy,
              vx,vy,....);

      vel_vx_in = (vx,....
      vel_vy_in = (vy,.....
end #time loop

1 answer:

Answer 0 (score: 2)

Here is a simple way to run your code in multi-threaded mode:

function fun1(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t+(0.3)^(t-1);
        end
    end
    return t
end
function fun2(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t;
        end
    end
    return t
end
function fun3(r)
    for i=1:1000
        for j=1:1000
            r = (r + rand())/r;
        end
    end
    return r
end

function main()
    l = Threads.SpinLock()
    a = [2.0, 2.5, 3.0]
    f = [fun1, fun2, fun3]
    Threads.@threads for i in 1:3
        for j in 1:4
            Threads.lock(l)
            println((thread=Threads.threadid(), iteration=j))
            Threads.unlock(l)
            a[i] = f[i](a[i])
        end
    end
    return a
end

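I added the locking only as an example of how it can be done (in Julia 1.3 you would not need it, because IO is thread-safe there). Also note that before Julia 1.3 rand() shares data between threads, so it would not be safe to run these functions concurrently if all of them used rand() (again, in Julia 1.3 it is safe to do so).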
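As a possible alternative on Julia 1.3 or later, independent calls can be launched as tasks with Threads.@spawn and collected with fetch. The following is only a minimal sketch under that version assumption; kernel_a, kernel_b and step_spawn are hypothetical stand-ins, not the functions from the question.

```julia
# Sketch only: Threads.@spawn requires Julia 1.3 or later.
# kernel_a and kernel_b are hypothetical stand-ins for two independent kernels.
function kernel_a(t)
    for _ in 1:1000
        t += 0.5^t
    end
    return t
end

function kernel_b(r)
    for _ in 1:1000
        r = (r + 1.0) / r
    end
    return r
end

# Run both kernels as separate tasks and wait for their results.
function step_spawn(a, b)
    ta = Threads.@spawn kernel_a(a)   # may be scheduled on any available thread
    tb = Threads.@spawn kernel_b(b)
    return fetch(ta), fetch(tb)       # fetch blocks until each task has finished
end

a, b = step_spawn(2.0, 3.0)
println((a = a, b = b))
```

Because each task works only on its own argument and returns its own result, no lock is needed here. Whether this pays off depends on how much work each call does compared with the cost of scheduling the tasks.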

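To run this code, first set the maximum number of threads you want to use, e.g. on Windows: set JULIA_NUM_THREADS=4 (on Linux you should use export). Here is an example run of this code (I have reduced the number of iterations to shorten the output):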

julia> main()
(thread = 1, iteration = 1)
(thread = 3, iteration = 1)
(thread = 2, iteration = 1)
(thread = 3, iteration = 2)
(thread = 3, iteration = 3)
(thread = 3, iteration = 4)
(thread = 2, iteration = 2)
(thread = 1, iteration = 2)
(thread = 2, iteration = 3)
(thread = 2, iteration = 4)
(thread = 1, iteration = 3)
(thread = 1, iteration = 4)
3-element Array{Float64,1}:
 21.40311930108456
 21.402807510451463
  1.219028489573526
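If the output shows only thread 1, the JULIA_NUM_THREADS setting probably did not take effect. A quick check from inside the session, shown here as a small sketch:

```julia
# Number of threads this Julia session was started with.
n = Threads.nthreads()
println("threads available: ", n)

# With a single thread, Threads.@threads still runs correctly, just serially.
if n == 1
    println("JULIA_NUM_THREADS was not set (or was set to 1) before starting Julia")
end
```

The environment variable has to be set before Julia starts; changing it inside a running session does not add threads.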

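Now a word of caution: although it is relatively easy to make code multi-threaded in Julia (and in Julia 1.3 it is even simpler), you have to be careful when doing so, because you must guard against race conditions.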
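To illustrate what a race condition looks like, here is a hedged sketch with three hypothetical counting functions (racy_count, atomic_count and slot_count are names made up for this example). The first updates one shared variable from several threads without synchronization, the second uses an atomic counter, and the third gives each iteration its own output slot, which is the same pattern as a[i] in main() above.

```julia
# Unsafe: every iteration reads and writes the same variable without synchronization,
# so updates can be lost when more than one thread is running.
function racy_count(n)
    total = Ref(0)
    Threads.@threads for i in 1:n
        total[] += 1
    end
    return total[]
end

# Safer: an atomic counter makes each increment indivisible.
function atomic_count(n)
    total = Threads.Atomic{Int}(0)
    Threads.@threads for i in 1:n
        Threads.atomic_add!(total, 1)
    end
    return total[]
end

# Often simplest: each iteration writes only to its own slot, then reduce serially.
function slot_count(n)
    slots = zeros(Int, n)
    Threads.@threads for i in 1:n
        slots[i] = 1
    end
    return sum(slots)
end

println((racy = racy_count(10_000), atomic = atomic_count(10_000), slots = slot_count(10_000)))
```

With several threads, racy_count may return fewer than 10000, while the other two always return 10000. For the finite difference code this suggests making sure that functions running concurrently never write to the same array.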