OpenMP program in Fortran called from R seems to hang for no reason?

Time: 2012-09-24 11:00:33

Tags: r fortran openmp

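I have a program in R that calls several Fortran routines, all of which are OpenMP-enabled. There are two Fortran routines, sub_1 and sub_2. The first is called twice within an R function, while the second is called once. The two routines are almost identical apart from a few minor details. I call the first routine, then the second, then the first again. However, if I have OpenMP enabled in both, then when it gets to the second use of the first Fortran routine, the function stops doing anything (no error, no halt in execution, it just sits there).

If I disable OpenMP in sub_1, everything runs fine. If I disable OpenMP in sub_2 instead, it hangs in the same way on the second use of sub_1. This is odd, since it clearly gets through the first use.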

I thought it might have to do with the threads not being closed properly or something (I don't know much about OpenMP). However, another odd thing is that the R function that calls these three routines is itself called four times, and if I enable OpenMP only in sub_2, that works fine (i.e. the second, third, etc. calls to sub_2 don't hang). I have no idea why it does this! For reference, here is the code for sub_1:

subroutine correlation_dd_rad(s_bins,min_s,end_s,n,pos1,dd,r)

!!! INTENT IN !!!!!!!!
integer             :: s_bins       !Number of separation bins
integer             :: N            !Number of objects
real(8)             :: pos1(3,N)    !Cartesian Positions of particles
real(8)             :: min_s        !The smallest separation calculated.
real(8)             :: end_s        !The largest separation calculated.
real(8)             :: r(N)         !The radii of each particle (ascending)

!!! INTENT OUT !!!!!!!
real(8)             :: dd(N,s_bins)         !The binned data.

!!! LOCAL !!!!!!!!!!!!
integer             :: i,j      !Iterators
integer             :: bin
real(8)             :: d            !Distance between particles.
real(8)             :: dr,mins,ends
real(8),parameter   :: pi = 3.14159653589

integer             :: counter
dd(:,:) = 0.d0

dr = (end_s-min_s)/s_bins

!Perform the separation binning

mins = min_s**2
ends = end_s**2

counter = 1000
!$OMP parallel do private(d,bin,j)
do i=1,N
    !$omp critical (count_it)
    counter = counter - 1
    !$omp end critical (count_it)
    if(counter==0)then
        counter = 1000
        write(*,*) "Another Thousand"
    end if
    do j=i+1,N
        if(r(j)-r(i) .GT. end_s)then
            exit
        end if
        d=(pos1(1,j)-pos1(1,i))**2+&
            &(pos1(2,j)-pos1(2,i))**2+&
            &(pos1(3,j)-pos1(3,i))**2
        if(d.LT.ends .AND. d.GT.mins)then
            d = Sqrt(d)
            bin = Floor((d-min_s)/dr)+1
            dd(i,bin) = dd(i,bin)+1.d0
            dd(j,bin) = dd(j,bin)+1.d0
        end if
    end do
end do
!$OMP end parallel do

write(*,*) "done"
end subroutine

Does anyone know why this happens?

Cheers.

I'll add the smallest example I could come up with that does reproduce the problem (by the way, this must be an R issue: a small example of the type I present here, but written entirely in Fortran, works fine). So in Fortran I have the code above and the following code, compiled into the shared object correlate.so:

subroutine correlation_dr_rad(s_bins,min_s,end_s,n,pos1,n2,pos2,dd,r1,r2)

!!! INTENT IN !!!!!!!!
integer             :: s_bins       !Number of separation bins
integer             :: N            !Number of objects
integer             :: n2
real(8)             :: pos1(3,N)    !Cartesian Positions of particles
real(8)             :: pos2(3,n2)   !random particles
real(8)             :: end_s        !The largest separation calculated.
real(8)             :: min_s        !The smallest separation
real(8)             :: r1(N),r2(N2) !The radii of particles (ascending)

!!! INTENT OUT !!!!!!!
real(8)             :: dd(N,s_bins)         !The binned data.

!!! LOCAL !!!!!!!!!!!!
integer             :: i,j      !Iterators
integer             :: bin
real(8)             :: d            !Distance between particles.
real(8)             :: dr,mins,ends
real(8),parameter   :: pi = 3.14159653589

integer             :: counter
dd(:,:) = 0.d0

dr = (end_s-min_s)/s_bins

!Perform the separation binning

mins = min_s**2
ends = end_s**2

write(*,*) "Got just before parallel dr"
counter = 1000
!$OMP parallel do private(d,bin,j)
do i=1,N
    !$OMP critical (count)
    counter = counter - 1
    !$OMP end critical (count)
    if(counter==0)then
        write(*,*) "Another thousand"
        counter = 1000
    end if
    do j=1,N2
        if(r2(j)-r1(i) .GT. end_s)then
            exit
        end if
        d=(pos1(1,j)-pos2(1,i))**2+&
            &(pos1(2,j)-pos2(2,i))**2+&
            &(pos1(3,j)-pos2(3,i))**2
        if(d.GT.mins .AND. d.LT.ends)then
            d = Sqrt(d)
            bin = Floor((d-min_s)/dr)+1
            dd(i,bin) = dd(i,bin)+1.d0
        end if
    end do
end do
!$OMP end parallel do

write(*,*) "Done"
end subroutine

Then in R I have the following functions. The first two merely wrap the Fortran code above; the third sets things up in a similar way to my actual code:

correlate_dd_rad = function(pos,r,min_r,end_r,bins){
  #A wrapper for the fortran routine of the same name.
  dyn.load('correlate.so')
  out = .Fortran('correlation_dd_rad',
             s_bins = as.integer(bins),
             min_s = as.double(min_r),
             end_s = as.double(end_r),
             n = as.integer(length(r)),
             pos = as.double(t(pos)),
             dd = matrix(0,length(r),bins), #The output matrix.
             r = as.double(r))

  dyn.unload('correlate.so')
  return(out$dd)
}

correlate_dr_rad = function(pos1,r1,pos2,r2,min_r,end_r,bins){
  #A wrapper for the fortran routine of the same name
  N = length(r1)
  N2 = length(r2)
  dyn.load('correlate.so')

  out = .Fortran('correlation_dr_rad',
             s_bins = as.integer(bins),
             min_s = as.double(min_r),
             end_s = as.double(end_r),
             n = N,
             pos1 = as.double(t(pos1)),
             n2 = N2,
             pos2 = as.double(t(pos2)),
             dr = matrix(0,nrow=N,ncol=bins),
             r1 = as.double(r1),
             r2 = as.double(r2))

  dyn.unload('correlate.so')
  return(out$dr)
}

the_calculation = function(){

  #Generate some data to use
  pos1 = matrix(rnorm(30000),10000,3)
  pos2 = matrix(rnorm(30000),10000,3)

  #Find the radii
  r1 = sqrt(pos1[,1]^2 + pos1[,2]^2+pos1[,3]^2)
  r2 = sqrt(pos2[,1]^2 + pos2[,2]^2+pos2[,3]^2)

  #usually sort them but it doesn't matter here.

  #Now call the functions
  print("Calculating the data-data pairs")
  dd = correlate_dd_rad(pos=pos1,r=r1,min_r=0.001,end_r=0.8,bins=15)

  print("Calculating the data-random pairs")
  dr = correlate_dr_rad(pos1,r1,pos2,r2,min_r=0.001,end_r=0.8,bins=15)

  print("Calculating the random-random pairs")
  rr = correlate_dd_rad(pos=pos2,r=r2,min_r=0.001,end_r=0.8,bins=15)

  #Now we would do something with it but I don't care in this example.
  print("Done")
}

Running this I get the output:

[1] "Calculating the data-data pairs"
 Another Thousand
 Another Thousand
 Another Thousand
 Another Thousand
 Another Thousand
 Another Thousand
 Another Thousand
 Another Thousand
 Another Thousand
 Another Thousand
 done
[1] "Calculating the data-random pairs"
 Got just before parallel dr
 Another thousand
 Another thousand

And then it just sits there... In fact, running it a few times shows that it hangs every time. Sometimes it gets as far as the second call to correlate_dd_rad, and most of the other times it only gets halfway through the call to correlate_dr_rad.

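1 Answer:

Answer 0: (score: 1)

I'm not sure whether this will solve your problem, but it is definitely a bug. In the subroutine correlation_dd_rad, where you intend to close the parallel region, you are actually writing a comment. To be clearer: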

 !OMP end parallel do

should be changed to:

 !$OMP end parallel do
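To illustrate the sentinel point, here is a minimal sketch in free-form Fortran. The module `sentinel_demo` and the function `parallel_sum` are hypothetical names made up for this example, not part of the question's code. A directive is only recognized when the line starts with `!$omp` (case-insensitive); without the `$` it is an ordinary comment. Note also that, as far as the OpenMP specification goes, the closing `end parallel do` is optional when it directly follows the loop, which may be why the typo alone does not obviously explain the hang.

```fortran
module sentinel_demo
    implicit none
contains
    ! Hypothetical helper: sums the integers 1..n in a parallel loop.
    function parallel_sum(n) result(total)
        integer, intent(in) :: n
        integer :: total, i

        total = 0
        ! "!$omp" is the directive sentinel; "!omp" would be a plain comment.
        !$omp parallel do reduction(+:total)
        do i = 1, n
            total = total + i
        end do
        !$omp end parallel do
    end function parallel_sum
end module sentinel_demo
```

Compiled without OpenMP support, every `!$omp` line is treated as a comment and the function simply runs serially, so the result should be the same either way.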

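Side notes:

  1. If you don't call any library functions, you don't need use omp_lib.
  2. You can use the atomic construct (see section 2.8.5 of the latest OpenMP specification) to access a specific memory location atomically, instead of the critical construct.
  3. Always give critical constructs a name (section 2.8.2 of the specification):

    All critical constructs without a name are considered to have the same unspecified name.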
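Side notes 2 and 3 can be sketched as follows. This is only an illustration under assumed names: the module `progress_counter`, the subroutine `count_iterations`, and its arguments are hypothetical and do not appear in the question. It increments one shared counter with `atomic` and another inside a named `critical` section, so the two forms can be compared side by side.

```fortran
module progress_counter
    implicit none
contains
    ! Hypothetical helper: counts loop iterations in two protected ways.
    subroutine count_iterations(n, n_atomic, n_critical)
        integer, intent(in)  :: n
        integer, intent(out) :: n_atomic, n_critical
        integer :: i

        n_atomic = 0
        n_critical = 0

        !$omp parallel do
        do i = 1, n
            ! atomic protects only the single update that follows it
            !$omp atomic
            n_atomic = n_atomic + 1

            ! a named critical section only excludes other sections
            ! that carry the same name (count_it)
            !$omp critical (count_it)
            n_critical = n_critical + 1
            !$omp end critical (count_it)
        end do
        !$omp end parallel do
    end subroutine count_iterations
end module progress_counter
```

For a single scalar update like this one, `atomic` is usually the lighter choice; a named `critical` is more appropriate when several statements have to be executed together by one thread at a time.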