Question

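Recently I had to convert a serial program written in Fortran into a parallel version to get results faster, but I have run into some problems.

I am using Ubuntu and the gfortran compiler, with OpenMP as the parallel API. In the earlier (serial) version I used many modules to share data. In the OpenMP version I gave those variables the threadprivate attribute, and some of them are also allocatable. In the serial version I allocated space for the variables before the do loop, but if I do the same in the OpenMP version the program reports an invalid memory reference, even though the variables have the threadprivate attribute. So I now allocate the variables inside the loop and deallocate them inside the loop as well, and I run the do loop in a parallel region. That gives no error and the program runs. But there is another problem: after about 800 minutes of CPU time, when I check the parallel program with ps -ux, its state has changed from Rl to Sl. I looked up what S means; it stands for

interruptible sleep (waiting for an event to complete)

So why does this happen? Is it because I allocate and free memory so frequently? Here is sample code: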

module variables
real, dimension(:), allocatable, save :: a
real, dimension(:,:), allocatable, save :: b
!$omp threadprivate(a,b)
integer, parameter :: n=100
contains
   subroutine alloc_var
   integer :: status
   allocate(a(100),stat=status)
   allocate(b(100,100),stat=status)
   end subroutine
   subroutine free_var
   integer :: status
   deallocate(a,stat=status)
   deallocate(b,stat=status)
   end subroutine
end module

Some other subroutines use the variables a and b, for example:

subroutine cal_sth
use variables, only: a
...
end subroutine

The main program of the serial version:

program main
use variables, only: alloc_var, free_var
implicit none
external :: cal_sth
integer :: i, j
call alloc_var
do j=1, count1
...
other expression ...
do i=1, count2
   call cal_sth
end do
end do
call free_var
end program

And the version with the parallel region:

program main
use variables, only: alloc_var, free_var
implicit none
external :: cal_sth
integer :: i,j
!$omp parallel do private(i,j)
do j=1, count1
...
other expression ...
do i=1, count2
   call alloc_var
   call cal_sth
   if (logical expression) then
       call free_var
       cycle
   end if
   call free_var
end do
end do
end program

Answer 1

Split the combined parallel do directive and rewrite the parallel loop like this:

!$omp parallel
call alloc_var
!$omp do
do i=1, count
   call cal_sth
end do
!$omp end do
call free_var
!$omp end parallel
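
As a minimal, self-contained sketch of the same pattern (not the asker's actual code), the module below allocates a threadprivate array once per thread at the top of the parallel region, runs the work-shared loop, and frees the array before the region ends. The names tp_sketch, work, alloc_work, free_work and scratch_sum are hypothetical, and the loop body is a placeholder for cal_sth.

```fortran
module tp_sketch
  implicit none
  ! Hypothetical per-thread scratch array, standing in for a and b.
  integer, dimension(:), allocatable, save :: work
  !$omp threadprivate(work)
contains
  ! Allocate the calling thread's copy; call inside a parallel region.
  subroutine alloc_work(n)
    integer, intent(in) :: n
    if (.not. allocated(work)) allocate(work(n))
  end subroutine alloc_work

  ! Release the calling thread's copy; call inside a parallel region.
  subroutine free_work
    if (allocated(work)) deallocate(work)
  end subroutine free_work

  ! Placeholder computation: fill the scratch array with i and add up its sum.
  function scratch_sum(niter, n) result(total)
    integer, intent(in) :: niter, n
    integer :: total
    integer :: i, acc
    acc = 0
    !$omp parallel
    call alloc_work(n)              ! once per thread, not once per iteration
    !$omp do reduction(+:acc)
    do i = 1, niter
      work = i                      ! uses this thread's own copy
      acc = acc + sum(work)
    end do
    !$omp end do
    call free_work                  ! once per thread
    !$omp end parallel
    total = acc
  end function scratch_sum
end module tp_sketch
```

Built with gfortran -fopenmp, a call such as scratch_sum(100, 10) from a program that uses the module should return 50500 (10 times 1 + 2 + ... + 100) whatever the thread count, because the integer reduction does not depend on the order of iterations.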

Or, following Gilles' comment, use dedicated parallel regions:

program main
use variables, only: alloc_var, free_var
implicit none
external :: cal_sth
integer :: i
!$omp parallel
call alloc_var
!$omp end parallel
...
!$omp parallel do
do i=1, count
   call cal_sth
end do
!$omp end parallel do
...
! other OpenMP regions
...
!$omp parallel
call free_var
!$omp end parallel
end program
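
The dedicated-region variant relies on threadprivate data surviving from one parallel region to the next. As far as I know, OpenMP only guarantees that when, among other conditions, dynamic thread adjustment is disabled and both regions run with the same number of threads. The sketch below makes that assumption explicit; persist_sketch, buf and persist_sum are hypothetical names, and the lines starting with !$ are compiled only when OpenMP is enabled.

```fortran
module persist_sketch
  implicit none
  ! Hypothetical per-thread buffer, standing in for a and b.
  integer, dimension(:), allocatable, save :: buf
  !$omp threadprivate(buf)
contains
  function persist_sum(niter, n) result(total)
    !$ use omp_lib
    integer, intent(in) :: niter, n
    integer :: total
    integer :: i, acc

    ! Assumption: keep the team size fixed so the copies persist.
    !$ call omp_set_dynamic(.false.)
    acc = 0

    ! Region 1: every thread allocates its own copy.
    !$omp parallel
    allocate(buf(n))
    !$omp end parallel

    ! Region 2: same team size, so each thread finds its copy still allocated.
    !$omp parallel do reduction(+:acc)
    do i = 1, niter
      buf = i
      acc = acc + sum(buf)
    end do
    !$omp end parallel do

    ! Region 3: every thread releases its copy.
    !$omp parallel
    deallocate(buf)
    !$omp end parallel

    total = acc
  end function persist_sum
end module persist_sketch
```

If a later region were started with a different num_threads clause, some threads could find buf unallocated, which is the same invalid memory reference the question describes.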

Answer 2

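With your updated code, I think there are two different avenues you can explore to improve performance:

Memory allocation: as mentioned before, the calls to alloc_var and free_var only need to happen inside a parallel region, but definitely not necessarily inside the do loop. By splitting parallel do into parallel and do, you can call alloc_var before entering the loop and free_var after leaving it. The possible early exits from the inner loop, which might require releasing and reallocating memory, are not in themselves a constraint that prevents you from doing this. (See the code below for an example of how to do it.)
Scheduling: the early termination of some inner iterations may translate into load imbalance between threads. That could explain the waiting time you observed. Explicitly setting the schedule to dynamic may reduce this effect and improve performance. It takes some experimenting to find the best scheduling policy, but dynamic seems a good starting point.

So here is what your code would look like once these two ideas are implemented: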

program main
    use variables, only: alloc_var, free_var
    implicit none
    external :: cal_sth
    integer :: i,j

    !$omp parallel
    call alloc_var
    ! the schedule clause belongs on the do directive, not on parallel
    !$omp do schedule(dynamic) private(i,j)
    do j=1, count1
        ...
        other expression ...
        do i=1, count2
            call cal_sth
            if (logical expression) then
                ! uncomment these only if needed for some reason
                !call free_var
                !call alloc_var
                cycle
            end if
        end do
    end do
    !$omp end do
    call free_var
    !$omp end parallel
end program
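
To make the scheduling point concrete, here is a small sketch with deliberately uneven work per outer iteration: the inner loop exits early, and how early depends on j. The names sched_sketch and count_steps are hypothetical, and the early exit condition is a stand-in for the asker's logical expression.

```fortran
module sched_sketch
  implicit none
contains
  ! Count the inner iterations actually executed. The inner loop stops
  ! as soon as i*i exceeds j, so the cost of an outer iteration grows with j.
  function count_steps(nouter) result(steps)
    integer, intent(in) :: nouter
    integer :: steps
    integer :: i, j, acc
    acc = 0
    !$omp parallel private(i)
    ! dynamic: idle threads pick up the next outer iteration as they finish
    !$omp do schedule(dynamic) reduction(+:acc)
    do j = 1, nouter
      do i = 1, j
        if (i*i > j) exit         ! early termination of the inner loop
        acc = acc + 1
      end do
    end do
    !$omp end do
    !$omp end parallel
    steps = acc
  end function count_steps
end module sched_sketch
```

The result does not depend on the schedule, only the run time does, so this kind of function is a convenient place to compare schedule(static), schedule(dynamic) and schedule(dynamic, chunk) with different chunk sizes. For example, count_steps(10) should return 19 under any of them.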

Parallel Fortran program goes to sleep after some time
