Are these Open MPI Valgrind errors false positives?

Time: 2016-03-02 13:38:37

Tags: fortran valgrind openmpi false-positive

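I am trying to debug a 3D particle-in-cell code that was originally written in F77 but is now in F90. My supervisor and I believe the underlying problem is a memory leak that occurs at some point when the code is busy (after a physical interaction, when a large number of particles are moving). Valgrind, run with the Open MPI suppression file, reports the following.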

==8057== Invalid read of size 8
==8057==    at 0x424300: send_b_bd_ (rpp3dsubs.f90:4521)
==8057==    by 0x43C23A: mymain_ (rpp3dmain.f90:1651)
==8057==    by 0x40221F: MAIN__ (rpp3d.f90:386)
==8057==    by 0x401BAC: main (rpp3d.f90:199)
==8057==  Address 0x16f81100 is 0 bytes inside a block of size 38,528 alloc'd
==8057==    at 0x4C2AC3D: malloc (vg_replace_malloc.c:299)
==8057==    by 0x438329: mymain_ (rpp3dmain.f90:120)
==8057==    by 0x40221F: MAIN__ (rpp3d.f90:386)
==8057==    by 0x401BAC: main (rpp3d.f90:199)

There are about thirty of these errors. One of the subroutines involved is this one.

SUBROUTINE send_B_bd(myid)

  USE B_bd_arrays
  USE grid_parameter

  IMPLICIT NONE

  INCLUDE 'mpif.h'

  INTEGER idown,iup,inorth,isouth,tag,ierr,mpi_status,myid

  ! to +x and from -x
  IF (up >= 0) THEN
     tag = 0
     CALL MPI_SEND(B_bd_up(1,1),2*Grid_y*Grid_z,MPI_DOUBLE_PRECISION, &
          up,tag,MPI_COMM_WORLD,ierr)
  END IF

  IF (down >= 0) THEN
     tag = 0
     CALL MPI_RECV(B_bd_up(1,1),2*Grid_y*Grid_z,MPI_DOUBLE_PRECISION, &
          down,tag,MPI_COMM_WORLD,mpi_status,ierr)
     B_bd_down=B_bd_up
  ELSE
     B_bd_down=0.0d0
  END IF

  B_bd_up=0.0d0

  ! to +y and from -y
  IF (north >= 0) THEN
     tag = 0
     CALL MPI_SEND(B_bd_north(1,1),2*Grid_x*Grid_z,MPI_DOUBLE_PRECISION, &
          north,tag,MPI_COMM_WORLD,ierr)
  END IF

  IF (south >= 0) THEN
     tag = 0
     CALL MPI_RECV(B_bd_north(1,1),2*Grid_x*Grid_z,MPI_DOUBLE_PRECISION, &
          south,tag,MPI_COMM_WORLD,mpi_status,ierr)

     B_bd_south=B_bd_north
  ELSE
     B_bd_south=0.0d0
  END IF

  B_bd_north=0.0d0

END SUBROUTINE send_B_bd

The lines that cause the problem appear to be

B_bd_up=0.0d0
B_bd_down=0.0d0

and other similar assignment statements. These arrays are declared elsewhere in the code (in the base file rpp3d), with the declaration:

module B_bd_arrays
  DOUBLE PRECISION, ALLOCATABLE,save, DIMENSION (:,:) :: &
       B_bd_north,B_bd_south,B_bd_down, B_bd_up
end module B_bd_arrays

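Is it safe to assume that these statements are valid, or is something more fundamental going on here? To reiterate, something in this code produces unphysical results in high-resolution runs, so I am trying to eliminate all the basic errors before looking for any deeper faults in the code.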
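One detail in the listing above that may be worth ruling out (an observation, not a confirmed diagnosis): send_B_bd declares mpi_status as a scalar INTEGER, whereas with mpif.h the status argument of MPI_RECV is an INTEGER array of MPI_STATUS_SIZE elements. With a scalar, the library would write past the end of the variable. Whether this is connected to the Valgrind reports is an assumption. A minimal sketch of the array form follows; the program name, buffer length, and rank numbers are hypothetical and are not taken from rpp3d.

```fortran
! Minimal sketch, not the original code: MPI_RECV with a status array.
! Assumes it is launched with at least two ranks.
PROGRAM status_sketch
  IMPLICIT NONE
  INCLUDE 'mpif.h'

  INTEGER, PARAMETER :: n = 8               ! hypothetical buffer length
  DOUBLE PRECISION :: buf(n)
  INTEGER :: mpi_status(MPI_STATUS_SIZE)    ! array, not a scalar INTEGER
  INTEGER :: myid, nprocs, tag, ierr

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  tag = 0
  IF (nprocs >= 2) THEN
     IF (myid == 0) THEN
        ! Rank 0 sends n doubles to rank 1
        buf = 1.0d0
        CALL MPI_SEND(buf, n, MPI_DOUBLE_PRECISION, &
             1, tag, MPI_COMM_WORLD, ierr)
     ELSE IF (myid == 1) THEN
        ! Rank 1 receives them; MPI fills all MPI_STATUS_SIZE entries
        CALL MPI_RECV(buf, n, MPI_DOUBLE_PRECISION, &
             0, tag, MPI_COMM_WORLD, mpi_status, ierr)
     END IF
  END IF

  CALL MPI_FINALIZE(ierr)
END PROGRAM status_sketch
```

If the received status is never inspected, passing MPI_STATUS_IGNORE in place of the array is another option the MPI standard provides.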

I realize this may be a very simple question, but I would rather get an answer from someone who knows more than I do. As far as I understand,

fortranArray = 0.0d0

is a perfectly valid way of zeroing an array, and is unlikely to generate code that tries to access unavailable memory.
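One caveat to that understanding: assigning a scalar to a whole ALLOCATABLE array is only valid while the array is allocated, because a scalar right-hand side does not trigger automatic allocation. A minimal sketch of a guarded zeroing follows. The array a is a hypothetical stand-in for B_bd_up, and the sizes are made up for illustration.

```fortran
! Minimal sketch: zero an allocatable array only once it is allocated.
PROGRAM zero_sketch
  IMPLICIT NONE
  DOUBLE PRECISION, ALLOCATABLE :: a(:,:)   ! hypothetical stand-in for B_bd_up
  INTEGER :: ny, nz, istat

  ny = 4                                    ! hypothetical grid sizes
  nz = 3
  ALLOCATE(a(2*ny, nz), STAT=istat)
  IF (istat /= 0) STOP 'allocation failed'

  ! Whole-array scalar assignment is valid only for an allocated array
  IF (ALLOCATED(a)) THEN
     a = 0.0d0
  END IF

  PRINT *, 'shape:', SHAPE(a), ' sum:', SUM(a)
  DEALLOCATE(a)
END PROGRAM zero_sketch
```

In the real code the equivalent check would be whether B_bd_up and its siblings are still allocated at the point where send_B_bd is called.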

In addition, similar problems seem to occur in assignment statements such as

E_bd_down(iy,iz)=Ex(ix,iy,iz)
E_bd_down(Grid_y+iy,iz)=Ey(ix,iy,iz)
E_bd_down(2*Grid_y+iy,iz)=Ez(ix,iy,iz)

This cannot be a coincidence. I cannot for the life of me work out what might be going wrong.
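A check that might help narrow this down: the statements above write to rows up to 3*Grid_y of E_bd_down, so the first extent of that array has to be at least 3*Grid_y. The sketch below tests each index against UBOUND before writing. It assumes three field components packed along the first dimension, and the grid sizes are hypothetical; the actual allocation in rpp3d is not shown in this post.

```fortran
! Minimal sketch: explicit bounds check before a packed boundary write.
PROGRAM bounds_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: Grid_y = 4, Grid_z = 3    ! hypothetical sizes
  DOUBLE PRECISION, ALLOCATABLE :: E_bd_down(:,:)
  INTEGER :: iy, iz, row

  ! Assumption: Ex, Ey, Ez occupy three consecutive blocks of Grid_y rows
  ALLOCATE(E_bd_down(3*Grid_y, Grid_z))
  E_bd_down = 0.0d0

  DO iz = 1, Grid_z
     DO iy = 1, Grid_y
        row = 2*Grid_y + iy                       ! highest block, as for Ez
        IF (row > UBOUND(E_bd_down, 1) .OR. iz > UBOUND(E_bd_down, 2)) THEN
           PRINT *, 'index out of range:', row, iz
           STOP 1
        END IF
        E_bd_down(row, iz) = 1.0d0
     END DO
  END DO

  PRINT *, 'all writes stayed within bounds'
  DEALLOCATE(E_bd_down)
END PROGRAM bounds_sketch
```

If the compiler is gfortran (an assumption), building with -fcheck=bounds performs the same kind of check on every array access without modifying the source.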

0 answers