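I have an MPI program A that needs to spawn a different MPI program B and then wait for it to finish. After that, I need to spawn program B again and wait for it once more.
Program A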
IF (rank .eq. 0) THEN
    CALL MPI_COMM_SPAWN('prog_b', MPI_ARGV_NULL, size, &
        & MPI_INFO_NULL, 0, MPI_COMM_SELF, &
        & child_comm, MPI_ERRCODES_IGNORE, status)
    WRITE (*,*) 'Parent 1 Before'
    CALL MPI_BARRIER(child_comm, status)
    WRITE (*,*) 'Parent 1 After'
    ... Change some things ...
    CALL MPI_COMM_SPAWN('prog_b', MPI_ARGV_NULL, size, &
        & MPI_INFO_NULL, 0, MPI_COMM_SELF, &
        & child_comm, MPI_ERRCODES_IGNORE, status)
    WRITE (*,*) 'Parent 2 Before'
    CALL MPI_BARRIER(child_comm, status)
    WRITE (*,*) 'Parent 2 After'
END IF
Program B
... Wait to finished ...
CALL MPI_COMM_GET_PARENT(parent_comm, error)
IF (parent_comm .ne. MPI_COMM_NULL) THEN
    WRITE (*,*) 'Before'
    CALL MPI_BARRIER(parent_comm, error)
    WRITE (*,*) 'After'
END IF
... Finalize ...
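When I run it, the first spawn of program B works fine. In the second round, however, both programs deadlock at the second barrier. I spawn 16 instances of program B each time.
Output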
Parent Before 1
... Output of program b ...
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
After
After
Before
After
Before
After
After
After
After
After
Before
After
After
Parent After 1
After
After
After
After
After
After
... Second call to spawn ...
Parent Before 2
... Output of program b ...
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
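As you can see, every process gets past the first barrier, but the second time they all hang. I tried disconnecting the parent and the children after the first spawn call. I also tried merging the parent and child communicators and calling the barrier on the merged communicator, but neither seems to resolve this deadlock.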
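For reference, the disconnect attempt described above might look like the following minimal sketch. It assumes the child executable is named prog_b as in the listings, that 16 children are spawned per round, and that both sides call MPI_COMM_DISCONNECT right after the barrier. The names nchildren, round, and ierr are illustrative and do not come from the original code. This only shows the shape of the attempt; it is not a confirmed fix for the hang.

```fortran
! Parent side (sketch): spawn prog_b twice, synchronizing and
! disconnecting after each round. Assumes prog_b can be found by the
! MPI runtime and mirrors the barrier and disconnect calls below.
PROGRAM prog_a
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: nchildren = 16   ! assumed: instances of prog_b per round
  INTEGER :: ierr, rank, child_comm, round

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  IF (rank .EQ. 0) THEN
    DO round = 1, 2
      CALL MPI_COMM_SPAWN('prog_b', MPI_ARGV_NULL, nchildren, &
                          MPI_INFO_NULL, 0, MPI_COMM_SELF, &
                          child_comm, MPI_ERRCODES_IGNORE, ierr)
      WRITE (*,*) 'Parent', round, 'Before'
      ! On an intercommunicator, the barrier returns here only after
      ! every child has entered its matching barrier.
      CALL MPI_BARRIER(child_comm, ierr)
      WRITE (*,*) 'Parent', round, 'After'
      ! Collective over both groups: each child must also call
      ! MPI_COMM_DISCONNECT on its parent communicator.
      CALL MPI_COMM_DISCONNECT(child_comm, ierr)
    END DO
  END IF

  CALL MPI_FINALIZE(ierr)
END PROGRAM prog_a
```

```fortran
! Child side (sketch): synchronize with the parent, then disconnect
! before finalizing.
PROGRAM prog_b
  USE mpi
  IMPLICIT NONE
  INTEGER :: ierr, parent_comm

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_GET_PARENT(parent_comm, ierr)

  IF (parent_comm .NE. MPI_COMM_NULL) THEN
    WRITE (*,*) 'Before'
    CALL MPI_BARRIER(parent_comm, ierr)
    WRITE (*,*) 'After'
    ! Matches the parent's MPI_COMM_DISCONNECT on child_comm.
    CALL MPI_COMM_DISCONNECT(parent_comm, ierr)
  END IF

  CALL MPI_FINALIZE(ierr)
END PROGRAM prog_b
```

Each program would be compiled separately (for example with an MPI Fortran wrapper such as mpif90) and only prog_a launched directly. Because MPI_COMM_DISCONNECT is collective over the intercommunicator, leaving it out on either side would itself cause a hang.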