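I am testing the non-blocking communication supported by mpi4py and ran into unexpected behavior (at least to me) with isend: for some reason, a message sent with isend is not delivered until either the sending process finishes or the wait method of the request instance returned by isend is called, which defeats the purpose of isend.
This behavior is observed only when the processes run on different machines.
Code: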
from mpi4py import MPI
import socket
from time import sleep, time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
node = socket.gethostname()
print('rank {} on {}'.format(rank, node))

if rank == 1:
    # Sender: start a non-blocking send, then simulate 10 s of other work.
    message = 1
    req_send = comm.isend(message, dest=0, tag=11)
    #req_send.wait() #no issue if uncommented
    sleep(10)
    print('sending process finished')
elif rank == 0:
    # Receiver: measure how long the blocking recv waits for the message.
    t = time()
    data = comm.recv(source=1, tag=11)
    print('message recieved: {}, waiting time: {}'.format(data, time() - t))
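Results/output:
1. Different machines, with the #req_send.wait() line commented out (the faulty setup; the message is received only after the sending process finishes, which adds 10 seconds to the waiting time):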
rank 0 on node1
rank 1 on node2
sending process finished
message recieved: 1, waiting time: 10.0342979431
2. Different machines, with the req_send.wait() line uncommented:
rank 0 on node1
rank 1 on node2
message recieved: 1, waiting time: 0.000602006912231
sending process finished
3. Same machine, with or without the req_send.wait() line:
rank 1 on node1
rank 0 on node1
message recieved: 1, waiting time: 2.09808349609e-05
sending process finished
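I tried several mpi4py builds for Anaconda2, and the behavior is similar. However, using the anaconda mpi4py build with the faulty setup causes an additional error to appear in the output: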
rank 0 on node1
rank 1 on node2
sending process finished
message recieved: 1, waiting time: 10.0120418072
Assertion failed in file ch3u_handle_connection.c at line 332: vc->state == MPIDI_VC_STATE_LOCAL_CLOSE || vc->state == MPIDI_VC_STATE_CLOSE_ACKED
internal ABORT - process 1
What could be causing this? How can it be fixed or worked around?
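For reference, here is a minimal sketch of one workaround I could imagine. It rests on an assumption that I have not confirmed for this setup: that the MPI library only advances the pending send while the sender is inside an MPI call. Under that assumption, polling the request with test() during the sender's other work should let the message go out early. The helper name send_with_polling and its work_seconds and poll_interval parameters are hypothetical, and the sleep call stands in for a slice of real work.

```python
from time import sleep, time


def send_with_polling(comm, message, dest, tag, work_seconds, poll_interval=0.01):
    """Start a non-blocking send, then poll it while doing other work.

    Hypothetical helper: returns how many times the request was polled.
    """
    req = comm.isend(message, dest=dest, tag=tag)
    done = False
    polls = 0
    deadline = time() + work_seconds
    while time() < deadline:
        if not done:
            # test() returns (flag, message); each call enters the MPI
            # library, which is assumed to give it a chance to progress.
            done, _ = req.test()
            polls += 1
        sleep(poll_interval)  # stand-in for a slice of real work
    if not done:
        req.wait()  # make sure the send has completed before returning
    return polls


def main(comm):
    rank = comm.Get_rank()
    if rank == 1:
        send_with_polling(comm, 1, dest=0, tag=11, work_seconds=10)
        print('sending process finished')
    elif rank == 0:
        t = time()
        data = comm.recv(source=1, tag=11)
        print('message received: {}, waiting time: {}'.format(data, time() - t))


if __name__ == '__main__':
    try:
        from mpi4py import MPI
    except ImportError:
        MPI = None  # mpi4py is not installed, so there is nothing to run
    if MPI is not None and MPI.COMM_WORLD.Get_size() >= 2:
        main(MPI.COMM_WORLD)
    else:
        print('run with mpi4py under at least two MPI processes')
```

If the assumption holds, running this with two processes on different machines should give a waiting time on rank 0 close to the poll interval rather than 10 seconds. If the waiting time is still 10 seconds, the cause is probably something else.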