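I'm writing a Python script that uses MPI to send unsorted arrays to workers, which sort the arrays and return them to the master.
Running it with `mpirun -n 2 python mpi_sort.py` up to `mpirun -n 5 python mpi_sort.py` works fine, but when the number of arrays gets too large, the DIE messages seem to get lost and the workers never stop.
With more than 5 workers, the script stops very early in its execution. Usually the workers get the first batch of arrays, return them to the master, and never get any more work. I'm stumped as to why this happens.
To make matters worse, if I reduce the size or the number of the arrays, more workers seem to work fine.
Here is the code: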
```python
#!/usr/bin/env python
import numpy
from mpi4py import MPI

NUMARRAYS = 1000
ARRAYSIZE = 10000

ASK_FOR_WORK_TAG = 1
WORK_TAG = 2
DIE_TAG = 3

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
status = MPI.Status()

# Master
if rank == 0:
    data = numpy.empty(ARRAYSIZE, dtype=numpy.int32)
    sorted_data = numpy.empty([NUMARRAYS, ARRAYSIZE], dtype=numpy.int32)
    sorted_arrays = 0
    while sorted_arrays < NUMARRAYS:
        print("[Master] Probing")
        comm.Recv(data, source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        print("[Master] Probed")
        dest = status.Get_source()
        print("[Master] got request for work from worker %d" % dest)
        # randint's upper bound is exclusive; this matches the deprecated random_integers(0, ARRAYSIZE, ARRAYSIZE)
        data = numpy.random.randint(0, ARRAYSIZE + 1, size=ARRAYSIZE, dtype=numpy.int32)
        print("[Master] sending work to Worker %d" % dest)
        comm.Send([data, ARRAYSIZE, MPI.INT], dest=dest, tag=WORK_TAG)
        print("[Master] sent work to Worker %d" % dest)
        print("[Master] waiting for complete work from someone")
        comm.Recv([data, ARRAYSIZE, MPI.INT], source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        print("[Master] got results from Worker %d. Storing in line %d" % (status.Get_source(), sorted_arrays))
        sorted_data[sorted_arrays] = numpy.copy(data)
        numpy.savetxt("sample", data, newline=" ", fmt="%d")
        sorted_arrays += 1
    for dest in range(1, size):
        print("[Master] Telling Worker %d to DIE DIE DIE" % dest)
        comm.Send(data, dest=dest, tag=DIE_TAG)
# Slave
else:
    # Ask for work
    data = numpy.empty(ARRAYSIZE, dtype=numpy.int32)
    while True:
        print("[Worker %d] asking for work" % rank)
        comm.Send(data, dest=0, tag=ASK_FOR_WORK_TAG)
        print("[Worker %d] sent request for work" % rank)
        comm.Recv(data, source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == WORK_TAG:
            print("[Worker %d] got work" % rank)
            print("[Worker %d] is sorting the array" % rank)
            data.sort()
            print("[Worker %d] finished work. Sending it back" % rank)
            comm.Send([data, ARRAYSIZE, MPI.INT], dest=0, tag=ASK_FOR_WORK_TAG)
        else:
            print("[Worker %d] DIE DIE DIE" % rank)
            break
```
Answer (score: 0)
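I found the problem.
There were a few deadlocks, as @mgilson suggested.
First, a worker would send its finished work back, but the master would interpret it as a request for work and send it work it wasn't expecting: the results were sent with ASK_FOR_WORK_TAG and the master received everything with MPI.ANY_TAG without checking the tag, so it could not tell results and requests apart.
Then there was a similar problem with the kill messages: DIE messages were sent to workers that weren't expecting them.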
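Both problems can be reproduced without an MPI installation. The following is a minimal, MPI-free sketch (not mpi4py) that replays the question's protocol with generator "ranks" and a toy round-robin scheduler; `run`, `simulate`, `question_master`, `question_worker` and the string payloads are hypothetical names made up for illustration. `eager=True` assumes every Send is buffered, roughly how MPI implementations commonly treat small messages; `eager=False` assumes a Send waits until the receiver posts a matching Recv, roughly how large messages are often handled. Which behaviour a real MPI library picks is an implementation detail, so treat this as a model, not a measurement.

```python
from collections import deque

ASK_FOR_WORK_TAG, WORK_TAG, DIE_TAG = 1, 2, 3


def _matches(recv_op, source, tag):
    # recv_op is ("recv", want_source, want_tag); None plays MPI.ANY_SOURCE / MPI.ANY_TAG
    _, want_source, want_tag = recv_op
    return want_source in (None, source) and want_tag in (None, tag)


def run(procs, eager=True):
    """Toy round-robin scheduler over generator "ranks" (a model, not MPI).

    A rank yields ("send", dest, tag, payload) or ("recv", source, tag); a recv
    resumes with (source, tag, payload). eager=True buffers every send;
    eager=False makes a send wait until the receiver has posted a matching recv.
    Returns the set of ranks still stuck when nothing can make progress.
    """
    boxes = {r: deque() for r in procs}
    ops = {}

    def advance(r, value):
        try:
            ops[r] = procs[r].send(value)
        except StopIteration:
            ops.pop(r, None)

    for r in procs:
        advance(r, None)  # run each rank up to its first send/recv
    while ops:
        progressed = False
        for r in list(ops):
            if r not in ops:
                continue  # finished earlier in this pass
            op = ops[r]
            if op[0] == "send":
                _, dest, tag, payload = op
                if eager:
                    boxes[dest].append((r, tag, payload))
                else:
                    peer = ops.get(dest)
                    if peer is None or peer[0] != "recv" or not _matches(peer, r, tag):
                        continue  # blocked until dest posts a matching recv
                    advance(dest, (r, tag, payload))
                advance(r, None)
            else:
                msg = next((m for m in boxes[r] if _matches(op, m[0], m[1])), None)
                if msg is None:
                    continue  # blocked: no matching message has arrived yet
                boxes[r].remove(msg)
                advance(r, msg)
            progressed = True
        if not progressed:
            return set(ops)
    return set()


def question_master(nworkers, numarrays, log):
    # Mirrors the question: 1st Recv is assumed to be a request, 2nd a result
    sorted_arrays = 0
    while sorted_arrays < numarrays:
        src, _, kind = yield ("recv", None, None)
        log.append(("treated as request", kind))
        yield ("send", src, WORK_TAG, "work")
        _, _, kind = yield ("recv", None, None)
        log.append(("treated as result", kind))
        sorted_arrays += 1
    for dest in range(1, nworkers + 1):
        yield ("send", dest, DIE_TAG, "die")


def question_worker(rank, jobs):
    while True:
        yield ("send", 0, ASK_FOR_WORK_TAG, "request")
        _, tag, _ = yield ("recv", 0, None)
        if tag != WORK_TAG:
            return
        jobs[rank] += 1
        yield ("send", 0, ASK_FOR_WORK_TAG, "result")  # same tag as a request


def simulate(nworkers, numarrays, eager):
    log, jobs = [], {w: 0 for w in range(1, nworkers + 1)}
    procs = {0: question_master(nworkers, numarrays, log)}
    for w in range(1, nworkers + 1):
        procs[w] = question_worker(w, jobs)
    return log, jobs, run(procs, eager)


log, jobs, stuck = simulate(3, 4, eager=True)
print(("treated as result", "request") in log, jobs, stuck)  # prints: True {1: 2, 2: 0, 3: 2} set()
log, jobs, stuck = simulate(3, 4, eager=False)
print(stuck)  # prints: {0, 1, 2, 3}
```

Under this model, with 3 workers and 4 arrays the buffered run finishes, but worker 2's request is stored as a "result" and worker 2 never gets a job, much like the "workers never get more work" symptom. With blocking sends the same mix-up ends with every rank stuck: the master is trying to send work to worker 2 while worker 2 is trying to send a new request to the master. That difference is a plausible reason the question's behaviour depended on the size and number of arrays.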
The final solution is:
```python
#!/usr/bin/env python
import numpy
from mpi4py import MPI

NUMARRAYS = 100
ARRAYSIZE = 10000

ASK_FOR_WORK_TAG = 1
WORK_TAG = 2
WORK_DONE_TAG = 3
DIE_TAG = 4

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
status = MPI.Status()

# Master
if rank == 0:
    data = numpy.empty(ARRAYSIZE, dtype=numpy.int32)
    sorted_data = numpy.empty([NUMARRAYS, ARRAYSIZE], dtype=numpy.int32)
    sorted_arrays = 0
    dead_workers = 0
    # Keep serving messages until every worker has been told to exit
    while dead_workers < size - 1:
        print("[Master] Probing")
        comm.Recv([data, ARRAYSIZE, MPI.INT], source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        print("[Master] Probed")
        dest = status.Get_source()
        # Dispatch on the tag instead of assuming requests and results alternate
        if status.Get_tag() == ASK_FOR_WORK_TAG:
            if sorted_arrays <= NUMARRAYS - 1:
                print("[Master] got request for work from worker %d" % dest)
                # randint's upper bound is exclusive; this matches the deprecated random_integers(0, ARRAYSIZE, ARRAYSIZE)
                data = numpy.random.randint(0, ARRAYSIZE + 1, size=ARRAYSIZE, dtype=numpy.int32)
                print("[Master] sending work to Worker %d" % dest)
                comm.Send([data, ARRAYSIZE, MPI.INT], dest=dest, tag=WORK_TAG)
                print("[Master] sent work to Worker %d" % dest)
            else:
                # Someone did more work than they should have
                print("[Master] Telling worker %d to DIE DIE DIE" % dest)
                comm.Send([data, ARRAYSIZE, MPI.INT], dest=dest, tag=DIE_TAG)
                dead_workers += 1
                print("[Master] Already killed %d workers" % dead_workers)
        elif status.Get_tag() == WORK_DONE_TAG:
            # Results that arrive after NUMARRAYS arrays are stored are dropped
            if sorted_arrays <= NUMARRAYS - 1:
                print("[Master] got results from Worker %d. Storing in line %d" % (status.Get_source(), sorted_arrays))
                sorted_data[sorted_arrays] = numpy.copy(data)
                numpy.savetxt("sample", data, newline=" ", fmt="%d")
                sorted_arrays += 1
# Slave
else:
    # Ask for work
    data = numpy.empty(ARRAYSIZE, dtype=numpy.int32)
    while True:
        print("[Worker %d] asking for work" % rank)
        comm.Send([data, ARRAYSIZE, MPI.INT], dest=0, tag=ASK_FOR_WORK_TAG)
        print("[Worker %d] sent request for work" % rank)
        comm.Recv([data, ARRAYSIZE, MPI.INT], source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == WORK_TAG:
            print("[Worker %d] got work" % rank)
            print("[Worker %d] is sorting the array" % rank)
            data.sort()
            print("[Worker %d] finished work. Sending it back" % rank)
            # A distinct tag lets the master tell results apart from requests
            comm.Send([data, ARRAYSIZE, MPI.INT], dest=0, tag=WORK_DONE_TAG)
        elif status.Get_tag() == DIE_TAG:
            print("[Worker %d] DIE DIE DIE" % rank)
            break
        else:
            print("[Worker %d] Doesn't know what to do with tag %d right now" % (rank, status.Get_tag()))
```
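The essential change in this solution is that the master dispatches on `status.Get_tag()` and only ever replies to `ASK_FOR_WORK_TAG`. Below is a small MPI-free model of that master loop; `TagDispatchMaster` and `simulate` are hypothetical names, and the model assumes messages reach the master in FIFO order and that each worker answers instantly. It checks that every worker is eventually told to die and that no worker message is left unreceived. That property suggests the real program should not hang even when large Sends block until they are received.

```python
from collections import deque

ASK_FOR_WORK_TAG, WORK_TAG, WORK_DONE_TAG, DIE_TAG = 1, 2, 3, 4


class TagDispatchMaster:
    """Hypothetical, MPI-free model of the answer's master loop.

    handle() takes the (source, tag) of one received message and returns the
    (dest, tag) reply the master would Send, or None if it sends nothing.
    """

    def __init__(self, nworkers, numarrays):
        self.nworkers = nworkers
        self.numarrays = numarrays
        self.sorted_arrays = 0
        self.dead_workers = 0

    def done(self):
        return self.dead_workers == self.nworkers

    def handle(self, source, tag):
        if tag == ASK_FOR_WORK_TAG:
            if self.sorted_arrays < self.numarrays:
                return (source, WORK_TAG)
            self.dead_workers += 1
            return (source, DIE_TAG)
        if tag == WORK_DONE_TAG and self.sorted_arrays < self.numarrays:
            self.sorted_arrays += 1  # results past numarrays are dropped
        return None


def simulate(nworkers, numarrays):
    """Feed messages to the master in FIFO order; workers reply instantly."""
    master = TagDispatchMaster(nworkers, numarrays)
    inbox = deque((w, ASK_FOR_WORK_TAG) for w in range(1, nworkers + 1))
    jobs = {w: 0 for w in range(1, nworkers + 1)}
    while not master.done():
        source, tag = inbox.popleft()
        reply = master.handle(source, tag)
        if reply is not None and reply[1] == WORK_TAG:
            worker = reply[0]
            jobs[worker] += 1
            # The worker sorts, sends WORK_DONE_TAG, then asks for more work.
            inbox.append((worker, WORK_DONE_TAG))
            inbox.append((worker, ASK_FOR_WORK_TAG))
        # On DIE_TAG the worker breaks out of its loop and sends nothing else.
    return master, jobs, inbox


master, jobs, leftover = simulate(nworkers=3, numarrays=4)
print(master.sorted_arrays, master.dead_workers, jobs, len(leftover))  # prints: 4 3 {1: 2, 2: 2, 3: 2} 0
```

With 3 workers and 4 arrays, six jobs are handed out but only four results are kept. Because the master counts finished arrays rather than arrays handed out, workers that ask again before the last result arrives receive extra arrays whose results the `WORK_DONE_TAG` branch then discards. This costs some wasted sorting but does not affect termination.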