我是mpi4py的新手。我编写代码是为了处理多个处理器的大型numpy数组data
。由于我无法提供输入文件,因此我提到了data
的形状。 data
的形状是[3000000,15],它包含字符串类型的数据。
from mpi4py import MPI
import numpy as np
import datetime as dt
import math as math
comm = MPI.COMM_WORLD
numprocs = comm.size
rank = comm.Get_rank()
fname = "6.binetflow"
data = np.loadtxt(open(fname,"rb"), dtype=object, delimiter=",", skiprows=1)
X = data[:,[0,1,3,14,6,6,6,6,6,6,6,6]]
num_rows = math.ceil(len(X)/float(numprocs))
X = X.flatten()
sendCounts = list()
displacements = list()
for p in range(numprocs):
if p == (numprocs-1): #for last processor
sendCounts.append(int(len(X) - (p*num_rows*12)))
displacements.append(int(p*num_rows*12))
break
sendCounts.append(int(num_rows*12))
displacements.append(int(p*sendCounts[p]))
sendbuf = np.array(X[displacements[rank]: (displacements[rank]+sendCounts[rank])])
## Each processor will do some task on sendbuf
if rank == 0:
recvbuf = np.empty(sum(sendCounts), dtype=object)
else:
recvbuf = None
print("sendbuf: ",sendbuf)
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
if rank == 0:
print("Gathered array: {}".format(recvbuf))
但我面临以下错误:
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 516, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34587)
File "MPI/msgbuffer.pxi", line 466, in mpi4py.MPI._p_msg_cco.for_cco_recv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34097)
File "MPI/msgbuffer.pxi", line 261, in mpi4py.MPI.message_vector (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:31977)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
任何帮助将不胜感激。我长期陷入这个问题。
由于
答案 0 :(得分:0)
问题是dtype=object
。
Mpi4py提供两种通信功能,名称以大写字母开头的通信功能,例如: Scatter
,以及名称以小写字母开头的人,例如scatter
。 From the Mpi4py documentation:
在MPI for Python中,Comm实例的Bcast(),Scatter(),Gather(),Allgather()和Alltoall()方法为内存缓冲区的集体通信提供支持。变种bcast(),scatter(),gather(),allgather()和alltoall()可以传递通用Python对象。
不清楚的是,即使numpy数组假设暴露内存缓冲区,缓冲区显然需要是一小组原始数据类型之一,当然也不能使用通用对象。比较以下两段代码:
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(Size, dtype=object)
else:
Data = None
Data = Comm.scatter(Data, 0) # I work fine!
print("Data on rank %d: " % Rank, Data)
和
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(Size, dtype=object)
else:
Data = None
Datb = numpy.empty(1, dtype=object)
Comm.Scatter(Data, Datb, 0) # I throw KeyError!
print("Datb on rank %d: " % Rank, Datb)
不幸的是,Mpi4py没有提供scatterv
。来自文档中的相同位置:
矢量变体(可以向每个进程传递不同数量的数据)也支持Scatter(),Gatherv(),Allgatherv()和Alltoallv(),它们只能传递暴露内存缓冲区的对象。
这些也不是dtypes的大小写规则的例外:
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(2*Size+1, dtype=numpy.dtype('float64'))
else:
Data = None
if Rank == 0:
Datb = numpy.empty(3, dtype=numpy.dtype('float64'))
else:
Datb = numpy.empty(2, dtype=numpy.dtype('float64'))
Comm.Scatterv(Data, Datb, 0) # I work fine!
print("Datb on rank %d: " % Rank, Datb)
与
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(2*Size+1, dtype=object)
else:
Data = None
if Rank == 0:
Datb = numpy.empty(3, dtype=object)
else:
Datb = numpy.empty(2, dtype=object)
Comm.Scatterv(Data, Datb, 0) # I throw KeyError!
print("Datb on rank %d: " % Rank, Datb)
您很遗憾需要编写代码,以便它可以使用scatter
,每个进程需要相同的SendCount
,或更原始的点对点通信功能,或者使用除Mpi4py之外的一些并行工具。
使用Mpi4py 2.0.0,这是撰写本文时的当前稳定版本。