mpi4py Gatherv raises KeyError: 'O'

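Time: 2016-08-11 20:34:16

Tags: parallel-processing openmpi mpi4py

I am new to mpi4py. I wrote this code to process a large numpy array, data, across multiple processors. Since I cannot provide the input file, I will describe the shape of data instead: it is [3000000, 15] and contains string-type data.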

from mpi4py import MPI
import numpy as np
import datetime as dt
import math as math


comm = MPI.COMM_WORLD
numprocs = comm.size
rank = comm.Get_rank()
fname = "6.binetflow"
data = np.loadtxt(open(fname,"rb"), dtype=object, delimiter=",", skiprows=1)
X = data[:,[0,1,3,14,6,6,6,6,6,6,6,6]]
num_rows = math.ceil(len(X)/float(numprocs))
X = X.flatten()
sendCounts = list()
displacements = list()
for p in range(numprocs):
    if p == (numprocs-1): #for last processor
        sendCounts.append(int(len(X) - (p*num_rows*12)))
        displacements.append(int(p*num_rows*12))
        break
    sendCounts.append(int(num_rows*12))
    displacements.append(int(p*sendCounts[p]))
sendbuf = np.array(X[displacements[rank]: (displacements[rank]+sendCounts[rank])])

## Each processor will do some task on sendbuf

if rank == 0:
    recvbuf = np.empty(sum(sendCounts), dtype=object)
else:
    recvbuf = None

print("sendbuf: ",sendbuf)
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
if rank == 0:
    print("Gathered array: {}".format(recvbuf))

But I get the following error:

Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
  File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
  File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
  File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
  File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
  File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
  File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 516, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34587)
  File "MPI/msgbuffer.pxi", line 466, in mpi4py.MPI._p_msg_cco.for_cco_recv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34097)
  File "MPI/msgbuffer.pxi", line 261, in mpi4py.MPI.message_vector (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:31977)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'

Any help would be much appreciated. I have been stuck on this problem for a long time.

Thanks

1 answer:

Answer 0 (score: 0)

The problem is dtype=object.

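mpi4py provides two kinds of communication functions: those whose names begin with an uppercase letter, e.g. Scatter, and those whose names begin with a lowercase letter, e.g. scatter. From the mpi4py documentation: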

  

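In MPI for Python, the Bcast(), Scatter(), Gather(), Allgather() and Alltoall() methods of Comm instances provide support for collective communications of memory buffers. The variants bcast(), scatter(), gather(), allgather() and alltoall() can communicate generic Python objects.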

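What is not clear from this is that, even though numpy arrays supposedly expose memory buffers, the buffers apparently need to hold one of a small set of primitive data types, and they certainly do not work with generic objects. Compare the following two pieces of code: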

from mpi4py import MPI
import numpy

Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()

if Rank == 0:
    Data = numpy.empty(Size, dtype=object)
else:
    Data = None

Data = Comm.scatter(Data, 0) # I work fine!

print("Data on rank %d: " % Rank, Data)

from mpi4py import MPI
import numpy

Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()

if Rank == 0:
    Data = numpy.empty(Size, dtype=object)
else:
    Data = None

Datb = numpy.empty(1, dtype=object)

Comm.Scatter(Data, Datb, 0) # I throw KeyError!

print("Datb on rank %d: " % Rank, Datb)
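
The 'O' in the KeyError matches NumPy's one-character type code for dtype=object. A plausible reading of the traceback, not confirmed by the documentation quoted here, is that mpi4py looks this code up in a table of MPI datatypes and finds no entry, since an object array stores references to Python objects rather than raw values. The sketch below only inspects NumPy's type codes; the helper name type_code is made up for illustration.

```python
import numpy as np


def type_code(dtype):
    """Return NumPy's one-character type code for the given dtype."""
    return np.dtype(dtype).char


if __name__ == "__main__":
    # float64 maps to 'd' (C double), which has a natural MPI counterpart;
    # object maps to 'O', which does not describe any raw memory layout.
    for dt in (np.float64, object):
        print(np.dtype(dt), "->", repr(type_code(dt)))
```

Checking arr.dtype.char before calling an uppercase method is a cheap way to spot an object array early.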

Unfortunately, mpi4py does not provide a scatterv. From the same place in the documentation:

  

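The vector variants (which can communicate different amounts of data to each process) Scatterv(), Gatherv(), Allgatherv() and Alltoallv() are also supported; they can only communicate objects exposing memory buffers.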

These are no exception to the uppercase/lowercase rule for dtypes, either:

from mpi4py import MPI
import numpy

Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()

if Rank == 0:
    Data = numpy.empty(2*Size+1, dtype=numpy.dtype('float64'))
else:
    Data = None

if Rank == 0:
    Datb = numpy.empty(3, dtype=numpy.dtype('float64'))
else:
    Datb = numpy.empty(2, dtype=numpy.dtype('float64'))

Comm.Scatterv(Data, Datb, 0) # I work fine!

print("Datb on rank %d: " % Rank, Datb)

from mpi4py import MPI
import numpy

Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()

if Rank == 0:
    Data = numpy.empty(2*Size+1, dtype=object)
else:
    Data = None

if Rank == 0:
    Datb = numpy.empty(3, dtype=object)
else:
    Datb = numpy.empty(2, dtype=object)

Comm.Scatterv(Data, Datb, 0) # I throw KeyError!

print("Datb on rank %d: " % Rank, Datb)

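Unfortunately, you will need to write your code so that it can use scatter, which requires the same SendCount for every process, or the more primitive point-to-point communication functions, or use some parallel facility other than mpi4py.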
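
For the gathering step in the question, the lowercase route could look like the sketch below. Each rank contributes exactly one picklable object to gather(), and that object can be the rank's whole chunk, so the chunk lengths may differ. The helper name gather_object_chunks and the demo data are assumptions made for illustration, and pickling may be slow for an array as large as the one in the question.

```python
import numpy as np


def gather_object_chunks(comm, local_chunk, root=0):
    """Gather one 1-D object array per rank and join them on the root.

    Returns the concatenated array on the root and None on other ranks.
    """
    chunks = comm.gather(local_chunk, root=root)  # lowercase: pickles objects
    if chunks is None:
        return None
    return np.concatenate(chunks)


if __name__ == "__main__":
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed; skipping the MPI demo")
    else:
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        # Rank r holds r + 1 strings, so the chunk sizes differ per rank.
        local = np.array(["r%d-%d" % (rank, i) for i in range(rank + 1)],
                         dtype=object)
        gathered = gather_object_chunks(comm, local)
        if rank == 0:
            print("Gathered array:", gathered)
```

Run it with, for example, mpiexec -n 4 python followed by the script name.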

This was tested with mpi4py 2.0.0, the current stable version at the time of writing.
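
If the strings can be stored at a fixed width, another route keeps the buffer-based Gatherv from the question: encode every item into a primitive byte buffer first. The following is a sketch under that assumption, not a verified fix for the original script. The names encode_fixed_width, decode_fixed_width and gatherv_strings are hypothetical helpers, and all counts are expressed in bytes.

```python
import numpy as np


def _as_bytes(value):
    """Return value as bytes (anything that is not bytes is UTF-8 encoded)."""
    return value if isinstance(value, bytes) else str(value).encode("utf-8")


def encode_fixed_width(values, width):
    """Pack the items into a flat uint8 buffer, width bytes per item.

    Shorter items are padded with NUL bytes; longer ones are truncated.
    """
    fixed = np.array([_as_bytes(v) for v in values], dtype="S%d" % width)
    return fixed.view(np.uint8)


def decode_fixed_width(raw, width):
    """Reinterpret a flat uint8 buffer as an array of width-byte strings."""
    return raw.view("S%d" % width)


def gatherv_strings(comm, values, root=0):
    """Gather a different number of string items from each rank onto root."""
    from mpi4py import MPI  # deferred so the helpers above work without MPI

    # All ranks must agree on the item width, so use the global maximum.
    local_width = max([len(_as_bytes(v)) for v in values] + [1])
    width = comm.allreduce(local_width, op=MPI.MAX)

    sendbuf = encode_fixed_width(values, width)
    counts = comm.gather(sendbuf.size, root=root)  # byte count of each rank

    if comm.Get_rank() == root:
        displs = [sum(counts[:i]) for i in range(len(counts))]
        recvbuf = np.empty(sum(counts), dtype=np.uint8)
        recv = [recvbuf, counts, displs, MPI.BYTE]
    else:
        recvbuf, recv = None, None

    comm.Gatherv([sendbuf, MPI.BYTE], recv, root=root)
    return decode_fixed_width(recvbuf, width) if recvbuf is not None else None


if __name__ == "__main__":
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed; skipping the MPI demo")
    else:
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        local = np.array(["rank%d-item%d" % (rank, i) for i in range(rank + 2)],
                         dtype=object)
        gathered = gatherv_strings(comm, local)
        if rank == 0:
            print("Gathered array:", gathered)
```

The result on the root is an array of bytes strings rather than Python objects. The byte counts are passed to MPI as C integers, so a very large array may need to be gathered in several pieces.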