Question

I have code like this:

def plotFrame(n):
    a = data[n, :]
    do_something_with(a)

data = loadtxt(filename)
ids = data[:,0]  # some numbers from the first column of data
map(plotFrame, ids)

This works fine for me. Now I want to try replacing map() with pool.map(), as follows:

pools = multiprocessing.Pool(processes=1)
pools.map(plotFrame, ids)

But this does not work; it fails with:

NameError: global name 'data' is not defined

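The question is: what is going on? Why does map() not complain about the data variable that is never passed to the function, while pool.map() does?

Edit: I am using Linux.

Edit 2: Following @Bill's second suggestion, I now have the following code: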

def plotFrame_v2(line):
    plot_with(line)

if __name__ == "__main__":
    ff = np.loadtxt(filename)
    m = int( max(ff[:,-1]) ) # max id
    l = ff.shape[0]
    nfig = 0
    pool = Pool(processes=1)
    for i in range(0, l/m, 50):
        data = ff[i*m:(i+1)*m, :] # data of one frame contains several ids
        pool.map(plotFrame_v2, data)
        nfig += 1        
        plt.savefig("figs_bot/%.3d.png"%nfig) 
        plt.clf()

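This works as expected. However, I now have another unexpected problem: the figures it produces are blank, whereas the code above that uses map() produces figures showing the contents of data.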

Answer 1

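With multiprocessing.Pool, you are spawning separate processes that work with the shared (global) resource data. Typically you can let those processes use a shared resource from the parent process by explicitly declaring that resource global. However, it is better practice to pass every resource the child processes need explicitly, as function arguments. On Windows this is required. See the multiprocessing guidelines.

So you could try: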

data = loadtxt(filename)

def plotFrame(n):
    global data
    a = data[n, :]
    do_something_with(a)

ids = data[:,0]  # some numbers from the first column of data
pools = multiprocessing.Pool(processes=1)
pools.map(plotFrame, ids)

Or, even better, see this thread about supplying multiple arguments to a function with multiprocessing.Pool. A simple way might be:

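from multiprocessing import Pool
from numpy import loadtxt

def plotFrameWrapper(args):
    # Unpack the (n, data) tuple produced by zip() below
    return plotFrame(*args)

def plotFrame(n, data):
    a = data[n, :]
    do_something_with(a)

if __name__ == "__main__":
    data = loadtxt(filename)
    pools = Pool(1)

    # Cast to int so the ids can be used as row indices
    ids = data[:, 0].astype(int)
    results = pools.map(plotFrameWrapper, zip(ids, [data]*len(ids)))
    print(results)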
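
As an alternative to a hand-written wrapper, functools.partial from the standard library can bind data to the worker function before it is handed to pool.map(). This is a minimal sketch, not part of the original answer: plot_frame and run_with_partial are hypothetical names, a plain list of rows stands in for the NumPy array, and sum() is a placeholder for do_something_with().

```python
from functools import partial
from multiprocessing import Pool


def plot_frame(n, data):
    """Hypothetical stand-in for plotFrame: pick row n and reduce it."""
    row = data[n]
    return sum(row)  # placeholder for do_something_with(row)


def run_with_partial(data, ids, processes=2):
    """Bind data to the worker, then map the worker over the row ids."""
    worker = partial(plot_frame, data=data)  # data travels with the function
    with Pool(processes=processes) as pool:
        return pool.map(worker, ids)


if __name__ == "__main__":
    # Assumed toy input: a list of rows instead of loadtxt(filename)
    rows = [[0, 1.0, 2.0], [1, 3.0, 4.0], [2, 5.0, 6.0]]
    print(run_with_partial(rows, [0, 1, 2]))
```

Because the bound data is sent to the workers together with the function, this is probably best suited to arrays that are cheap to pickle.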

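One last thing: since the only thing you appear to do in your example is slice the array, you can slice first and then pass the sliced arrays to your function: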

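from multiprocessing import Pool
from numpy import loadtxt

def plotFrame(sliced_data):
    do_something_with(sliced_data)

if __name__ == "__main__":
    data = loadtxt(filename)
    pools = Pool(1)

    # Cast to int so the ids can be used as row indices
    ids = data[:, 0].astype(int)
    # Iterating over data[ids] yields one row per id
    results = pools.map(plotFrame, data[ids])
    print(results)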

Answer 2

To avoid "unexpected" problems, avoid globals.

To reproduce your first code example using the builtin map that calls plotFrame:

def plotFrame(n):
    a = data[n, :]
    do_something_with(a)

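To use multiprocessing.Pool.map, first deal with the global data. If do_something_with(a) also uses some global data, then it should be changed too.

To see how to pass a numpy array to a child process, see Use numpy array in shared memory for multiprocessing. If you do not need to modify the array, it is even simpler: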

import numpy as np
from multiprocessing import Pool

def init(data_): # inherit data
    global data #NOTE: no other globals in the program
    data = data_

def main():
    data = np.loadtxt(filename) 
    ids = data[:,0]  # some numbers from the first column of data
    pool = Pool(initializer=init, initargs=[data])
    pool.map(plotFrame, ids)

if __name__=="__main__":
    main()

Every input should either be passed explicitly to plotFrame as an argument or inherited via init().
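
For a version that can be run without an input file, here is a minimal self-contained sketch of the same initializer pattern. It is illustrative only: init_worker, frame_total and run_with_initializer are hypothetical names, a list of rows stands in for the array loaded with np.loadtxt, and sum() is a placeholder for the real per-row work.

```python
from multiprocessing import Pool

_data = None  # set once in each worker process by init_worker


def init_worker(shared):
    """Runs once when each worker starts and stores the data in a module global."""
    global _data
    _data = shared


def frame_total(n):
    """Worker task: read row n from the data received through init_worker."""
    return sum(_data[n])  # placeholder for do_something_with(row)


def run_with_initializer(data, ids, processes=2):
    """Hand data to every worker at startup, then map over the ids only."""
    with Pool(processes=processes, initializer=init_worker,
              initargs=(data,)) as pool:
        return pool.map(frame_total, ids)


if __name__ == "__main__":
    rows = [[1, 2], [3, 4], [5, 6]]  # assumed toy data
    print(run_with_initializer(rows, [0, 1, 2]))
```

With this layout the data is handed over once per worker at startup, and each task only carries a row id.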

Your second code example tries to manipulate global data again (via the plt calls):

import matplotlib.pyplot as plt

#XXX BROKEN, DO NOT USE
pool.map(plotFrame_v2, data)
nfig += 1        
plt.savefig("figs_bot/%.3d.png"%nfig) 
plt.clf()

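Unless you draw something in the main process, this code saves blank figures. Either plot in the child processes, or send the data to be plotted to the parent process explicitly, e.g., by returning it from plotFrame and using the value returned by pool.map(). Here is a code example: how to plot in child processes.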
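
The second option (return the data and plot in the parent) might look roughly like the sketch below. It is an illustration under stated assumptions, not the linked example: compute_frame, collect_frames and save_frames are hypothetical names, the per-frame computation is a placeholder, and the non-interactive Agg backend is assumed so that figures can be written without a display.

```python
from multiprocessing import Pool


def compute_frame(frame_id):
    """Worker: return plain data only; no pyplot calls happen here."""
    xs = list(range(5))
    ys = [frame_id * x for x in xs]  # placeholder for the real computation
    return frame_id, xs, ys


def collect_frames(frame_ids, processes=2):
    """Run the workers and gather their return values in the parent."""
    with Pool(processes=processes) as pool:
        return pool.map(compute_frame, frame_ids)


def save_frames(frames, pattern="frame_%03d.png"):
    """Parent: draw each returned frame and save it to its own file."""
    import matplotlib
    matplotlib.use("Agg")  # assumed file-only backend, no display needed
    import matplotlib.pyplot as plt

    for frame_id, xs, ys in frames:
        fig, ax = plt.subplots()
        ax.plot(xs, ys)
        fig.savefig(pattern % frame_id)
        plt.close(fig)  # release the figure once it is written
```

Calling save_frames(collect_frames([1, 2, 3])) under an if __name__ == "__main__": guard would write frame_001.png through frame_003.png. All drawing then happens in the process that calls savefig, so the figures are no longer blank.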

Difference between map() and pool.map()
