Question

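I would like concurrent.futures.ProcessPoolExecutor.map() to call a function that takes 2 or more arguments. In the example below, I use a lambda function and define ref as an array of the same size as numberlist, filled with one repeated value.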

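First question: is there a better way? numberlist may contain millions to billions of elements, so ref would have to grow with it. This approach needlessly consumes precious memory, which I want to avoid. I did it this way because I read that map stops mapping as soon as the shortest iterable is exhausted.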

import concurrent.futures as cf

nmax = 10
numberlist = range(nmax)
ref = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
workers = 3


def _findmatch(listnumber, ref):    
    print('def _findmatch(listnumber, ref):')
    x=''
    listnumber=str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x 

a = map(lambda x, y: _findmatch(x, y), numberlist, ref)
for n in a:
    print(n)
    if str(ref[0]) in n:
        print('match')

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(lambda x, y: _findmatch(x, ref), numberlist, ref):
        print(type(n))
        print(n)
        if str(ref[0]) in n:
            print('match')

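Running the code above, I found that the built-in map function gives the result I want. However, when I carry the same construct over to concurrent.futures.ProcessPoolExecutor.map(), Python 3.5 fails with this error: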

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
    obj = ForkingPickler.dumps(obj)
  File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7fd2a14db0d0>: attribute lookup <lambda> on __main__ failed

Question 2: Why does this error occur? How can I make concurrent.futures.ProcessPoolExecutor.map() call a function that takes multiple arguments?

Answer 1

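To answer your second question first: you get the exception because a lambda function like the one you are using cannot be pickled. Since Python uses the pickle protocol to serialize the data passed between the main process and the ProcessPoolExecutor's worker processes, this is a problem. It's not clear why you are using a lambda at all. The lambda you have takes two arguments, just like the original function, so you can pass _findmatch directly instead of the lambda and it should work.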

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_findmatch, numberlist, ref):
        ...

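As for your first question, about passing a second, constant argument without creating a giant list, there are several ways to solve it. One is to use itertools.repeat to create an iterable that yields the same value forever when iterated over.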
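A minimal sketch of the itertools.repeat approach, assuming a trimmed-down _findmatch without the debug prints. find_matches and its executor_cls parameter are hypothetical names; executor_cls is only there so the same code can also be tried with a ThreadPoolExecutor, which shares the Executor.map signature:

```python
import concurrent.futures as cf
import itertools


def _findmatch(listnumber, ref):
    # Return the number as a string if it contains ref, else ''.
    listnumber = str(listnumber)
    return listnumber if str(ref) in listnumber else ''


def find_matches(numbers, ref, executor_cls=cf.ProcessPoolExecutor, workers=3):
    # itertools.repeat(ref) yields ref endlessly without storing copies.
    # Executor.map pairs items from its iterables like zip(), so it stops
    # when numbers is exhausted and no list of copies of ref is ever built.
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(_findmatch, numbers, itertools.repeat(ref)))


if __name__ == '__main__':
    results = find_matches(range(20), 5)
    print([r for r in results if r])  # ['5', '15']
```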

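But a better approach is probably to write an extra function that passes the constant argument for you. (Perhaps this is why you were trying to use a lambda function?) As long as the function you use is accessible in the module's top-level namespace, it should work: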

def _helper(x):
    return _findmatch(x, 5)

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_helper, numberlist):
        ...

Answer 2

(1) There is no need to build a list. You can use itertools.repeat to create an iterator that simply repeats a single value.

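(2) You need to pass a named function to executor.map, because the function is sent to the child processes for execution. executor.map uses the pickle protocol to send things, and lambdas cannot be pickled, so they cannot be passed to it. But the lambda is completely unnecessary: all it does is call a 2-argument function with 2 arguments. Remove it entirely.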

The working code is:

import concurrent.futures as cf
import itertools

nmax = 10
numberlist = range(nmax)
workers = 3

def _findmatch(listnumber, ref):    
    print('def _findmatch(listnumber, ref):')
    x=''
    listnumber=str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x 

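if __name__ == '__main__':
    # The guard is needed when workers are started with the "spawn" or
    # "forkserver" start method (e.g. on Windows and macOS), because each
    # worker re-imports this module.
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        #for n in executor.map(_findmatch, numberlist):
        for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
            print(type(n))
            print(n)
            #if str(ref[0]) in n:
            #    print('match')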

Answer 3

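Regarding your first question: do I understand correctly that you want to pass an argument whose value is only determined when you call map, but which is constant for all invocations of the mapped function? If so, I would do the map with a function derived from a "template function", with the second argument (ref in your example) baked into it using functools.partial: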

from functools import partial
refval = 5

def _findmatch(ref, listnumber):  # arguments swapped
    ...

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(partial(_findmatch, refval), numberlist):
        ...
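A runnable version of the partial approach, filling the elided function body with the question's own matching logic (minus the prints) and keeping the swapped argument order. find_with_partial and executor_cls are hypothetical names, and this sketch keeps only the non-empty results:

```python
import concurrent.futures as cf
from functools import partial


def _findmatch(ref, listnumber):
    # Arguments swapped so that partial() can bind ref as the first one.
    listnumber = str(listnumber)
    return listnumber if str(ref) in listnumber else ''


def find_with_partial(numbers, refval, executor_cls=cf.ProcessPoolExecutor):
    # partial(_findmatch, refval) is a one-argument callable. It can be
    # pickled because both _findmatch and refval can be pickled.
    match_ref = partial(_findmatch, refval)
    with executor_cls(max_workers=3) as executor:
        return [n for n in executor.map(match_ref, numbers) if n]


if __name__ == '__main__':
    print(find_with_partial(range(30), 5))  # ['5', '15', '25']
```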

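Re. question 2, first part: I haven't found the exact piece of code that tries to pickle (serialize) the function that is to be executed in parallel, but it sounds natural: not only the arguments but also the function itself has to be transferred to the workers somehow, and it probably has to be serialized for that transfer. The fact that partial objects can be pickled whereas lambdas cannot is mentioned elsewhere, for example: https://stackoverflow.com/a/19279016/6356764.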
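The claim that partial objects survive pickling can be checked with a round trip. This sketch uses the built-in pow as a stand-in for _findmatch, an assumption made only to keep the example short:

```python
import pickle
from functools import partial

# A partial object is pickled as its wrapped function plus its bound
# arguments, so it can be sent to worker processes as long as those parts
# are picklable themselves.
power_of_two = partial(pow, 2)  # calls pow(2, exponent)
restored = pickle.loads(pickle.dumps(power_of_two))

print(restored.func is pow, restored.args)  # True (2,)
print(restored(10))                         # 1024
```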

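Re. question 2, second part: to call a function with multiple arguments through ProcessPoolExecutor.map, pass the function as the first argument, followed by an iterable of values for the function's first parameter, then an iterable for its second parameter, and so on. In your case: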

for n in executor.map(_findmatch, numberlist, ref):
    ...
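A minimal sketch of passing one iterable per parameter, assuming a trimmed-down _findmatch and made-up inputs. parallel_findmatch and its executor_cls parameter are hypothetical; executor_cls only exists so the same code can be tried with a ThreadPoolExecutor:

```python
import concurrent.futures as cf


def _findmatch(listnumber, ref):
    # Return the number as a string if it contains ref, else ''.
    listnumber = str(listnumber)
    return listnumber if str(ref) in listnumber else ''


def parallel_findmatch(numbers, refs, executor_cls=cf.ProcessPoolExecutor, workers=3):
    # The i-th call is _findmatch(numbers[i], refs[i]), as with the built-in
    # map(_findmatch, numbers, refs); results come back in input order.
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(_findmatch, numbers, refs))


if __name__ == '__main__':
    numbers = [15, 25, 31, 44]
    refs = [5, 3, 3, 1]  # a different ref for each number
    print(parallel_findmatch(numbers, refs))  # ['15', '', '31', '']
```

As with the built-in map, mapping stops at the shortest iterable, so refs does not need to be longer than numbers.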

How to pass a function with more than one argument to Python concurrent.futures.ProcessPoolExecutor.map()?
