Question

I have a huge list of elements that must be processed in some way. I know this can be done with multiprocessing by using Process like this:

pr1 = Process(target=calculation_function, args=(args,))
pr1.start()
pr1.join()

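So I can create 10 processes, split the arguments into 10 partitions, and pass one partition to each process as args. Then the job is done.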
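The manual approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper `split_into_chunks` and the placeholder body of `calculation_function` are hypothetical and not from the original post. Note that all processes are started before any is joined; calling `start()` and then `join()` on one process at a time would run them one after another.

```python
from multiprocessing import Process


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks nearly equal contiguous slices (hypothetical helper)."""
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks get one extra element.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def calculation_function(chunk):
    # Placeholder for the real work done on one partition.
    for element in chunk:
        pass


if __name__ == '__main__':
    list_to_process = list(range(100))
    processes = [
        Process(target=calculation_function, args=(chunk,))
        for chunk in split_into_chunks(list_to_process, 10)
    ]
    for p in processes:
        p.start()  # start every worker first ...
    for p in processes:
        p.join()   # ... then wait for all of them to finish
```

This is the bookkeeping that ProcessPoolExecutor is meant to take over.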

But I don't want to create the processes and do the partitioning by hand. Instead, I want to use ProcessPoolExecutor, and I do it like this:

executor = ProcessPoolExecutor(max_workers=10)
executor.map(calculation, (list_to_process,))

calculation is the function that does the work.

def calculation(list_to_process):
    for element in list_to_process:
        # .... doing the job

list_to_process is the list I want to process.

But when I run this code, the loop is executed only once. I thought that

executor = ProcessPoolExecutor(max_workers=10)
executor.map(calculation, (list_to_process,))

was the same as doing this 10 times:

pr1 = Process(target=calculation, args=(list_to_process,))
pr1.start()
pr1.join()

But that seems to be wrong.

How can I achieve real multiprocessing with ProcessPoolExecutor?

Answer 1

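Remove the for loop from your calculation function. Now that you are using ProcessPoolExecutor.map, the map() call is your loop; the difference is that each element of the list is sent to a worker process in the pool. E.g.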

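import os
import time
from concurrent.futures import ProcessPoolExecutor

def calculation(item):
    # Called once per element, in whichever worker process is free.
    print('[pid:%s] performing calculation on %s' % (os.getpid(), item))
    time.sleep(5)
    print('[pid:%s] done!' % os.getpid())
    return item ** 2

if __name__ == '__main__':  # needed where workers are spawned, e.g. on Windows
    executor = ProcessPoolExecutor(max_workers=5)
    list_to_process = range(10)
    result = executor.map(calculation, list_to_process)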

You'll see something like this in the terminal:

[pid:23988] performing calculation on 0
[pid:10360] performing calculation on 1
[pid:13348] performing calculation on 2
[pid:24032] performing calculation on 3
[pid:18028] performing calculation on 4
[pid:23988] done!
[pid:23988] performing calculation on 5
[pid:10360] done!
[pid:13348] done!
[pid:10360] performing calculation on 6
[pid:13348] performing calculation on 7
[pid:18028] done!
[pid:24032] done!
[pid:18028] performing calculation on 8
[pid:24032] performing calculation on 9
[pid:23988] done!
[pid:10360] done!
[pid:13348] done!
[pid:18028] done!
[pid:24032] done!

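though the order of events will be effectively random. For some reason, the return value (at least in my Python version) is actually an itertools.chain object, but that is an implementation detail. You can get the results as a list: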

>>> list(result)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In your example code you passed the single-element tuple (list_to_process,), so map() saw only one item and handed the complete list to a single process.
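To make the difference concrete, here is a minimal sketch. The helpers `count_tasks` and `process_all` are hypothetical names introduced for illustration only. It also uses the `chunksize` argument of `Executor.map` (available for ProcessPoolExecutor since Python 3.5), which may help with a very large list by sending elements to the workers in batches; whether it actually speeds things up depends on the workload.

```python
from concurrent.futures import ProcessPoolExecutor


def calculation(item):
    # Stand-in for the real per-element work.
    return item ** 2


def count_tasks(iterable):
    # map() creates one task per item of the iterable it is given.
    return sum(1 for _ in iterable)


def process_all(items, max_workers=5, chunksize=1):
    # chunksize > 1 ships items to the workers in batches, which can
    # reduce inter-process overhead for long lists.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(calculation, items, chunksize=chunksize))


if __name__ == '__main__':
    list_to_process = list(range(10))
    print(count_tasks((list_to_process,)))  # 1: the whole list is one task
    print(count_tasks(list_to_process))     # 10: one task per element
    print(process_all(list_to_process))
    print(process_all(range(1000), chunksize=100)[:5])
```

The `with` block waits for all pending work and shuts the pool down on exit, so no explicit cleanup is needed.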
