Question

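So I have a function that processes several .txt files and takes only two arguments. It currently works as intended, but after nearly an hour I have only gotten through about 10% of the work, so the whole run will take a long time because the .txt files are very large.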

I have read about the multiprocessing package, in particular its Pool section. However, I'm not sure how to use it correctly.

The code I currently use to run my function is:

for k, structure in enumerate(structures):
    structure_maker(structure_path, structure)

structure_path is always the same, while structures is a list of different values, for example:

structures = [1, 3, 6, 7, 8, 10, 13, 25, 27]

So how would I use a Pool here? As far as I understand, I have to do something like this:

from multiprocessing import Pool

mypool = Pool(6) # Choosing how many cores I want to use
mypool.map(structure_maker, list)

list is where I get lost. What should it be? The structures list? And if so, where do I put structure_path?

Answer 1

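The Pool.map() method works like the built-in map() function: it applies the function passed as its first argument to each item of the iterable passed as its second argument. Each time it calls the supplied function, it passes the next item from the iterable as that function's single argument.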
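To make the one-item-per-call behaviour concrete, here is a minimal sketch comparing the two. The square() function and the item list are hypothetical stand-ins, not part of the question:

```python
from multiprocessing import Pool

def square(x):
    # Called once per item; x is the single argument that map()/Pool.map() supplies
    return x * x

if __name__ == '__main__':
    items = [1, 3, 6]
    # Built-in map: calls square(1), square(3), square(6) in this process
    builtin_results = list(map(square, items))
    # Pool.map: the same calls, run in worker processes, results kept in input order
    with Pool(2) as pool:
        pool_results = pool.map(square, items)
    print(builtin_results, pool_results)  # [1, 9, 36] [1, 9, 36]
```

Both produce the same list in the same order; the only difference is where the calls run.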

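The potential problem in this case is that the function you want to use, structure_maker(), requires two arguments. There are different ways to deal with that, but here, since one of the arguments is always the same, you can use functools.partial() to create a temporary function that only needs the second argument passed to it, and you can do that right inside the mypool.map() call.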

Here's what I mean:

from multiprocessing import Pool

def structure_maker(structure_path, structure):
    """ Dummy for testing. """
    return structure_path, structure

if __name__ == '__main__':

    from pprint import pprint
    from functools import partial

    mypool = Pool(4)
    structure_path = 'samething'
    structures = [1, 3, 6, 7, 8, 10, 13, 25, 27]
    results = mypool.map(partial(structure_maker, structure_path), structures)
    pprint(results)

Output:

[('samething', 1),
 ('samething', 3),
 ('samething', 6),
 ('samething', 7),
 ('samething', 8),
 ('samething', 10),
 ('samething', 13),
 ('samething', 25),
 ('samething', 27)]
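One reason to prefer partial() over a lambda here: multiprocessing sends the callable to the worker processes with pickle, and the standard pickle module can serialize a partial that wraps a module-level function, but not a lambda. A small sketch showing this, reusing the dummy structure_maker() from above (the names bound and restored are illustrative only):

```python
import pickle
from functools import partial

def structure_maker(structure_path, structure):
    """Same dummy as above: just echo the arguments back."""
    return structure_path, structure

# partial() binds structure_path as the first positional argument
bound = partial(structure_maker, 'samething')
print(bound(7))  # ('samething', 7)

# A partial of a module-level function survives a pickle round trip,
# which is how multiprocessing ships the callable to worker processes
restored = pickle.loads(pickle.dumps(bound))
print(restored(7))  # ('samething', 7)

# A lambda doing the same job cannot be pickled by the standard pickle module
try:
    pickle.dumps(lambda s: structure_maker('samething', s))
except (pickle.PicklingError, AttributeError) as exc:
    print('lambda not picklable:', type(exc).__name__)
```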

Answer 2

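You could build a tuple of arguments for each call and unpack it inside a small wrapper function (structure_maker, structure_path and structures are the ones from the question):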

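from multiprocessing import Pool

def structure_maker_proxy(args):
    # Unpack the (structure_path, structure) tuple into the two real arguments
    structure_path, structure = args
    return structure_maker(structure_path, structure)

if __name__ == '__main__':
    mypool = Pool(6)  # Choosing how many cores I want to use
    # Build one (structure_path, structure) tuple per call
    lis = [(structure_path, struct) for struct in structures]
    results = mypool.map(structure_maker_proxy, lis)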
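On Python 3.3 and later, Pool.starmap() performs this tuple unpacking itself, so the proxy function is not needed. A minimal sketch under that assumption, reusing the dummy structure_maker() from Answer 1 and building the argument tuples with itertools.repeat() instead of a list comprehension:

```python
from itertools import repeat
from multiprocessing import Pool

def structure_maker(structure_path, structure):
    """Dummy stand-in for the real function: echo the arguments back."""
    return structure_path, structure

# The __main__ guard matters where workers are started by re-importing
# this module (the "spawn" start method, e.g. on Windows)
if __name__ == '__main__':
    structure_path = 'samething'
    structures = [1, 3, 6, 7, 8, 10, 13, 25, 27]
    # zip(repeat(x), ys) yields (x, y0), (x, y1), ... for every item in ys
    arg_tuples = zip(repeat(structure_path), structures)
    with Pool(4) as pool:
        # starmap unpacks each tuple: structure_maker(structure_path, structure)
        results = pool.starmap(structure_maker, arg_tuples)
    print(results[:2])  # [('samething', 1), ('samething', 3)]
```

Like map(), starmap() returns the results in the same order as the input.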

Multiprocessing (Pool) with a function that takes two arguments
