Adding pool results to a dict

Date: 2018-02-14 01:03:40

Tags: python dictionary multiprocessing

I have a function that takes two inputs supplied by itertools.combinations and outputs a solution. The two inputs should be stored as a tuple forming the key in a dict, with the solution as the value.

I can pool this and get all of the results as a list, then insert them into a dictionary one at a time, but that seems inefficient. Is there a way to get each result as its job finishes and add it directly to the dict?

Essentially, I have the following code:

all_solutions = {}
for start, goal in itertools.combinations(graph, 2):
    all_solutions[(start, goal)] = search(graph, start, goal)

I'm trying to parallelize it as follows:

all_solutions = {}
manager = multiprocessing.Manager()
graph_pool = manager.dict(graph)
pool = multiprocessing.Pool()
results = pool.starmap(search, zip(itertools.repeat(graph_pool),
                                   itertools.combinations(graph, 2)))
for i, start_goal in enumerate(itertools.combinations(graph, 2)):
    start, goal = start_goal[0], start_goal[1]
    all_solutions[(start, goal)] = results[i]

That actually works, but it iterates the combinations twice, once for the pool and once to write into the dict (not to mention the clunky tuple unpacking).
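
Something like apply_async with a per-pair callback feels closer to what I'm after. A rough, untested sketch of the pattern I mean (it assumes search takes the graph and a (start, goal) tuple, as in my starmap call above):

import itertools
import multiprocessing

all_solutions = {}

def make_callback(key):
    # The callback fires in the parent process as each job finishes,
    # so it can write straight into the dict
    def store(result):
        all_solutions[key] = result
    return store

manager = multiprocessing.Manager()
graph_pool = manager.dict(graph)
pool = multiprocessing.Pool()
for pair in itertools.combinations(graph, 2):
    pool.apply_async(search, (graph_pool, pair), callback=make_callback(pair))
pool.close()
pool.join()  # all_solutions is complete once join() returns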

1 answer:

Answer 0: (score: 1)

This is possible; you just need to switch to a lazy mapping function (not map or starmap, which must finish computing all of the results before you can begin using any of them):

import itertools
import multiprocessing
from functools import partial
from itertools import tee

manager = multiprocessing.Manager()
graph_pool = manager.dict(graph)
pool = multiprocessing.Pool()

# Since you're processing in order and in parallel, tee might help a little
# by only generating the dict keys/search arguments once. That said, 
# combinations of n choose 2 are fairly cheap; the overhead of tee's caching
# may well exceed the cost of just generating the combinations twice
startgoals1, startgoals2 = tee(itertools.combinations(graph, 2))

# Use partial binding of search with graph_pool to be able to use imap
# without a wrapper function; using imap lets us consume results as they become
# available, so the tee-ed generators don't store too many temporaries
results = pool.imap(partial(search, graph_pool), startgoals2)

# Efficiently create the dict from the start/goal pairs and the results of the search
# This line is eager, so it won't complete until all the results are generated, but
# it will be consuming the results as they become available in parallel with
# calculating the results
all_solutions = dict(zip(startgoals1, results))
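
If you don't care about insertion order, a variant (a sketch, not part of the approach above, reusing pool, graph_pool, and the imports already shown) is to skip tee entirely: have a small wrapper return each key alongside its result and feed imap_unordered straight into dict, so every finished job contributes its own entry in completion order:

# Must be defined at module level so the pool can pickle it
def keyed_search(graph, pair):
    # Return the (start, goal) key with the result so ordering no longer matters
    return pair, search(graph, pair)

# dict() consumes the (key, value) pairs as workers finish them
all_solutions = dict(pool.imap_unordered(partial(keyed_search, graph_pool),
                                         itertools.combinations(graph, 2)))

One tuning note: imap and imap_unordered default to chunksize=1; for large numbers of cheap searches, passing a bigger chunksize reduces inter-process communication overhead.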