Adding pool results to a dict

Date: 2018-02-14 01:03:40

Tags: python dictionary multiprocessing

I have a function that takes two inputs supplied by itertools.combinations and outputs a solution. The two inputs should be stored as a tuple forming the key in a dict, with the solution as the value.

I can pool this and get all of the results as a list, then insert them into a dictionary one at a time, but that seems inefficient. Is there a way to get each result as its job finishes and add it directly to the dict?

Essentially, I have the following code:

all_solutions = {}
for start, goal in itertools.combinations(graph, 2):
    all_solutions[(start, goal)] = search(graph, start, goal)

I'm trying to parallelize it as follows:

all_solutions = {}
manager = multiprocessing.Manager()
graph_pool = manager.dict(graph)
pool = multiprocessing.Pool()
results = pool.starmap(search, zip(itertools.repeat(graph_pool),
                                   itertools.combinations(graph, 2)))
for i, start_goal in enumerate(itertools.combinations(graph, 2)):
    start, goal = start_goal[0], start_goal[1]
    all_solutions[(start, goal)] = results[i]

That actually works, but it iterates the combinations twice, once for the pool and once to write into the dict (not to mention the clunky tuple unpacking).
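
Something like apply_async with a per-pair callback feels closer to what I'm after. A rough, untested sketch of the pattern I mean (it assumes search takes the graph and a (start, goal) tuple, as in my starmap call above):

import itertools
import multiprocessing

all_solutions = {}

def make_callback(key):
    # The callback fires in the parent process as each job finishes,
    # so it can write straight into the dict
    def store(result):
        all_solutions[key] = result
    return store

manager = multiprocessing.Manager()
graph_pool = manager.dict(graph)
pool = multiprocessing.Pool()
for pair in itertools.combinations(graph, 2):
    pool.apply_async(search, (graph_pool, pair), callback=make_callback(pair))
pool.close()
pool.join()  # all_solutions is complete once join() returns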

1 answer:

Answer 0: (score: 1)

This is possible; you just need to switch to a lazy mapping function (not map or starmap, which must finish computing all of the results before you can begin using any of them):

import itertools
import multiprocessing
from functools import partial
from itertools import tee

manager = multiprocessing.Manager()
graph_pool = manager.dict(graph)
pool = multiprocessing.Pool()

# Since you're processing in order and in parallel, tee might help a little
# by only generating the dict keys/search arguments once. That said, 
# combinations of n choose 2 are fairly cheap; the overhead of tee's caching
# may well exceed the cost of just generating the combinations twice
startgoals1, startgoals2 = tee(itertools.combinations(graph, 2))

# Use partial binding of search with graph_pool to be able to use imap
# without a wrapper function; using imap lets us consume results as they become
# available, so the tee-ed generators don't store too many temporaries
results = pool.imap(partial(search, graph_pool), startgoals2)

# Efficiently create the dict from the start/goal pairs and the results of the search
# This line is eager, so it won't complete until all the results are generated, but
# it will be consuming the results as they become available in parallel with
# calculating the results
all_solutions = dict(zip(startgoals1, results))
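
If you don't care about insertion order, a variant (a sketch, not part of the approach above, reusing pool, graph_pool, and the imports already shown) is to skip tee entirely: have a small wrapper return each key alongside its result and feed imap_unordered straight into dict, so every finished job contributes its own entry in completion order:

# Must be defined at module level so the pool can pickle it
def keyed_search(graph, pair):
    # Return the (start, goal) key with the result so ordering no longer matters
    return pair, search(graph, pair)

# dict() consumes the (key, value) pairs as workers finish them
all_solutions = dict(pool.imap_unordered(partial(keyed_search, graph_pool),
                                         itertools.combinations(graph, 2)))

One tuning note: imap and imap_unordered default to chunksize=1; for large numbers of cheap searches, passing a bigger chunksize reduces inter-process communication overhead.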