Question

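I'm trying to efficiently parallelize a Python loop across n threads, and I'm a bit confused about the best way to do it. An additional complication is that each thread needs to write to a dictionary (though never to the same key), and each thread has to execute 24/n iterations of the loop (although I'm pretty sure most Python libraries will take care of that for me).

Code (simplified):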

n = <number of threads input by user>
mySets = [str(x) for x in range(1, 25)]
myDict = {}

# Start of parallelization
for set in mySets:

    # Performs actions on the set
    # Calls external C++ code on the set and gets a result back
    # Processes the result
    myDict[set] = result

# End of parallelization

# Process the results to output

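I'm in a Unix environment, but ideally the solution would not have problems on Windows or Mac. The rest of my code is portable, and I really don't want this to break that.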

I saw this post: Parallelize a loop in python 2.4, but I don't think fork is what I want, since I want the user to specify the number of nodes available.

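I also looked at the multiprocessing library, and I'm pretty sure it's what I want, but it seems everyone puts their code into a function, which I would like to avoid... it's a lot of code and it would get messy.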

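I have also looked at joblib, but I'm unclear on the difference between it and the multiprocessing library. What is the benefit of one over the other?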

Thanks for any help!

Answer 1

You can use multiprocessing.pool.Pool.

Here is some pseudocode:

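from multiprocessing.pool import Pool


def do_something(n, sets):
    # cpp_computation_function is a placeholder for your per-set work;
    # it must be a top-level function so the pool can pickle it.
    with Pool(processes=n) as pool:
        # map() returns results in the same order as sets
        results = pool.map(cpp_computation_function, sets)

    return dict(zip(sets, results))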
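
To make the pseudocode above concrete, here is a minimal runnable sketch. It rests on some assumptions: process_set is a hypothetical stand-in for the real per-set work (the external C++ call is not shown in the question), and parallel_to_dict and its use_threads flag are names made up for illustration. The loop body does have to live in a top-level function, because the pool sends the callable to its worker processes by pickling it. The workers never touch a shared dictionary; the parent builds it from the returned results, and map() splits the 24 items among the n workers on its own.

```python
from multiprocessing.pool import Pool, ThreadPool


def process_set(name):
    """Hypothetical stand-in for the per-set work (e.g. the external C++ call)."""
    return int(name) ** 2


def parallel_to_dict(worker, items, n, use_threads=False):
    """Apply worker to every item using n workers and return {item: result}."""
    # ThreadPool offers the same map() interface as Pool but uses threads,
    # which may be enough if the external call releases the GIL or runs
    # as a separate subprocess.
    pool_cls = ThreadPool if use_threads else Pool
    with pool_cls(processes=n) as pool:
        # map() preserves input order, so results line up with items.
        results = pool.map(worker, items)
    return dict(zip(items, results))


if __name__ == "__main__":
    # This guard keeps worker processes from re-running the block on
    # platforms that start workers by spawning a fresh interpreter
    # (Windows, and macOS by default).
    my_sets = [str(x) for x in range(1, 25)]
    my_dict = parallel_to_dict(process_set, my_sets, n=4)
    print(my_dict["5"])
```

The `if __name__ == "__main__":` guard is what keeps this portable beyond Unix. If the real work is mostly waiting on the external program, passing use_threads=True may give similar speedups without the pickling requirement.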
