Question

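I am completely new to parallelization. I want to parallelize a nested for loop and store some intermediate results. The results come from a function f that takes some formal parameters and some values from global variables. I got some advice from here, e.g. to use itertools to generate a Cartesian product, which is equivalent to a nested loop. But it doesn't seem to work: the array in which I want to store the intermediate results stays unchanged. A minimal working example is attached.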

Operating system: Windows 7, 64-bit

Python distribution: Enthought Canopy

import itertools
import numpy as np
from multiprocessing import Pool

list1 = range(4, 8)
list2 = range(6, 9)
ary = np.zeros( (len(list1), len(list2)) )

#This is the archetypical function f. It DOES NOT have p2 as a parameter! This
#is intended! In my (more complex) program a function f calls somewhere deep
#down another function that gets its values from global variables. Rewriting
#the code to hand down the variables as parameters would turn my code into a mess.
def f(p1):
    return p1*p2

#This is what I want to parallelize: a nested loop, where the result of f is saved
#in an array element corresponding to the indices of p1 and p2.
#for p1 in list1:
#    for p2 in list2:
#        i = list1.index(p1)
#        j = list2.index(p2)
#        ary[i,j]=f(p1)

#Here begins the try to parallelize the nested loop. The function g calls f and
#does the saving of the results. g takes a tuple x, unpacks it, then calculates
#f and saves the result in an array.
def g(x):
    a, b = x
    i = list1.index(a)
    j = list2.index(b)
    global p2
    p2 = b
    ary[i,j] = f(a)

if __name__ == "__main__":
    #Produces a cartesian product. This is equivalent to a nested loop.
    it = itertools.product(list1, list2)
    pool = Pool(processes=2)
    result = pool.map(g, it)
    print(ary)
    #Result: ary does not change!

Answer 1

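By using Pool, your program is in effect copied into several processes, and each process has its own copy of the global variables. When the computations return, the global variables of the main process have not changed. You should instead use the return values of the functions you call in parallel and combine those results, i.e. use the variable result in the line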

result = pool.map(g, it)

In your case it currently holds only a list of None values, because g has no return statement.
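A minimal sketch of this idea, assuming Python 3.3+ (for `with Pool(...)`). g still sets the global p2, which is fine because each worker process runs its tasks one at a time, but it now returns `(i, j, value)` instead of writing into ary, and the parent process fills the array from the list that map returns. The helper name `compute` is hypothetical.

```python
import itertools
import numpy as np
from multiprocessing import Pool

list1 = list(range(4, 8))
list2 = list(range(6, 9))

def f(p1):
    # Reads the global p2, exactly like the question's f.
    return p1 * p2

def g(x):
    global p2
    a, b = x
    i = list1.index(a)
    j = list2.index(b)
    p2 = b  # sets p2 in this worker's own copy of the globals
    return i, j, f(a)  # hand the result back instead of writing into ary

def compute():
    ary = np.zeros((len(list1), len(list2)))
    with Pool(processes=2) as pool:
        # map pickles each return value and sends it back to the parent
        for i, j, value in pool.map(g, itertools.product(list1, list2)):
            ary[i, j] = value  # the parent owns ary, so this write sticks
    return ary

if __name__ == "__main__":
    print(compute())
```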

A general tip for parallelization: always use pure computations, i.e. don't rely on side effects such as modifying global variables.
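Applied strictly, this tip means f receives p2 as an argument instead of reading a global. The asker noted that this rewrite would be messy in the real program, so the following is only an illustration of the pure style, not a drop-in fix. `f_pure` and `compute_pure` are hypothetical names, and `Pool.starmap` requires Python 3.3+.

```python
import itertools
import numpy as np
from multiprocessing import Pool

list1 = list(range(4, 8))
list2 = list(range(6, 9))

def f_pure(p1, p2):
    # Depends only on its arguments: no globals are read or written.
    return p1 * p2

def compute_pure():
    pairs = list(itertools.product(list1, list2))
    with Pool(processes=2) as pool:
        values = pool.starmap(f_pure, pairs)  # unpacks each (p1, p2) pair
    # product() varies the last argument fastest, so values are in row-major order
    return np.array(values, dtype=float).reshape(len(list1), len(list2))

if __name__ == "__main__":
    print(compute_pure())
```

Because the results come back in the same order as the inputs, a single reshape rebuilds the grid, which also removes the `list.index` lookups from the worker.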

Answer 2

You need some mechanism to share information between processes. Have a look at multiprocessing.Queue, for example.
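A sketch of the queue approach, assuming Python 3. The names `worker` and `compute_with_queue` and the round-robin split of tasks across processes are illustrative choices, not part of the answer. Each process sends `(i, j, value)` tuples through a multiprocessing.Queue, and the parent writes them into ary.

```python
import numpy as np
from multiprocessing import Process, Queue

list1 = list(range(4, 8))
list2 = list(range(6, 9))

def f(p1):
    return p1 * p2  # reads the global p2, as in the question

def worker(tasks, queue):
    global p2
    for i, j, a, b in tasks:
        p2 = b  # this process's own global
        queue.put((i, j, f(a)))  # send the result to the parent

def compute_with_queue(n_procs=2):
    tasks = [(i, j, a, b)
             for i, a in enumerate(list1)
             for j, b in enumerate(list2)]
    queue = Queue()
    # Hand every n_procs-th task to each process
    procs = [Process(target=worker, args=(tasks[k::n_procs], queue))
             for k in range(n_procs)]
    for p in procs:
        p.start()
    ary = np.zeros((len(list1), len(list2)))
    # Drain the queue before join(): one get() per task
    for _ in range(len(tasks)):
        i, j, value = queue.get()
        ary[i, j] = value
    for p in procs:
        p.join()
    return ary

if __name__ == "__main__":
    print(compute_with_queue())
```

The parent reads exactly one item per task before calling join(), because the multiprocessing documentation warns that joining a process which has put items on a queue before those items are consumed can deadlock.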

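If you want to use shared memory, the simplest way is to use threads. You may find that, although the GIL does limit thread performance, you can still run numpy operations in parallel, because many numpy operations release the GIL while they run.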
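A sketch with threads, assuming Python 3 and using multiprocessing.pool.ThreadPool, which offers the same map interface as Pool but runs threads inside one process. Because the threads share memory, g can write straight into ary. One caveat for the question's code: threads also share globals, so setting a global p2 and then calling f would race between threads. This sketch therefore passes p2 to f explicitly, which is a hypothetical change to f.

```python
import itertools
import numpy as np
from multiprocessing.pool import ThreadPool

list1 = list(range(4, 8))
list2 = list(range(6, 9))
ary = np.zeros((len(list1), len(list2)))

def f(p1, p2):
    # p2 is an argument: a shared global p2 would be overwritten by other threads
    return p1 * p2

def g(x):
    (i, a), (j, b) = x
    ary[i, j] = f(a, b)  # threads share memory, so this updates the module's ary

with ThreadPool(processes=2) as pool:
    # product of enumerate() pairs gives each task its indices directly
    pool.map(g, itertools.product(enumerate(list1), enumerate(list2)))

print(ary)
```

For a toy f like this the GIL means there is no speedup; a gain is only plausible when f spends most of its time in numpy calls that release the GIL.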

Trying to parallelize a nested for loop and save intermediate results
