Question

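I have a 2D function that I want to evaluate on a grid of points, but the two loops over rows and columns are very slow, so I want to use multiprocessing to speed up the code. I wrote the following code to do the two loops: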

import numpy as np
from multiprocessing import Pool

#Grid points
ra = np.linspace(25.1446, 25.7329, 1000)
dec = np.linspace(-10.477, -9.889, 1000)
#The 2D function
def like2d(x,y): 
    stuff=[RaDec, beta, rho_c_over_sigma_c, zhalo, rho_crit]
    m=3e14
    c=7.455
    param=[x, y, m, c]
    return reduced_shear( param, stuff, observed_g, g_err)

pool = Pool(processes=12)

def data_stream(a, b):
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            yield (i, j), (av, bv)

def myfunc(args):
    return args[0], like2d(*args[1])

counter,likelihood = pool.map(myfunc, data_stream(ra, dec))

But I get the following error message:

Process PoolWorker-1:
Traceback (most recent call last):
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/user/anaconda/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/user/anaconda/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: 'module' object has no attribute 'myfunc'
Process PoolWorker-2:
Traceback (most recent call last):
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/user/anaconda/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/user/anaconda/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: 'module' object has no attribute 'myfunc'
Process PoolWorker-3:
Traceback (most recent call last):
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/user/anaconda/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/user/anaconda/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: 'module' object has no attribute 'myfunc'
Process PoolWorker-4:

Everything is defined, and I don't understand why this error is raised. Can anyone point out what might be wrong?

Another approach that does the loop with multiprocessing and saves the results in a 2D array:

import ctypes
import multiprocessing
import time

import numpy as np

#Grid points
ra = np.linspace(25.1446, 25.7329, 1000)
dec = np.linspace(-10.477, -9.889, 1000)

#The 2D function
def like2d(x,y):
    stuff=[RaDec, beta, rho_c_over_sigma_c, zhalo, rho_crit]
    m=3e14
    c=7.455
    param=[x, y, m, c]
    return reduced_shear( param, stuff, observed_g, g_err)


shared_array_base = multiprocessing.Array(ctypes.c_double, ra.shape[0]*dec.shape[0])
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape( ra.shape[0],dec.shape[0])

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i,:] = np.array([float(like2d(ra[j],dec[i])) for j in range(ra.shape[0])])

print "processing to estimate likelihood in 2D grids......!!!"
start = time.time()
pool = multiprocessing.Pool(processes=12)
pool.map(my_func, range(dec.shape[0]))
print shared_array
end = time.time()
print end - start

Answer 1

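You have to create the Pool after the definition of the worker function (myfunc). Creating the Pool causes Python to fork your worker processes right at that point, and the only things defined in the children will be the functions defined above the Pool definition. Also, map will return a list of tuples (one for every object yielded by data_stream), not a single tuple. So you need this: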

import numpy as np
from multiprocessing import Pool

#Grid points
ra = np.linspace(25.1446, 25.7329, 1000)
dec = np.linspace(-10.477, -9.889, 1000)
#The 2D function
def like2d(x,y): 
    stuff=[RaDec, beta, rho_c_over_sigma_c, zhalo, rho_crit]
    m=3e14
    c=7.455
    param=[x, y, m, c]
    return reduced_shear( param, stuff, observed_g, g_err)


def data_stream(a, b):
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            yield (i, j), (av, bv)

def myfunc(args):
    return args[0], like2d(*args[1])

if __name__ == "__main__":    
    pool = Pool(processes=12)
    results = pool.map(myfunc, data_stream(ra, dec))  # results is a list of tuples.
    for counter,likelihood in results:
        print("counter: {}, likelihood: {}".format(counter, likelihood))

I added the if __name__ == "__main__": guard, which is not necessary on POSIX platforms but is required on Windows (which doesn't support os.fork()).
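
If the goal is to end up with a 2D array of values, the list of ((i, j), value) tuples returned by map can be placed back on the grid by index. Here is a minimal sketch of that idea, not the original code. like2d is replaced by a hypothetical stand-in because reduced_shear and its inputs are not shown in the question, and linspace and fill_grid are illustrative helpers introduced here (plain lists are used so the sketch runs without numpy). The grid is kept tiny, and the chunksize value is an arbitrary choice.

```python
from multiprocessing import Pool


def like2d(x, y):
    # Hypothetical stand-in for the real likelihood, which calls reduced_shear().
    return (x - 25.4) ** 2 + (y + 10.2) ** 2


def linspace(start, stop, num):
    # Dependency-free equivalent of np.linspace for num >= 2 (illustrative helper).
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]


def data_stream(a, b):
    # Yield ((i, j), (a[i], b[j])) for every grid point.
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            yield (i, j), (av, bv)


def myfunc(args):
    # Return the grid index together with the function value at that point.
    return args[0], like2d(*args[1])


def fill_grid(results, nrows, ncols):
    # Place each value at its (i, j) index in a nested list (illustrative helper).
    grid = [[None] * ncols for _ in range(nrows)]
    for (i, j), value in results:
        grid[i][j] = value
    return grid


if __name__ == "__main__":
    ra = linspace(25.1446, 25.7329, 4)
    dec = linspace(-10.477, -9.889, 3)
    # Create the pool only after every worker function has been defined.
    pool = Pool(processes=2)
    try:
        results = pool.map(myfunc, data_stream(ra, dec), chunksize=4)
    finally:
        pool.close()
        pool.join()
    grid = fill_grid(results, len(ra), len(dec))
    for row in grid:
        print(row)
```

For the full 1000 x 1000 grid, a larger chunksize would likely reduce inter-process overhead, though the best value depends on how expensive each like2d call is.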

Multiprocessing over the grid points of a 2D function
