How can I parallelise a recursive function in Python?
My function looks like this:
def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + map(lambda x: f(x, depth-1), list_of_values(x))

def list_of_values(x):
    # heavy compute, pure function
When I try to parallelise it with multiprocessing.Pool.map, Windows opens an unbounded number of processes and hangs.
What is a good (preferably simple) way to parallelise it (for a single multi-core machine)?
Here is the code that hangs:
from multiprocessing import Pool
pool = Pool(processes=4)

def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + pool.map(lambda x: f(x, depth-1), list_of_values(x))

def list_of_values(x):
    # heavy compute, pure function
Answer 0 (score: 5)
OK, sorry for the problems with this.
I'll answer a slightly different question, where f() returns the sum of the values in the list. That is because it's not clear to me from your example what the return type of f() would be, and using an integer makes the code simple to understand.
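For reference, here is a serial sketch of that sum-returning f(), assuming list_of_values(x) simply returns range(x) as in the code further down:

```python
def list_of_values(x):
    # stand-in for the expensive, pure function
    return list(range(x))

def f(x):
    # sum x and the values of f mapped over list_of_values(x)
    if x == 0:
        return 0
    return x + sum(f(v) for v in list_of_values(x))

print(f(6))  # -> 63
```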
This is complex because there are two different things happening in parallel: the calculation of the expensive function in the pool, and the recursive expansion of f().
I am very careful to only use the pool to calculate the expensive function. In that way we don't get an "explosion" of processes, but because this is asynchronous we need to postpone a whole bunch of work for the callback that the worker calls once the expensive function is done.
More than that, we need to use a countdown latch so that we know when all the separate sub-calls to f() are complete.
There may be a simpler way (I am pretty sure there is, but I need to do other things), but perhaps this gives you an idea of what is possible:
from multiprocessing import Pool, Value, RawArray, RLock
from time import sleep

class Latch:

    '''A countdown latch that lets us wait for a job of "n" parts'''

    def __init__(self, n):
        self.__counter = Value('i', n)
        self.__lock = RLock()

    def decrement(self):
        with self.__lock:
            self.__counter.value -= 1
            print('dec', self.read())
            return self.read() == 0

    def read(self):
        with self.__lock:
            return self.__counter.value

    def join(self):
        while self.read():
            sleep(1)


def list_of_values(x):
    '''An expensive function'''
    print(x, ': thinking...')
    sleep(1)
    print(x, ': thought')
    return list(range(x))


pool = Pool()


def async_f(x, on_complete=None):
    '''Return the sum of the values in the expensive list'''
    if x == 0:
        on_complete(0) # no list, return 0
    else:
        n = x # need to know size of result beforehand
        latch = Latch(n) # wait for n entries to be calculated
        result = RawArray('i', n+1) # where we will assemble the map
        def delayed_map(values):
            '''This is the callback for the pool async process - it runs
               in a separate thread within this process once the
               expensive list has been calculated and orchestrates the
               mapping of f over the result.'''
            result[0] = x # first value in list is x
            for (i, v) in enumerate(values):
                def callback(fx, i=i):
                    '''This is the callback passed to f() and is called when
                       the function completes. If it is the last of all the
                       calls in the map then it calls on_complete() (ie another
                       instance of this function) for the calling f().'''
                    result[i+1] = fx
                    if latch.decrement(): # have completed list
                        # at this point result contains [x]+map(f, ...)
                        on_complete(sum(result)) # so return sum
                async_f(v, callback)
        # Ask worker to generate list then call delayed_map
        pool.apply_async(list_of_values, [x], callback=delayed_map)


def run():
    '''Tie into the same mechanism as above, for the final value.'''
    result = Value('i')
    latch = Latch(1)
    def final_callback(value):
        result.value = value
        latch.decrement()
    async_f(6, final_callback)
    latch.join() # wait for everything to complete
    return result.value

print(run())
ps: I'm using Python 3.2, and the ugliness above is because we are postponing the computation of the final results (going back up the tree) until later. It's possible something like generators or futures could simplify things.
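As one sketch of how futures might simplify this (hypothetical names; only the first level of recursion is fanned out to worker processes, deeper levels run serially inside the workers):

```python
from concurrent.futures import ProcessPoolExecutor

def list_of_values(x):
    # stand-in for the expensive, pure function
    return list(range(x))

def f(x):
    # serial sum-returning recursion, run inside the workers
    if x == 0:
        return 0
    return x + sum(f(v) for v in list_of_values(x))

def f_parallel(x):
    # fan only the first level of recursion out to a process pool
    if x == 0:
        return 0
    with ProcessPoolExecutor() as ex:
        return x + sum(ex.map(f, list_of_values(x)))

if __name__ == '__main__':
    print(f_parallel(6))  # -> 63
```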
Also, I suspect you need a cache to avoid needlessly recalculating the expensive function when it is called with the same argument as earlier.
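Within a single process, one way to add such a cache is functools.lru_cache (a sketch; sharing a cache across pool workers needs extra machinery, e.g. a Manager dict):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def list_of_values(x):
    # stand-in for the expensive, pure function; a tuple is returned
    # so the cached value cannot be mutated by callers
    global calls
    calls += 1
    return tuple(range(x))

list_of_values(5)
list_of_values(5)   # second call is served from the cache
print(calls)        # -> 1
```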
See also yaniv's answer - parallel recursive function in python? - which seems to be an alternative way to reverse the order of the evaluation by making the depth explicit.
Answer 1 (score: 2)
After thinking about this some more, I found a simple, not complete, but good enough answer:
# A partially parallel solution: just do the first level of recursion
# in parallel. It might be enough work to fill all cores.
import multiprocessing

def f_helper(data):
    return f(x=data['x'], depth=data['depth'], recursion_depth=data['recursion_depth'])

def f(x, depth, recursion_depth):
    if depth == 0:
        return ...
    else:
        if recursion_depth == 0:
            pool = multiprocessing.Pool(processes=4)
            result = [x] + pool.map(f_helper, [{'x': _x, 'depth': depth-1, 'recursion_depth': recursion_depth+1} for _x in list_of_values(x)])
            pool.close()
        else:
            result = [x] + list(map(f_helper, [{'x': _x, 'depth': depth-1, 'recursion_depth': recursion_depth+1} for _x in list_of_values(x)]))
        return result

def list_of_values(x):
    # heavy compute, pure function
Answer 2 (score: 0)
I initially store the main process ID and pass it to the sub-programs.
When I need to start a multiprocessing job, I check the number of children of the main process. If it is less than or equal to half of my CPU count, then I run it in parallel. If it is greater than half of my CPU count, then I run it serially. This way it avoids bottlenecks and uses the CPU cores effectively. You can tune the number of cores for your case. For example, you can set it to the exact number of CPU cores, but you should not exceed it.
import multiprocessing
import psutil

def subProgramWrapper(func, args):
    func(*args)

# main_process_id is stored at startup; subProgram, input_params and
# MyPool (a Pool variant that permits nested processes) are defined elsewhere
parent = psutil.Process(main_process_id)
children = parent.children(recursive=True)
num_cores = int(multiprocessing.cpu_count() / 2)

if num_cores >= len(children):
    # parallel run
    pool = MyPool(num_cores)
    results = pool.starmap(subProgram, input_params)
    pool.close()
    pool.join()
else:
    # serial run
    for input_param in input_params:
        subProgramWrapper(subProgram, input_param)
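The same throttling idea can be sketched with the standard library alone, using multiprocessing.active_children() in place of psutil (work and run_adaptive are hypothetical names standing in for subProgram and the dispatch logic):

```python
import multiprocessing

def work(x):
    # stand-in for subProgram
    return x * x

def run_adaptive(params):
    # go parallel only while few worker processes already exist,
    # otherwise fall back to a serial loop to avoid oversubscription
    num_cores = max(1, multiprocessing.cpu_count() // 2)
    if len(multiprocessing.active_children()) < num_cores:
        with multiprocessing.Pool(num_cores) as pool:
            return pool.map(work, params)
    return [work(p) for p in params]

if __name__ == '__main__':
    print(run_adaptive([1, 2, 3]))  # -> [1, 4, 9]
```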