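I'm trying to implement a multithreaded construction of a binary tree using a multiprocessing pool in Python. The main idea is that I have N hardware threads available, and from a certain level of the tree downwards I want to build each branch in a separate thread. There are no data dependencies: each recursive call works on its own data, so there are no data races to worry about. I'm familiar with the GIL constraint, which is why I decided to use a multiprocessing pool: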
```python
from multiprocessing import Pool

pool = Pool(processes=4)
tree = Tree.createTreeMT(points, pool, 2)
```
The problem is that I don't get any speedup. The function that creates the tree looks like this:
```python
def createTreeMT(points, pool, level=2):
    # If there are no more points to process
    if len(points) < 1:
        return
    # Divide points into two groups:
    left_points = ....
    right_points = ....
    tree = Tree()
    if level == 0:
        leftPointsResult = None
        rightPointsResult = None
        if len(left_points) > 0:
            # args must be a tuple: (left_points,), not (left_points)
            leftPointsResult = pool.apply_async(createTree, (left_points,))
        if len(right_points) > 0:
            rightPointsResult = pool.apply_async(createTree, (right_points,))
        if leftPointsResult is not None:
            tree.left = leftPointsResult.get()
        if rightPointsResult is not None:
            tree.right = rightPointsResult.get()
    else:
        if len(left_points) > 0:
            tree.left = createTreeMT(left_points, pool, level - 1)
        if len(right_points) > 0:
            tree.right = createTreeMT(right_points, pool, level - 1)
    return tree


def createTree(points):
    # If there are no more points to process
    if len(points) < 1:
        return
    # Divide points into two groups:
    left_points = ....
    right_points = ....
    tree = Tree()
    if len(left_points) > 0:
        tree.left = createTree(left_points)
    if len(right_points) > 0:
        tree.right = createTree(right_points)
    return tree
```
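A detail in the apply_async calls above: Pool.apply_async(func, args) calls func(*args) in the worker, so args has to be a tuple of positional arguments. (left_points) is only the list wrapped in parentheses; a one-element tuple needs a trailing comma, (left_points,). With the parenthesised list, every point is passed as a separate argument, and the resulting TypeError only surfaces when get() is called. A small sketch with a hypothetical count_points function standing in for createTree:

```python
from multiprocessing import Pool


def count_points(points):
    # Hypothetical stand-in for createTree: it expects ONE argument, a list.
    return len(points)


def count_in_worker(pool, points):
    # (points,) is a one-element tuple, so the worker calls count_points(points).
    return pool.apply_async(count_points, (points,)).get()


def count_in_worker_buggy(pool, points):
    # (points) is just the list itself, so the worker calls
    # count_points(*points), i.e. one argument per point. The TypeError
    # raised in the worker is re-raised by get() in the calling process.
    return pool.apply_async(count_points, (points)).get()


if __name__ == "__main__":
    pool = Pool(processes=2)
    print(count_in_worker(pool, [1, 2, 3]))
    try:
        count_in_worker_buggy(pool, [1, 2, 3])
    except TypeError as exc:
        print("TypeError: %s" % exc)
    pool.close()
    pool.join()
```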
Am I doing something wrong? Is there a better way to do this kind of task in standard Python 2.7?
Answer:
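apply_async returns an AsyncResult object, and when the main process calls "get" on it, it blocks until the result has been computed. Because createTreeMT works through the branches one after another, the main process blocks on every pair of subtasks it creates before it submits the next pair, so the pool never has more than two of them to work on at a time. Hence, no speedup!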
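An alternative that keeps get() but stops it from blocking early is to walk the top levels first, only collecting the AsyncResult objects, and to call get() once every subtask has been submitted. A minimal sketch, assuming a median split on sortable points; Tree, split_points, create_tree, create_tree_mt and in_order are hypothetical stand-ins for the question's (partly elided) code, and the code avoids Python-3-only syntax since the question targets 2.7.

```python
from multiprocessing import Pool


class Tree(object):
    # Hypothetical minimal node; the question's Tree class is not shown.
    def __init__(self, value=None):
        self.value = value
        self.left = None
        self.right = None


def split_points(points):
    # Hypothetical split: the median becomes the node value,
    # smaller points go left, larger points go right.
    points = sorted(points)
    mid = len(points) // 2
    return points[mid], points[:mid], points[mid + 1:]


def create_tree(points):
    # Sequential build, executed inside a worker process.
    if not points:
        return None
    value, left_points, right_points = split_points(points)
    tree = Tree(value)
    tree.left = create_tree(left_points)
    tree.right = create_tree(right_points)
    return tree


def _build_top(points, pool, level, pending):
    # Builds the top levels in the main process and submits the subtrees
    # below level 0 without waiting for any of their results.
    if not points:
        return None
    value, left_points, right_points = split_points(points)
    tree = Tree(value)
    for side, sub in (("left", left_points), ("right", right_points)):
        if not sub:
            continue
        if level == 0:
            pending.append((tree, side, pool.apply_async(create_tree, (sub,))))
        else:
            setattr(tree, side, _build_top(sub, pool, level - 1, pending))
    return tree


def create_tree_mt(points, pool, level=2):
    pending = []
    tree = _build_top(points, pool, level, pending)
    # Only now block: every subtask is already queued, so the workers
    # can process them concurrently while we collect the results.
    for node, side, result in pending:
        setattr(node, side, result.get())
    return tree


def in_order(tree):
    # In-order traversal, used to check the resulting tree.
    if tree is None:
        return []
    return in_order(tree.left) + [tree.value] + in_order(tree.right)


if __name__ == "__main__":
    pool = Pool(processes=4)
    tree = create_tree_mt(list(range(1000)), pool, level=2)
    pool.close()
    pool.join()
    print(in_order(tree) == list(range(1000)))
```

With level=2 this submits up to eight subtasks at once. Each subtree is still pickled in the worker and unpickled in the main process, so when the per-point work is cheap, that transfer can eat much of the gain.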
One option is to use the "callback" parameter of apply_async. That is, replace the following code:
```python
if level == 0:
    leftPointsResult = None
    rightPointsResult = None
    if len(left_points) > 0:
        leftPointsResult = pool.apply_async(createTree, (left_points,))
    if len(right_points) > 0:
        rightPointsResult = pool.apply_async(createTree, (right_points,))
    if leftPointsResult is not None:
        tree.left = leftPointsResult.get()
    if rightPointsResult is not None:
        tree.right = rightPointsResult.get()
```
with:
```python
if level == 0:
    # A lambda cannot contain an assignment, so setattr attaches the subtree
    if len(left_points) > 0:
        leftPointsResult = pool.apply_async(
            createTree, (left_points,),
            callback=lambda x: setattr(tree, 'left', x))
    if len(right_points) > 0:
        rightPointsResult = pool.apply_async(
            createTree, (right_points,),
            callback=lambda x: setattr(tree, 'right', x))
```
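The callback is called as soon as a result is ready. It runs in the main process (in the pool's result-handler thread), so the assignment updates the main process's tree, while the subtrees themselves are computed in parallel by the pool processes. Because createTreeMT no longer waits for the results, it returns before the subtrees are attached; call pool.close() and pool.join() before using the tree.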
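Putting the callback approach together as a minimal runnable sketch: Tree, split_points, create_tree, create_tree_mt and in_order are hypothetical stand-ins (the question's Tree class and split are not shown), and a median split on sortable points is assumed. The callback is built with functools.partial(setattr, tree, side), which calls setattr(tree, side, result) when the result arrives; this sidesteps both the lambda-assignment problem (lambda x: tree.left=x is a SyntaxError) and late binding of the loop variable. Pool.join() waits for the result-handler thread as well as the workers, so once it returns every callback has run.

```python
from functools import partial
from multiprocessing import Pool


class Tree(object):
    # Hypothetical minimal node; the question's Tree class is not shown.
    def __init__(self, value=None):
        self.value = value
        self.left = None
        self.right = None


def split_points(points):
    # Hypothetical median split (an assumption, not the question's split).
    points = sorted(points)
    mid = len(points) // 2
    return points[mid], points[:mid], points[mid + 1:]


def create_tree(points):
    # Sequential build, executed inside a worker process.
    if not points:
        return None
    value, left_points, right_points = split_points(points)
    tree = Tree(value)
    tree.left = create_tree(left_points)
    tree.right = create_tree(right_points)
    return tree


def create_tree_mt(points, pool, level=2):
    # Returns immediately; subtrees below level 0 are attached later by
    # callbacks, so call pool.close() and pool.join() before using the tree.
    if not points:
        return None
    value, left_points, right_points = split_points(points)
    tree = Tree(value)
    for side, sub in (("left", left_points), ("right", right_points)):
        if not sub:
            continue
        if level == 0:
            # partial(setattr, tree, side)(result) runs setattr(tree, side, result)
            # in the main process; the callback is never sent to a worker.
            pool.apply_async(create_tree, (sub,),
                             callback=partial(setattr, tree, side))
        else:
            setattr(tree, side, create_tree_mt(sub, pool, level - 1))
    return tree


def in_order(tree):
    # In-order traversal, used to check the resulting tree.
    if tree is None:
        return []
    return in_order(tree.left) + [tree.value] + in_order(tree.right)


if __name__ == "__main__":
    pool = Pool(processes=4)
    tree = create_tree_mt(list(range(1000)), pool, level=2)
    pool.close()
    pool.join()  # also waits for the thread that runs the callbacks
    print(in_order(tree) == list(range(1000)))
```

If a worker raises, its callback is simply not called and the corresponding subtree stays None, so in real code it is worth keeping the AsyncResult objects around (or checking them) rather than discarding them as this sketch does.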