Python thread pool - tasks that create subtasks and wait for them

Date: 2015-05-24 18:46:39

Tags: python multithreading concurrency threadpool concurrent.futures

Suppose I have a ThreadPoolExecutor with a maximum of 10 threads. I submit a task to it that itself creates another task and then waits for that task to complete, recursively, until I reach a depth of 11.

Example code in Python:

import concurrent.futures

e = concurrent.futures.ThreadPoolExecutor(max_workers=10)

def task(depth):
    print('started depth %d' % (depth,))
    if depth > 10:
        return depth
    else:
        f = e.submit(task, depth + 1)
        # blocks this pool worker until the child task completes
        concurrent.futures.wait([f])
        return f.result()


f = e.submit(task, 0)
print(f.result())

The above code outputs:

started depth 0
started depth 1
started depth 2
started depth 3
started depth 4
started depth 5
started depth 6
started depth 7
started depth 8
started depth 9

and then deadlocks.

Is there a way to solve this without creating additional threads or executors?

In other words, is there a way for a worker thread to process other tasks while it waits?

3 answers:

Answer 0 (score: 3)

Using coroutines, your code could be rewritten as:

import asyncio

async def task(depth):
    print('started depth %d' % (depth,))
    if depth > 10:
        return depth
    else:
        # create a new task on the running event loop
        t = asyncio.ensure_future(task(depth + 1))
        # wait for the task to complete
        await t
        # get the result of the task
        return t.result()

result = asyncio.run(task(1))
print(result)

However, I'm struggling to see why you need all this extra code. In your example code you always wait directly for the result of the task, so your code would run no differently without the executor. For example, the following would produce the same result

def task(depth):
    print('started depth %d' % (depth,))
    if depth > 10:
        return depth
    else:
        return task(depth + 1)

I think this example from the documentation better shows how async coroutines are able to interleave tasks. This example creates 3 tasks, each of which computes a different factorial. Notice how when each task awaits another coroutine (in this case asyncio.sleep), another task is allowed to continue its execution.

import asyncio

async def factorial(name, number):
    f = 1
    for i in range(2, number + 1):
        print("Task %s: Compute factorial(%s)..." % (name, i))
        await asyncio.sleep(1)
        f *= i
    print("Task %s: factorial(%s) = %s" % (name, number, f))

async def main():
    # schedule the three tasks, then wait for all of them to finish
    tasks = [
        asyncio.ensure_future(factorial("A", 2)),
        asyncio.ensure_future(factorial("B", 3)),
        asyncio.ensure_future(factorial("C", 4))]
    await asyncio.wait(tasks)

asyncio.run(main())

Output:

Task A: Compute factorial(2)...
Task B: Compute factorial(2)...
Task C: Compute factorial(2)...
Task A: factorial(2) = 2
Task B: Compute factorial(3)...
Task C: Compute factorial(3)...
Task B: factorial(3) = 6
Task C: Compute factorial(4)...
Task C: factorial(4) = 24
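
Each task suspends itself at every await asyncio.sleep(1), and the event loop uses that pause to resume one of the other tasks; that cooperative hand-off is what produces the interleaved output above.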

Answer 1 (score: 1)

No: if you want to avoid deadlock, you cannot wait inside a task for a future from the same executor.

The only thing you can do in this example is to return the future itself and then unwrap the result recursively:

import concurrent.futures

e = concurrent.futures.ThreadPoolExecutor(max_workers=10)

def task(depth):
    print('started depth %d' % (depth,))
    if depth > 10:
        return depth
    else:
        f = e.submit(task, depth + 1)
        return f


f = e.submit(task, 0)
while isinstance(f.result(), concurrent.futures.Future):
    f = f.result()

print(f.result())
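
This avoids the deadlock because each worker submits its child task and returns immediately, freeing its slot in the pool; the only blocking result() calls happen in the main thread, which is not one of the pool's workers.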

However, it would be better to avoid this kind of recursive execution in the first place.
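
As a rough sketch, assuming the goal is simply to reach depth 11, the same work can be expressed as a flat series of pool tasks driven from the main thread, so that no worker ever blocks waiting on another (the level helper below is illustrative only, not from the question):

import concurrent.futures

e = concurrent.futures.ThreadPoolExecutor(max_workers=10)

def level(depth):
    # one unit of work per depth level; no nested submit/wait
    print('started depth %d' % (depth,))
    return depth

# the main thread, which is not a pool worker, does all the waiting
depth = 0
while True:
    result = e.submit(level, depth).result()
    if result > 10:
        break
    depth = result + 1

print(result)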

Answer 2 (score: 0)

What you are experiencing here is what you have correctly called a deadlock. The first thread, which starts the next task and then waits on it, keeps its worker slot occupied; every subsequent task does the same, until all 10 workers are blocked waiting on futures that only another worker could complete, and no worker is left to run the next task (in your case, one never will be). I suggest you start your own threads inside the task instead of using the pool, something like this:

import concurrent.futures
import threading


class TaskWrapper(threading.Thread):

    def __init__(self, depth, *args, **kwargs):
        self._depth = depth
        self._result = None
        super().__init__(*args, **kwargs)

    def run(self):
        # executed on the wrapper's own thread, not a pool worker
        self._result = task(self._depth)

    def get(self):
        # block until the thread finishes, then hand back its result
        self.join()
        return self._result

e = concurrent.futures.ThreadPoolExecutor(max_workers=10)


def task(depth):
    print('started depth %d' % (depth,))
    if depth > 10:
        return depth
    else:
        t = TaskWrapper(depth + 1)
        t.start()
        return t.get()

f = e.submit(task, 0)
print(f.result())
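
This sidesteps the deadlock because each nested wait happens on a dedicated thread rather than tying up one of the pool's 10 workers, at the cost of creating one extra thread per depth level.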