Can I use a ProcessPoolExecutor from within a Future?

Asked: 2014-02-22 21:02:42

Tags: python python-3.x concurrency multiprocessing concurrent.futures

I have a program that starts from a list. For each value in this list, it retrieves another list and processes that other list.

Basically, it is a tree three levels deep, with expensive processing needed at every node.

Each node needs to be able to work with the results of its children.

What I would like is to map from the input list of the first layer to each node's result. Within each of those processes, though, I want to map the results of the next layer.

My concern is that each layer would end up with its own max_workers count. If possible, I would like them to share a single process pool; otherwise all the process switching comes with a performance hit.

Is there a way, using concurrent.futures or some other method, to have each layer share the same process pool?

An example would be:

import concurrent.futures

def main():
    my_list = [1, 2, 3, 4]
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        # hand the executor to each task so it can map its own sub-tasks
        results = executor.map(my_function, zip(my_list, [executor] * len(my_list)))
        # process results

def my_function(args):
    values, executor = args
    new_list = process(values)
    # map the next layer on the same pool
    results = executor.map(second_function, new_list)
    # process results
    # return processed results

def second_function(values):
    ...

That way, every subprocess would draw from the same pool.

Alternatively, could I do something like (but not exactly):

import concurrent.futures.ProcessPoolExecutor(max_workers = 4) as executor

and have every call to executor come from the same process pool?
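
Concretely, that "shared pool" idea might look like the sketch below; the shared_pool module and get_executor helper are only illustrative names, not something from the question:

# shared_pool.py -- a single pool created once and imported everywhere
import concurrent.futures

_executor = None

def get_executor():
    # create the shared ProcessPoolExecutor on first use, reuse it afterwards
    global _executor
    if _executor is None:
        _executor = concurrent.futures.ProcessPoolExecutor(max_workers=4)
    return _executor

Every caller in the main process would then get the same pool, although this alone still does not let work be submitted from inside the worker processes.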

1 Answer:

Answer 0 (score: 1)

The problem is that your pool has 4 threads while you are trying to wait in something like 20 of them, so there are not enough threads to do what you want.

Put differently: my_function runs in a worker thread. When it calls map, that thread blocks, leaving one thread fewer to execute the mapped calls; the futures keep that thread blocked.
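
As a minimal sketch of that deadlock (for illustration only, not code from the answer): with a single worker thread, the outer task blocks in map() waiting for inner tasks that can only ever run on that same thread.

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

def inner(x):
    return x * 2

def outer(x):
    # blocks the only worker thread while waiting for inner(),
    # but inner() can only run on that same, now blocked, thread
    return list(pool.map(inner, [x, x + 1]))

# This call would never finish -- the pool has deadlocked:
# print(list(pool.map(outer, [1])))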

My solution is to use yield and yield from statements that return futures; this removes the blocking on futures and threads. All the futures are created and then yielded, which interrupts the current execution and frees the thread. The thread can then go on to execute the mapped futures. Once a future completes, a registered callback executes the next() step of the generator.
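
Stripped down to its core, that mechanism looks roughly like the sketch below (illustrative only; unlike the full code further down, it delivers the result with send() instead of letting the generator call future.result() itself):

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

def task():
    # schedule some work and give up the thread instead of blocking on it
    future = pool.submit(lambda: 21)
    result = yield future          # suspended here until the callback resumes us
    print('result:', result * 2)   # prints "result: 42"

def drive(generator, value=None):
    try:
        future = generator.send(value)
    except StopIteration:
        return
    # resume the generator once the yielded future is done
    future.add_done_callback(lambda f: drive(generator, f.result()))

drive(task())
pool.shutdown(wait=True)

The executor below does the same thing, except that the stepping function is scheduled on the pool itself and the result is exposed through a Future returned by submit().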

To solve the proxy problem for objects, this question must be solved first: How to properly set up multiprocessing proxy objects for objects that already exist

So we have the following recursion to execute: a recursive, parallel sum over the list [[1, [2, [3, 3, 3], 2], 1], 0, 0].
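
For reference, the plain sequential version of that sum (a small sketch for illustration) is just:

def recursive_sum(tasks):
    # an int is a leaf; anything else is a nested list of tasks
    if isinstance(tasks, int):
        return tasks
    return sum(recursive_sum(task) for task in tasks)

assert recursive_sum([[1, [2, [3, 3, 3], 2], 1], 0, 0]) == 15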

We can expect the following output:

tasks: [[1, [2, [3, 3, 3], 2], 1], 0, 0]
tasks: [1, [2, [3, 3, 3], 2], 1]
tasks: 0
tasks: 0
tasks: 1
tasks: [2, [3, 3, 3], 2]
tasks: 1
tasks: 2
tasks: [3, 3, 3]
tasks: 2
tasks: 3
tasks: 3
tasks: 3
v: 15

Here is the code for a recursion-enabled ThreadPoolExecutor:

import time
import traceback
from concurrent.futures.thread import *
from concurrent.futures import *
from concurrent.futures._base import *
##import hanging_threads

class RecursiveThreadPoolExecutor(ThreadPoolExecutor):

    # updated version here: https://gist.github.com/niccokunzmann/9170072

    def _submit(self, fn, *args, **kwargs):
        return ThreadPoolExecutor.submit(self, fn, *args, **kwargs)

    def submit(self, fn, *args, **kwargs):
        """Submits a callable to be executed with the given arguments.

        Schedules the callable to be executed as fn(*args, **kwargs) and returns
        a Future instance representing the execution of the callable.

        Returns:
            A Future representing the given call.
        """
        # future handed back to the caller; it is resolved once the generator
        # passed as fn has run to completion
        real_future = Future()
        def generator_start():
            try:
                # fn is expected to be a generator function; calling it only
                # creates the generator, none of its code runs yet
                generator = fn(*args, **kwargs)
                def generator_next():
                    try:
                        try:
                            future = next(generator)
                        except StopIteration as stop:
                            # the generator returned: its return value becomes
                            # the result of real_future
                            real_future.set_result(stop.args[0] if stop.args else None)
                        else:
                            if future is None:
                                # nothing to wait for, schedule the next step right away
                                self._submit(generator_next)
                            else:
                                # resume the generator when the yielded future is done,
                                # without blocking a worker thread in the meantime
                                future.add_done_callback(lambda future: generator_next())
                    except:
                        traceback.print_exc()
                self._submit(generator_next)
            except:
                traceback.print_exc()
        self._submit(generator_start)
        return real_future

    def recursive_map(self, fn, *iterables, timeout=None):
        """Returns a iterator equivalent to map(fn, iter).

        Args:
            fn: A callable that will take as many arguments as there are
                passed iterables.
            timeout: The maximum number of seconds to wait. If None, then there
                is no limit on the wait time.

        Returns:
            An iterator equivalent to: map(func, *iterables) but the calls may
            be evaluated out-of-order.

        Raises:
            TimeoutError: If the entire result iterator could not be generated
                before the given timeout.
            Exception: If fn(*args) raises for any values.
        """
        if timeout is not None:
            end_time = timeout + time.time()

        fs = [self.submit(fn, *args) for args in zip(*iterables)]

        # Yield must be hidden in closure so that the futures are submitted
        # before the first iterator value is required.
        def result_iterator():
            yield from fs
            return fs
        return result_iterator()

if __name__ == '__main__':

    def f(args):
        executor, tasks = args
        print('tasks:', tasks)
        if isinstance(tasks, int):
            return tasks
        # wait for all child futures without blocking the worker thread
        futures = yield from executor.recursive_map(f, [(executor, task) for task in tasks])
        return sum(future.result() for future in futures)

    with RecursiveThreadPoolExecutor(max_workers=1) as executor:
        r = executor.map(f, [(executor, [[1, [2, [3, 3, 3], 2], 1], 0, 0],)] * 1)
        import time
        time.sleep(0.1)

        for v in r:
            print('v: {}'.format(v))

An updated version can be found here: https://gist.github.com/niccokunzmann/9170072

Sadly, I cannot implement this for processes right now with the multiprocessing functionality. You should be able to do it, though; the only thing that should be required is to create a proxy object for the generator_start and generator_next functions. If you do so, please let me know.

To solve the proxy problem for these methods, this question would also need to be answered: How to properly set up multiprocessing proxy objects for objects that already exist