ThreadPoolExecutor: how to limit the queue's maxsize?

Asked: 2018-01-15 13:11:55

Tags: python multithreading python-multithreading

I am using the ThreadPoolExecutor class from the concurrent.futures package:

from concurrent.futures import ThreadPoolExecutor

def some_func(arg):
    # does some heavy lifting
    # outputs some results
    pass

with ThreadPoolExecutor(max_workers=1) as executor:
    for arg in range(10000000):
        future = executor.submit(some_func, arg)

However, I need to limit the queue size somehow, because I don't want millions of futures to be created at once. Is there a simple way to do this, or should I stick with queue.Queue and the threading package instead?

5 Answers:

Answer 0 (score: 6):

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

limit = 10  # maximum number of in-flight futures

futures = set()

with ThreadPoolExecutor(max_workers=1) as executor:
    for arg in range(10000000):
        if len(futures) >= limit:
            # Block until at least one future finishes; wait() returns a
            # (done, not_done) pair, so only pending futures are kept.
            completed, futures = wait(futures, return_when=FIRST_COMPLETED)
        futures.add(executor.submit(some_func, arg))
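This caps the number of live futures at limit, so memory stays bounded no matter how long the input range is. One tuning note: limit should be at least max_workers, otherwise some workers sit idle while the loop waits for completions.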

Answer 1 (score: 5):

Python's ThreadPoolExecutor doesn't have the feature you're looking for, but the provided class can easily be subclassed as follows to provide it:

from concurrent import futures
import Queue  # Python 2; on Python 3 use `import queue` (see the last answer)

class ThreadPoolExecutorWithQueueSizeLimit(futures.ThreadPoolExecutor):
    def __init__(self, maxsize=50, *args, **kwargs):
        super(ThreadPoolExecutorWithQueueSizeLimit, self).__init__(*args, **kwargs)
        # Bound the work queue so submit() blocks once maxsize items are pending.
        self._work_queue = Queue.Queue(maxsize=maxsize)

Answer 2 (score: 0):

I've done this by chunking the range. Here's a working example.

from time import time, strftime, sleep, gmtime
from random import randint
from itertools import islice
from concurrent.futures import ThreadPoolExecutor, as_completed

def nap(id, nap_length):
    sleep(nap_length)
    return nap_length


def chunked_iterable(iterable, chunk_size):
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunk_size))
        if not chunk:
            break
        yield chunk


if __name__ == '__main__':
    startTime = time()

    range_size = 10000000
    chunk_size = 10
    nap_time = 2

    # Iterate in chunks.
    # This consumes less memory and kicks back initial results sooner.
    for chunk in chunked_iterable(range(range_size), chunk_size):

        with ThreadPoolExecutor(max_workers=chunk_size) as pool_executor:
            pool = {}
            for i in chunk:
                function_call = pool_executor.submit(nap, i, nap_time)
                pool[function_call] = i

            for completed_function in as_completed(pool):
                result = completed_function.result()
                i = pool[completed_function]

                print('{} completed @ {} and slept for {}'.format(
                    str(i + 1).zfill(4),
                    strftime("%H:%M:%S", gmtime()),
                    result))

    print('==--- Script took {} seconds. ---=='.format(
        round(time() - startTime)))


The downside of this approach is that the chunks are synchronous: all threads in a chunk must complete before the next chunk is added to the pool.
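A rolling-window variant avoids that stall by applying the wait-based pattern from Answer 0 to this example; a minimal sketch, assuming the nap function above:

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

window_size = 10

with ThreadPoolExecutor(max_workers=window_size) as pool_executor:
    pending = set()
    for i in range(10000000):
        if len(pending) >= window_size:
            # Top the window up as soon as any one future finishes,
            # instead of waiting for a whole chunk to drain.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
        pending.add(pool_executor.submit(nap, i, 2))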

Answer 3 (score: 0):

You should use a semaphore instead, as demonstrated at https://www.bettercodebytes.com/theadpoolexecutor-with-a-bounded-queue-in-python/

There's a problem with andres.riancho's answer: if we bound the queue with max_size, then when the pool is shut down, self._work_queue.put(None) can itself block on the full bounded queue, so the pool may never exit. The shutdown code in question (Python 2, note sys.maxint) is:

    def shutdown(self, wait=True):
        with self._shutdown_lock:
            self._shutdown = True
            self._work_queue.put(None)
        if wait:
            for t in self._threads:
                t.join(sys.maxint)
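For reference, a minimal sketch of the semaphore approach (the class name and the bound parameter are illustrative; the linked article has a fuller version). Because the executor's own work queue stays unbounded, shutdown's put(None) never blocks:

import threading
from concurrent.futures import ThreadPoolExecutor

class BoundedExecutor:
    def __init__(self, bound, max_workers):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        # Allow max_workers running tasks plus `bound` queued ones.
        self._semaphore = threading.BoundedSemaphore(bound + max_workers)

    def submit(self, fn, *args, **kwargs):
        self._semaphore.acquire()  # blocks while the limit is reached
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except Exception:
            self._semaphore.release()
            raise
        # Free a slot as soon as the task finishes, whatever its outcome.
        future.add_done_callback(lambda _: self._semaphore.release())
        return future

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)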

Answer 4 (score: 0):

I tried to edit the accepted answer so it would actually run, but the edit was rejected for some reason. Here is a working, simpler Python 3 version of it (queue.Queue in place of Python 2's Queue.Queue, the unnecessarily verbose super call simplified, and the imports included):

from concurrent import futures
import queue

class ThreadPoolExecutorWithQueueSizeLimit(futures.ThreadPoolExecutor):
    def __init__(self, maxsize=50, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._work_queue = queue.Queue(maxsize=maxsize)
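For illustration, a hypothetical usage sketch (some_func stands in for the question's function): submit() now blocks once 50 work items are queued, so millions of futures are never buffered. Note that Answer 3's shutdown caveat applies to any bounded work queue:

with ThreadPoolExecutorWithQueueSizeLimit(maxsize=50, max_workers=4) as executor:
    for arg in range(10000000):
        executor.submit(some_func, arg)  # blocks while the queue is full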