Python ThreadPoolExecutor: waiting for all futures to complete

Date: 2016-09-19 07:06:30

Tags: python multithreading concurrent.futures

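I am trying to write a module that needs to fetch a number of URLs concurrently/in parallel. Since this is going to be an expensive network I/O operation rather than a CPU-heavy one, I am using ThreadPoolExecutor.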

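In my code, multiple functions add tasks to a shared thread pool.

My problem is that the main thread gets halted before all the future objects have finished processing in their callback functions.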
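
One plausible cause, sketched here under assumptions rather than verified against my real code: once shutdown() has been called, the executor rejects any new submit() call with a RuntimeError. Exceptions raised inside a done callback are logged and ignored, so a callback that tries to queue follow-up work after shutdown would fail quietly. The snippet below only demonstrates the rejection itself; the use of pow as a task is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

# Work submitted before shutdown() is allowed to finish.
first = executor.submit(pow, 2, 10)

# wait=True blocks until already-submitted work is done,
# but it also closes the pool to new submissions.
executor.shutdown(wait=True)
print(first.result())

# Any later submit() is rejected with a RuntimeError.
try:
    executor.submit(pow, 2, 11)
    rejected = False
except RuntimeError as exc:
    rejected = True
    print("rejected:", exc)
```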

I am a beginner with futures and ThreadPoolExecutor. Any help would be appreciated.

import settings
from concurrent.futures import ThreadPoolExecutor
import concurrent.futures


class Test(Base):

    WORKER_THREADS = settings.WORKER_THREADS

    def __init__(self, urls):
        super(Test, self).__init__()
        self.urls = urls
        self.worker_pool = ThreadPoolExecutor(max_workers=Test.WORKER_THREADS)


    def add_to_worker_queue(self, task, callback, **kwargs):
        # __name__ works on both Python 2 and 3; func_name is Python 2 only
        self.logger.info("Adding task %s to worker pool.", task.__name__)
        self.worker_pool.submit(task, **kwargs).add_done_callback(callback)
        return

    def load_url(self, url):
        response = self.make_requests(urls=url) # make_requests is in Base class (it just makes a HTTP req)
        # response is a generator, so to get the data out of it need to iterate through it.
        for res in response:
            return res

    def handle_response(self, future):
        # add_done_callback passes the finished Future, so unwrap it with result()
        response = future.result()
        # do some stuff with response and add it again to the worker queue for further parallel processing
        self.add_to_worker_queue(some_task, callback_func, data=response)
        return

    def start(self):
        for url in self.urls:
            self.add_to_worker_queue(self.load_url, self.handle_response, url=[url])
        return

    def stop(self):
        self.worker_pool.shutdown(wait=True)
        return


if __name__ == "__main__":
    start_urls = [
        'http://stackoverflow.com/',
        'https://docs.python.org/3.3/library/concurrent.futures.html',
    ]
    test = Test(urls=start_urls)
    test.start()
    test.stop()

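Following this example, I tried using the executor with a "with" statement: https://docs.python.org/3.3/library/concurrent.futures.html#threadpoolexecutor-example But since I am submitting tasks to the pool one at a time, the approach in that example waits for the future objects to complete, which defeats my purpose.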
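
One possible direction, offered as a minimal sketch and not a confirmed fix: count outstanding tasks and only shut the pool down once the count drops to zero, so that callbacks can still queue follow-up work. Everything named here (TrackedPool, wait_all, fetch, parse, on_fetched, on_parsed) is hypothetical and stands in for the real Base/Test code; fetch just fakes a network request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TrackedPool:
    """Hypothetical helper: tracks outstanding tasks so a caller can wait for all."""

    def __init__(self, max_workers):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = 0
        self._cond = threading.Condition()

    def submit(self, task, callback=None, **kwargs):
        with self._cond:
            self._pending += 1  # count the task before it can possibly finish
        try:
            future = self._executor.submit(task, **kwargs)
        except Exception:
            self._task_done()  # undo the count if the pool rejected the task
            raise
        future.add_done_callback(lambda f: self._finish(f, callback))
        return future

    def _finish(self, future, callback):
        try:
            if callback is not None:
                callback(future)  # may call submit() again, raising the count
        finally:
            self._task_done()  # the parent is only uncounted after its callback

    def _task_done(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def wait_all(self):
        # Block until no task is outstanding, then close the pool.
        with self._cond:
            while self._pending:
                self._cond.wait()
        self._executor.shutdown(wait=True)


def fetch(url):
    return "body of " + url  # stand-in for a real HTTP request


def parse(data):
    return data.upper()  # stand-in for further processing


pool = TrackedPool(max_workers=4)
results = []
results_lock = threading.Lock()


def on_parsed(future):
    with results_lock:
        results.append(future.result())


def on_fetched(future):
    # Queue follow-up work from inside the callback.
    pool.submit(parse, on_parsed, data=future.result())


for url in ["a", "b"]:
    pool.submit(fetch, on_fetched, url=url)

pool.wait_all()
print(sorted(results))
```

The key design choice in this sketch is that a task is uncounted only after its callback has returned, so any follow-up task submitted by that callback is already counted and wait_all() cannot return too early.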

0 answers:

No answers yet.