I am trying to write a module that needs to fetch a set of URLs concurrently/in parallel. Since this is an expensive network-IO operation rather than a CPU-heavy one, I am using ThreadPoolExecutor.
Now, in my code, multiple functions add tasks to a shared thread pool.
My problem is that the main thread halts before all of the Future objects have finished being processed in the callback functions.
I am a beginner with futures and ThreadPoolExecutor. Any help would be appreciated.
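To make the pattern concrete before the full class below, here is a stripped-down sketch of what I mean; fetch_url, parse_page and on_fetch_done are placeholder names, not my real functions:

from concurrent.futures import ThreadPoolExecutor
import urllib.request

pool = ThreadPoolExecutor(max_workers=4)

def fetch_url(url):
    # blocking network IO, which is why threads (not processes) are fine here
    return urllib.request.urlopen(url).read()

def parse_page(page):
    # placeholder for the real post-processing step
    return len(page)

def on_fetch_done(future):
    # done-callback: runs in a worker thread once the fetch future finishes
    # and submits a follow-up task to the same shared pool
    page = future.result()
    pool.submit(parse_page, page).add_done_callback(lambda f: print(f.result()))

for url in ['http://stackoverflow.com/',
            'https://docs.python.org/3.3/library/concurrent.futures.html']:
    pool.submit(fetch_url, url).add_done_callback(on_fetch_done)

# the main thread falls through to here right away and shuts the pool down;
# any work submitted from a callback after this shutdown call is rejected by the pool
pool.shutdown(wait=True)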
import settings
from concurrent.futures import ThreadPoolExecutor
import concurrent.futures

class Test(Base):
    WORKER_THREADS = settings.WORKER_THREADS

    def __init__(self, urls):
        super(Test, self).__init__()
        self.urls = urls
        self.worker_pool = ThreadPoolExecutor(max_workers=Test.WORKER_THREADS)

    def add_to_worker_queue(self, task, callback, **kwargs):
        self.logger.info("Adding task %s to worker pool.", task.func_name)
        self.worker_pool.submit(task, **kwargs).add_done_callback(callback)
        return

    def load_url(self, url):
        response = self.make_requests(urls=url)  # make_requests is in the Base class (it just makes an HTTP req)
        # response is a generator, so to get the data out of it we need to iterate through it
        for res in response:
            return res

    def handle_response(self, response):
        # do some stuff with the response and add it to the worker queue again for further parallel processing
        self.add_to_worker_queue(some_task, callback_func, data=response)
        return

    def start(self):
        for url in self.urls:
            self.add_to_worker_queue(self.load_url, self.handle_response, url=[url])
        return

    def stop(self):
        self.worker_pool.shutdown(wait=True)
        return


if __name__ == "__main__":
    start_urls = ['http://stackoverflow.com/',
                  'https://docs.python.org/3.3/library/concurrent.futures.html']
    test = Test(urls=start_urls)
    test.start()
    test.stop()
Following this example, I tried using the executor with a "with" statement: https://docs.python.org/3.3/library/concurrent.futures.html#threadpoolexecutor-example
But since I submit tasks to the pool one by one, the example above waits for the Future objects to complete, which defeats the purpose for me.
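For reference, this is roughly what I tried, adapted from the linked documentation example (the URLs and worker count are the docs' own); the with block implicitly calls shutdown(wait=True), and the loop over as_completed keeps the main thread blocked until every submitted future has finished:

import concurrent.futures
import urllib.request

URLS = ['http://stackoverflow.com/',
        'https://docs.python.org/3.3/library/concurrent.futures.html']

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # submit everything up front, then wait for the results in the main thread
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))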