Async loop inside an async loop in Python

Time: 2018-04-09 03:01:38

Tags: python asynchronous python-requests python-3.5 python-asyncio

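I'm working on a web-scraping bot that needs to return all of its information very quickly. My main class, Whole, produces a list of Query objects. Here is my Query class: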

import asyncio

import aiohttp
from bs4 import Comment

class Query:  # each query has a search term and the thing(s) to check the commonality of
    def __init__(self, query, terms):
        assert isinstance(query, str)
        self.query = query
        self.terms = terms
        self.response = None

    def visible(self, element):
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        # On Python 3, str(element.encode('utf-8')) yields "b'...'", so a regex
        # anchored at '<!--' can never match; test for a Comment node instead.
        elif isinstance(element, Comment):
            return False
        return True

    def processResponse(self, loop):
        self.texts = None
        async def fetch(url, session):
            async with session.get(url) as response:
                return await response.read()

        async def bound_fetch(sem, url, session):
            # Getter function with semaphore.
            async with sem:
                # Return the body; without `return`, gather() collects only None.
                return await fetch(url, session)

        async def run(pages):
            tasks = []
            sem = asyncio.Semaphore(100)

            # Fetch all responses within one Client session,
            # keep connection alive for all requests.
            async with aiohttp.ClientSession() as session:
                for page in pages:
                    task = asyncio.ensure_future(bound_fetch(sem, page, session))
                    tasks.append(task)

                responses = await asyncio.gather(*tasks)
                self.texts = responses

                # you now have all response bodies in this variable
        pages = [item['link'] for item in self.response['items']]  # all of the links to search
        future = asyncio.ensure_future(run(pages))

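Each "Query" has a list of pages to search and a list of words to scan for on those pages. The Whole class holds a list of several Query objects. I want to make all the necessary requests for every Query at the same time, and hand the responses back to each individual Query object for further parsing. I tried creating two event loops, one in Whole and one in Query, but then I realized I can't have more than one event loop. How can I create a function that asynchronously performs all the searches for multiple Queries? Thanks in advance for your help!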
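A single event loop is enough here: coroutines running on that loop can themselves await asyncio.gather, so an outer gather over the queries and an inner gather per query all run concurrently on the same loop. A minimal sketch, assuming a hypothetical fake_fetch that sleeps instead of making a real HTTP request; asyncio.run needs Python 3.7+ (on 3.5, use asyncio.get_event_loop().run_until_complete(...) instead).

```python
import asyncio
import time

async def fake_fetch(page):
    # Hypothetical stand-in for a network request: just waits 0.1 s.
    await asyncio.sleep(0.1)
    return "body of " + page

async def process_query(pages):
    # Inner "loop": gather every page belonging to one query.
    return await asyncio.gather(*[fake_fetch(p) for p in pages])

async def main(queries):
    # Outer "loop": gather every query; all of it runs on one event loop.
    return await asyncio.gather(*[process_query(pages) for pages in queries])

queries = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
start = time.perf_counter()
results = asyncio.run(main(queries))
elapsed = time.perf_counter() - start
print(results)
print("elapsed: %.2f s" % elapsed)
```

The six simulated requests finish in roughly 0.1 s in total rather than 0.6 s, because they all wait on the same loop at once; asyncio.gather also returns results in the order the awaitables were passed, so each inner list lines up with its query's pages.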

1 answer:

Answer 0 (score: 0)

  

How can I create a function that asynchronously performs all the searches for multiple Queries?

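Change processResponse to an async def (dropping the unused loop parameter) and replace its last line with await run(pages). Then, in Whole, await asyncio.gather(*[q.processResponse() for q in queries]), the same way processResponse itself gathers its tasks.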
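Applied to the classes in the question, the answer's change might look like the sketch below. It is an illustration under assumptions, not the asker's final code: fake_fetch is a hypothetical stand-in for the aiohttp request, links is a hypothetical attribute replacing self.response['items'], and, going slightly beyond the answer, processResponse receives a semaphore shared by all queries plus the fetch coroutine as arguments.

```python
import asyncio

async def fake_fetch(url):
    # Hypothetical stand-in for aiohttp; with a real ClientSession this would be
    # `async with session.get(url) as r: return await r.read()`.
    await asyncio.sleep(0.01)
    return ("<html>" + url + "</html>").encode()

class Query:
    def __init__(self, query, links):
        self.query = query
        self.links = links  # hypothetical: the links taken from self.response['items']
        self.texts = None

    async def processResponse(self, sem, fetch):
        async def bound_fetch(url):
            async with sem:  # one limit shared by every query
                return await fetch(url)
        # Await the inner gather instead of scheduling it and returning early.
        self.texts = await asyncio.gather(*[bound_fetch(u) for u in self.links])

class Whole:
    def __init__(self, queries):
        self.queries = queries

    async def run(self, fetch, limit=100):
        sem = asyncio.Semaphore(limit)
        # Every query's requests run concurrently on the single event loop.
        await asyncio.gather(*[q.processResponse(sem, fetch) for q in self.queries])

whole = Whole([Query("cats", ["u1", "u2"]), Query("dogs", ["u3"])])
asyncio.run(whole.run(fake_fetch))  # Python 3.7+; on 3.5 use run_until_complete
for q in whole.queries:
    print(q.query, q.texts)
```

With real requests, Whole.run would open one aiohttp.ClientSession and pass in a fetch coroutine that uses it (for example a small closure around session.get), so all queries reuse one connection pool. Sharing one semaphore caps the total number of in-flight requests across all queries, whereas the question's code creates a separate Semaphore(100) for each query.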