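I'm working on a web-scraping bot that needs to return all of its information very quickly. My main class, Whole, generates a list of Query objects. Here is my Query class: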
import asyncio
import re

import aiohttp


class Query:  # each query has a search term and thing(s) to check the commonality of
    def __init__(self, query, terms):
        assert type(query) == str
        self.query = query
        self.terms = terms
        self.response = None

    def visible(self, element):
        # Skip text inside non-rendered tags, and anything that looks like an HTML comment.
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        elif re.match(r'<!--.*-->', str(element.encode('utf-8'))):
            return False
        return True

    def processResponse(self, loop):
        self.texts = None

        async def fetch(url, session):
            async with session.get(url) as response:
                return await response.read()

        async def bound_fetch(sem, url, session):
            # Getter function with semaphore.
            async with sem:
                # Return the body; without this, gather() would collect only None values.
                return await fetch(url, session)

        async def run(pages):
            tasks = []
            sem = asyncio.Semaphore(100)
            # Fetch all responses within one Client session,
            # keep connection alive for all requests.
            async with aiohttp.ClientSession() as session:
                for page in pages:
                    task = asyncio.ensure_future(bound_fetch(sem, page, session))
                    tasks.append(task)
                responses = await asyncio.gather(*tasks)
            self.texts = responses
            # you now have all response bodies in this variable

        pages = list([item['link'] for item in self.response['items']])  # all of the links to search
        future = asyncio.ensure_future(run(pages))
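Each "query" has a list of pages to search and a list of words to scan for on those pages. The Whole class holds a list of multiple Query objects. I want to issue all of the requests needed by all of the Query objects concurrently and hand the responses back to each individual Query object for further parsing. I tried creating two event loops, one in Whole and one in Query, but then I realized I can't have more than one event loop. How do I create a function that asynchronously performs all of the searches for multiple Query objects? Thanks in advance for your help!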
Answer 0 (score: 0)
"How do I create a function that asynchronously performs all of the searches for multiple Query objects?"
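Change processResponse to an async def and replace its last line with await run(pages). Then, in Whole, await asyncio.gather(*[q.processResponse() for q in queries]), just as is already done with the fetch tasks inside processResponse. The loop parameter is unused in the code shown, so it would need to be dropped (or passed in) for that call to match the signature.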
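A minimal sketch of that change, under some assumptions that are not in the original post: fake_fetch is a hypothetical stand-in for the aiohttp-based fetch so the example runs offline, each Query receives its page list directly instead of reading it from self.response, Whole.run_all is an invented method name, and a single semaphore is shared across all queries (the original creates one per query).

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for the aiohttp-based fetch(); performs no network I/O.
    await asyncio.sleep(0.01)
    return ("body of " + url).encode("utf-8")


class Query:
    def __init__(self, query, terms, pages):
        self.query = query
        self.terms = terms
        self.pages = pages  # assumption: links already extracted from the search response
        self.texts = None

    async def processResponse(self, sem):
        # Now a coroutine: it awaits its own fetches instead of scheduling a future.
        async def bound_fetch(url):
            async with sem:
                return await fake_fetch(url)

        # gather() returns results in the same order as self.pages.
        self.texts = await asyncio.gather(*(bound_fetch(p) for p in self.pages))


class Whole:
    def __init__(self, queries):
        self.queries = queries

    async def run_all(self):
        # One semaphore shared by every query caps total concurrent requests.
        sem = asyncio.Semaphore(100)
        # All queries run concurrently on the single event loop.
        await asyncio.gather(*(q.processResponse(sem) for q in self.queries))


def main():
    queries = [
        Query("cats", ["fur"], ["http://example.com/a", "http://example.com/b"]),
        Query("dogs", ["bark"], ["http://example.com/c"]),
    ]
    # asyncio.run() creates the one event loop and drives everything to completion.
    asyncio.run(Whole(queries).run_all())
    return queries


if __name__ == "__main__":
    for q in main():
        print(q.query, q.texts)
```

With real requests, fake_fetch would be replaced by the question's fetch, and one aiohttp.ClientSession could be opened in run_all and passed down alongside the semaphore. Only one event loop is involved: the outer gather drives every query, and each query's inner gather drives its own page fetches.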