My code works like this, but it is very slow because of the for loop. Can you help me make it work with aiohttp and asyncio?
import requests
from bs4 import BeautifulSoup

def field_info(field_link):
    response = requests.get(field_link)          # blocking HTTP request
    soup = BeautifulSoup(response.text, 'html.parser')
    races = soup.findAll('header', {'class': 'dc-field-header'})
    tables = soup.findAll('table', {'class': 'dc-field-comp'})
    for i in range(len(races)):
        race_name = races[i].find('h3').text
        race_time = races[i].find('time').text
        names = tables[i].findAll('span', {'class': 'title'})
        trainers = tables[i].findAll('span', {'class': 'trainer'})
        table = []
        for j in range(len(names)):
            table.append({
                'Name': names[j].text,
                'Trainer': trainers[j].text,
            })
    return {
        'RaceName': race_name,
        'RaceTime': race_time,
        'Table': table
    }

links = [link1, link2, link3]
scraped_info = []
for link in links:                               # one request at a time
    scraped_info.append(field_info(link))
Answer 0 (score: 2)
1) Create a coroutine that makes the request asynchronously:
import asyncio
import aiohttp

async def get_text(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()
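One optional refinement, just a sketch and not required for the steps below: aiohttp recommends reusing a single ClientSession for the whole program rather than opening a new one per request. That variant would take the session as a parameter:

    # Variant sketch: share one ClientSession across all requests; the extra
    # `session` parameter is an assumption of this variant, not used below.
    async def get_text(session, url):
        async with session.get(url) as resp:
            return await resp.text()

    # The caller creates the session once, e.g.:
    #     async with aiohttp.ClientSession() as session:
    #         texts = await asyncio.gather(*[get_text(session, u) for u in urls])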
2) Replace every synchronous request with an await of this coroutine, which makes the outer function a coroutine as well:
async def field_info(field_link):       # async - makes the outer function a coroutine
    text = await get_text(field_link)   # await - gets the result from the async function
    soup = BeautifulSoup(text, 'html.parser')
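Put together with the parsing code from the question (which stays the same), the whole coroutine would look roughly like this:

    async def field_info(field_link):
        text = await get_text(field_link)        # non-blocking fetch
        soup = BeautifulSoup(text, 'html.parser')
        races = soup.findAll('header', {'class': 'dc-field-header'})
        tables = soup.findAll('table', {'class': 'dc-field-comp'})
        for i in range(len(races)):
            race_name = races[i].find('h3').text
            race_time = races[i].find('time').text
            names = tables[i].findAll('span', {'class': 'title'})
            trainers = tables[i].findAll('span', {'class': 'trainer'})
            table = [
                {'Name': names[j].text, 'Trainer': trainers[j].text}
                for j in range(len(names))
            ]
        return {
            'RaceName': race_name,
            'RaceTime': race_time,
            'Table': table,
        }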
3) Make the outer code use asyncio.gather() to run the work concurrently:
async def main():
    links = [link1, link2, link3]
    scraped_info = await asyncio.gather(*[   # await the gather to get the results
        field_info(link)
        for link
        in links
    ])  # run multiple field_info coroutines concurrently
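asyncio.gather() returns the results as a list in the same order as the coroutines passed to it, so scraped_info ends up as a list of the dicts produced by field_info, one per link.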
4) Pass the top-level coroutine to asyncio.run():
asyncio.run(main())
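asyncio.run() requires Python 3.7 or newer; on an older interpreter (an assumption about your environment) the equivalent would be:

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())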