asyncio aiohttp error when threading

Posted: 2018-08-19 21:22:34

Tags: python multithreading python-asyncio

I have a script that was written for me and I can't get it to run... I'm getting the following error...

Traceback (most recent call last):
  File "crawler.py", line 56, in <module>
    loop.run_until_complete(future)
  File "C:\Users\lisa\AppData\Local\Programs\Python\Python37-32\lib\asyncio\base_events.py", line 568, in run_until_complete
    return future.result()
  File "crawler.py", line 51, in run
    await responses
  File "crawler.py", line 32, in bound_fetch
    await fetch(url, session)
  File "crawler.py", line 22, in fetch
    async with session.get(url, headers=headers) as response:
  File "C:\Users\lisa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client.py", line 843, in __aenter__
    self._resp = await self._coro
  File "C:\Users\lisa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client.py", line 387, in _request
    await resp.start(conn)
  File "C:\Users\lisa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client_reqrep.py", line 748, in start
    message, payload = await self._protocol.read()
  File "C:\Users\lisa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\streams.py", line 533, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: None

Am I missing something obvious? I can run the same script without threading, thanks...

import random
import asyncio
from aiohttp import ClientSession
import requests
from itertools import product
from string import *
from multiprocessing import Pool
from itertools import islice
import sys


headers = {'User-Agent': 'Mozilla/5.0'}

letter = sys.argv[1]
number = int(sys.argv[2])

first_group = product(ascii_lowercase, repeat=2)
second_group = product(digits, repeat=3)
codeList = [''.join([''.join(k) for k in prod]) for prod in product([letter], first_group, second_group)]

async def fetch(url, session):
    async with session.get(url, headers=headers) as response:
        statusCode = response.status
        if(statusCode == 200):
            print("{} statusCode is {}".format(url, statusCode))
        return await response.read()


async def bound_fetch(sem, url, session):
    async with sem:
        await fetch(url, session)

def getUrl(codeIdex):
    return "https://www.blahblah.com/" + codeList[codeIdex] + ".png"

async def run(r):
    tasks = []
    sem = asyncio.Semaphore(1000)

    async with ClientSession() as session:
        for i in range(r):
            task = asyncio.ensure_future(bound_fetch(sem, getUrl(i), session))
            tasks.append(task)

        responses = asyncio.gather(*tasks)
        await responses

loop = asyncio.get_event_loop()

future = asyncio.ensure_future(run(number))
loop.run_until_complete(future)

1 Answer:

Answer 0 (score: 0)

I can't leave comments yet, so I'm writing it here. Try a smaller number for the semaphore limit; I think the problem comes from how many requests you fire at the site at the same time. Also, if you want to end up with a response for every request, you also need to catch errors like this one when fetching each URL.
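As a rough illustration only (this sketch is not from the original answer), here is what both suggestions could look like applied to the code from the question, reusing its fetch and getUrl functions. The limit of 100 and the choice to return None for failed URLs are assumptions you would tune for your target site:

import asyncio
import aiohttp
from aiohttp import ClientSession

async def bound_fetch(sem, url, session):
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        try:
            return await fetch(url, session)
        except aiohttp.ClientError as e:
            # ServerDisconnectedError is a subclass of ClientError, so the
            # disconnects from the traceback above are caught here as well.
            print("{} failed: {}".format(url, e))
            return None

async def run(r):
    # Much lower than 1000; assumed value, tune it for the target server.
    sem = asyncio.Semaphore(100)
    async with ClientSession() as session:
        tasks = [asyncio.ensure_future(bound_fetch(sem, getUrl(i), session))
                 for i in range(r)]
        # gather keeps one result slot per URL; failed fetches show up as None.
        return await asyncio.gather(*tasks)

Because gather returns results in task order, failed downloads are easy to spot afterwards and retry later if needed.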