I am writing a script to check a large number of URLs and return the HTTP status code for each one. I have tried everything I could think of, or find online, for exception handling. The script runs for a while and then eventually crashes with this error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='10.10.10.10', port=80): Max retries exceeded with url: /wmedia (Caused by NewConnectionError("<urllib3.connection.HTTPConnection object at 0x1029bfe10>: Failed to establish a new connection: [Errno 49] Can't assign requested address",))
I think the server gets overwhelmed by too many requests after a while, and adding sleep time has not helped.
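For context, `[Errno 49] Can't assign requested address` usually means the client itself has exhausted its ephemeral ports by opening a brand-new TCP connection for every request. One common mitigation (a sketch only, not the script's actual code; it assumes the `requests` and `urllib3` packages) is to share a single `Session` with a retry-enabled adapter so connections are pooled and reused:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Sketch: one shared Session pools and reuses TCP connections instead of
# opening a fresh socket for every request.
session = requests.Session()
retry = Retry(total=3, backoff_factor=1)  # retry transient failures with backoff
session.mount('http://', HTTPAdapter(max_retries=retry))
session.mount('https://', HTTPAdapter(max_retries=retry))

# A worker would then call session.get(url, timeout=2) instead of requests.get(...)
```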
Here is my worker function, which I use with a process pool:
import time
import requests

def get(url):
    status = None
    try:
        # requests.get itself raises ConnectionError/Timeout, so it must be
        # inside the try block or the worker crashes before any handler runs
        r = requests.get(url, timeout=2)
        r.raise_for_status()
        status = r.status_code
    except requests.exceptions.HTTPError as err:
        print(err)
        status = err.response.status_code
    except requests.ConnectionError as e:
        print("OOPS!! Connection Error")
        status = "Connection refused"
        time.sleep(2)
        print(str(e))
    except requests.Timeout as e:
        print("OOPS!! Timeout Error")
        status = "Timed out"
        time.sleep(2)
        print(str(e))
    except requests.RequestException as e:
        print("OOPS!! General Error")
        status = "Error"
        print(str(e))
    except KeyboardInterrupt:
        print("Someone closed the program")
        status = "Interrupted"
    return url, status  # 'param' was undefined; return the url instead
Any suggestions?
Answer 0 (score: 1)
You can use urllib to get the HTTP status code. This site lists all possible HTTP status codes, separated by commas (I use it in the example below as 'httpStatusCodes.txt').
So we read all the status codes into a dict, and make provision for when a code is not available.
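The answer never shows the file itself; each line is assumed to pair a numeric code with its description. A self-contained Python 3 sketch of the lookup, with hypothetical file contents inlined as a string:

```python
from collections import defaultdict

# Hypothetical contents of httpStatusCodes.txt: one "code,description" per line.
raw = """200,OK
301,Moved Permanently
404,Not Found
503,Service Unavailable"""

codes = {}
for line in raw.splitlines():
    key, val = line.rstrip().split(',')
    codes[int(key)] = val

# Unknown codes fall back to a default description.
codes = defaultdict(lambda: "'Code not defined'", codes)
print(codes[404])   # Not Found
print(codes[999])   # 'Code not defined'
```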
import urllib
from collections import defaultdict

adict = {}
with open("httpStatusCodes.txt") as f:
    for line in f:
        line = line.rstrip()
        (key, val) = line.split(',')
        adict[int(key)] = val

# Unknown codes fall back to a default description
adict = defaultdict(lambda: "'Code not defined'", adict)

Then we loop through the list of websites and fetch their status codes.
websites = ['facebook.com', 'twitter.com', 'google.com',
            'youtube.com', 'icantfindthiswebsite.com']
for url in websites:
    try:
        code = urllib.urlopen('http://' + url).getcode()
    except IOError:
        code = None
    print "url = {}, code = {}, status = {}".format(url, code, adict[code])

Note that I deliberately listed icantfindthiswebsite.com to simulate a website that cannot be reached. That failure raises IOError, which the loop handles.
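Note that the answer's snippet is Python 2 (urllib.urlopen and the bare print statement are gone in Python 3). A rough Python 3 equivalent, assuming urllib.request and a hypothetical fetch_code helper:

```python
from urllib.request import urlopen
from urllib.error import URLError

def fetch_code(url):
    """Return the HTTP status code, or None when the site is unreachable."""
    try:
        return urlopen('http://' + url, timeout=5).getcode()
    except (URLError, OSError):
        return None
```

Unreachable hosts such as icantfindthiswebsite.com make urlopen raise URLError, so fetch_code returns None for them.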
Result