Question

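As shown here, max_retries can be set for a requests.Session(), but I only need head.status_code to check whether a URL is valid and active.

Is there a way to make a HEAD request through a mounted session?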

import requests

def valid_active_url(url):
    try:
        site_ping = requests.head(url, allow_redirects=True)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False
    # Any status code below 400 counts as valid and active
    return site_ping.status_code < 400

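Based on the docs, I think I need to either:

see whether the result of the session.mount method returns a status code (I haven't found one yet), or
roll my own retry method, perhaps with a decorator like this or this, or a (less elegant) loop like this.

Regarding the first approach, here is what I have tried: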

s = requests.Session()
a = requests.adapters.HTTPAdapter(max_retries=3)
s.mount('http://redirected-domain.com', a)
resp = s.get('http://www.redirected-domain.org')
resp.status_code

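Are we only using s.mount() to get in and set max_retries? Apart from pre-establishing the HTTP connection, that seems redundant.

Also, resp.status_code returns 301 where I'm expecting 200 (which is what requests.head returns).

Note: resp.ok might be all I need here.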

Answer 1

After a mere two hours of writing up the question, the answer took five minutes:

def valid_url(url):
    if (url.lower() == 'none') or (url == ''):
        return False
    try:
        s = requests.Session()
        a = requests.adapters.HTTPAdapter(max_retries=5)
        s.mount(url, a)
        resp = s.head(url)
        return resp.ok
    except requests.exceptions.MissingSchema:
        # If it's missing the schema, run again with schema added
        return valid_url('http://' + url)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False

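Based on this answer, it looks like a head request is slightly less resource-intensive than a get, particularly if the URL points to a large amount of data.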
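One caveat with head: unlike get, it does not follow redirects unless allow_redirects=True is passed, which may explain seeing 301 where 200 was expected. The sketch below illustrates this against a throwaway local server; RedirectHandler, head_status_codes and the /old and /new paths are hypothetical names made up for the demonstration, not part of Requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, anything else is 200."""

    def do_HEAD(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def head_status_codes():
    """Return (status without redirects, status with redirects, hops followed)."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:{}/old".format(server.server_port)
    try:
        with requests.Session() as s:
            s.trust_env = False  # ignore proxy settings for this local demo
            plain = s.head(url)  # redirects are NOT followed by default
            followed = s.head(url, allow_redirects=True)
        return plain.status_code, followed.status_code, len(followed.history)
    finally:
        server.shutdown()
        server.server_close()
```

Note that resp.ok is True for both responses here, since it only reports whether the status code is below 400.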

requests.adapters.HTTPAdapter is the built-in adapter for the urllib3 library, which underlies the Requests library.
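As far as I can tell, an integer max_retries only covers failed DNS lookups, socket connections and connection timeouts. For finer control, HTTPAdapter also accepts a urllib3 Retry object. A minimal sketch, assuming you want the same policy for every http:// and https:// URL rather than mounting per URL; make_retrying_session and its default values are my own choices, not something from the docs:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Return a Session whose http:// and https:// requests share one retry policy."""
    retry = Retry(
        total=total,                      # overall cap on retries
        backoff_factor=backoff_factor,    # sleep between attempts grows with each retry
        status_forcelist=(502, 503, 504), # also retry on these response codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mounting on the scheme prefix covers every URL that starts with it
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

With this, something like make_retrying_session().head(url, allow_redirects=True).ok could stand in for the body of valid_url, and the session can be reused across many URLs.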

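On another note, I'm not sure what the correct term or phrase is for what is being checked here. A URL could still be valid even if it returns an error code.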
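One way to keep the two ideas apart is to check the URL's syntax separately from whether it responds. A rough sketch using only the standard library; is_well_formed is a hypothetical helper, and "an http(s) scheme plus a host" is just my assumption about what valid should mean here:

```python
from urllib.parse import urlparse


def is_well_formed(url):
    """Syntactic check only: http(s) scheme and a host part, no network I/O."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Under that definition, a URL that answers 404 is still well-formed; it just isn't active.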

Get status_code with the max_retries setting for requests
