Question

I'm trying to use requests to download a list of URLs and catch the exception when a URL is bad. Here is my test code:

import requests
from requests.exceptions import ConnectionError

#goodurl
url = "http://www.google.com"

#badurl with good host
#url = "http://www.google.com/thereisnothing.jpg"

#url with bad host
#url = "http://somethingpotato.com"    

print url
try:
    r = requests.get(url, allow_redirects=True)
    print "the url is good"
except ConnectionError,e:
    print e
    print "the url is bad"

The problem is that if I pass in url = "http://www.google.com", everything works as expected, since it is a good URL.

http://www.google.com
the url is good

But if I pass in url = "http://www.google.com/thereisnothing.jpg", I still get:

http://www.google.com/thereisnothing.jpg
the url is good

So it's almost as if it isn't even looking at anything after the "/".

Just to check that the error handling works at all, I passed in a bad hostname: #url = "http://somethingpotato.com"

It kicked back the error message I expected:

http://somethingpotato.com
HTTPConnectionPool(host='somethingpotato.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b6cd15b90>: Failed to establish a new connection: [Errno -2] Name or service not known',))
the url is bad

What am I missing to make requests catch a bad URL, not just a bad hostname?

Thanks

Answer 1

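requests does not raise an exception for a 404 response. Instead, you need to filter those responses out yourself by checking whether the status is OK (HTTP 200).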

import requests
from requests.exceptions import ConnectionError

#url with good host but a path that does not exist
url = "http://www.google.com/nothing"

#badurl with good host
#url = "http://www.google.com/thereisnothing.jpg"

#url with bad host
#url = "http://somethingpotato.com"    

print url
try:
    r = requests.get(url, allow_redirects=True)
    if r.status_code == requests.codes.ok:
        print "the url is good"
    else:
        print "the url is bad"
except ConnectionError,e:
    print e
    print "the url is bad"

Edit:

import requests
from requests.exceptions import ConnectionError

def printFailedUrl(url, response):
    if isinstance(response, ConnectionError):
        print "The url " + url + " failed to connect with the exception " + str(response)
    else:
        print "The url " + url + " produced the failed response code " + str(response.status_code)

def testUrl(url):
    try:
        r = requests.get(url, allow_redirects=True)
        if r.status_code == requests.codes.ok:
            print "the url is good"
        else:
            printFailedUrl(url, r)
    except ConnectionError,e:
        printFailedUrl(url, e)

def main():
    testUrl("http://www.google.com") #'Good' Url 
    testUrl("http://www.google.com/doesnotexist.jpg") #'Bad' Url with 404 response
    testUrl("http://sdjgb") #'Bad' url with inaccessible host

main()

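In this case, a single function handles either the exception or the response passed to it. This way you can tell apart a URL that returned a non-'good' (non-200) response from an unreachable URL that raised an exception. Hope this has the information you need.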
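As a possible alternative to comparing status codes by hand, requests also provides Response.raise_for_status(), which raises requests.exceptions.HTTPError for 4xx and 5xx responses. The sketch below is only an illustration, written in Python 3 syntax; the function names describe_response and check_url and the 10-second timeout are assumptions, not part of the code above. Note that raise_for_status() accepts any non-error status, not just 200, so it is slightly more lenient than the `== requests.codes.ok` check.

```python
import requests
from requests.exceptions import ConnectionError, HTTPError


def describe_response(response):
    """Return a short verdict for an already-received requests.Response."""
    try:
        # raise_for_status() raises HTTPError for 4xx and 5xx status codes
        # and does nothing for other status codes.
        response.raise_for_status()
    except HTTPError as exc:
        return "bad: HTTP error ({})".format(exc)
    return "good"


def check_url(url, timeout=10):
    """Fetch url and report a verdict; the timeout value is an assumption."""
    try:
        response = requests.get(url, allow_redirects=True, timeout=timeout)
    except ConnectionError as exc:
        # Bad host name, refused connection, and similar failures.
        return "bad: could not connect ({})".format(exc)
    return describe_response(response)
```

Calling check_url("http://www.google.com/thereisnothing.jpg") should then report an HTTP error instead of "good", provided the server answers with a 404 as described in the question.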

Answer 2

What you want is to check r.status_code. Getting r.status_code for "http://www.google.com/thereisnothing.jpg" will give you 404. You can add a condition so that only URLs that return a 200 code count as "good".
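Since the question is about a list of URLs, the 200-only condition could be applied to the whole list in one pass. This is a minimal sketch in Python 3 syntax, not a definitive implementation; the name split_urls and its fetch parameter are hypothetical (fetch exists only so the function can be tried without network access, and defaults to requests.get).

```python
import requests
from requests.exceptions import ConnectionError


def split_urls(urls, fetch=requests.get):
    """Split urls into (good, bad) lists, where good means an HTTP 200 response.

    fetch is a hypothetical hook for substituting the HTTP call;
    by default it is requests.get.
    """
    good, bad = [], []
    for url in urls:
        try:
            response = fetch(url, allow_redirects=True)
        except ConnectionError:
            # Bad host name, refused connection, and similar failures.
            bad.append(url)
            continue
        if response.status_code == requests.codes.ok:
            good.append(url)
        else:
            # The host answered, but with something other than 200 (e.g. 404).
            bad.append(url)
    return good, bad
```

With the three URLs from the question, only "http://www.google.com" would be expected to land in the good list, assuming the servers respond as the question describes.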

Python: using requests with a valid hostname
