Python: using requests with a valid hostname

Time: 2017-07-06 00:47:00

Tags: python python-2.7 download python-requests

I'm trying to use requests to download a list of URLs and catch the exception if a URL is bad. Here is my test code:

import requests
from requests.exceptions import ConnectionError

#goodurl
url = "http://www.google.com"

#badurl with good host
#url = "http://www.google.com/thereisnothing.jpg"

#url with bad host
#url = "http://somethingpotato.com"    

print url
try:
    r = requests.get(url, allow_redirects=True)
    print "the url is good"
except ConnectionError as e:
    print e
    print "the url is bad"

The problem is that if I pass in url = "http://www.google.com", everything works, because it is a good URL.

http://www.google.com
the url is good

But if I pass in url = "http://www.google.com/thereisnothing.jpg"

I still get:

http://www.google.com/thereisnothing.jpg
the url is good

So it's almost as if it isn't even looking at anything after the "/".

Just to check that the error handling works, I passed in a bad hostname: url = "http://somethingpotato.com"

It kicked back the error message I expected:

http://somethingpotato.com
HTTPConnectionPool(host='somethingpotato.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b6cd15b90>: Failed to establish a new connection: [Errno -2] Name or service not known',))
the url is bad

What am I missing to make requests catch bad URLs and not just bad hostnames?

Thanks

2 Answers:

Answer 0: (score: 2)

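requests does not raise an exception for a 404 response. Instead, you need to filter such responses out yourself by checking whether the status is OK (HTTP 200).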

import requests
from requests.exceptions import ConnectionError

#url with good host but a nonexistent path (expect 404)
url = "http://www.google.com/nothing"

#badurl with good host
#url = "http://www.google.com/thereisnothing.jpg"

#url with bad host
#url = "http://somethingpotato.com"    

print url
try:
    r = requests.get(url, allow_redirects=True)
    if r.status_code == requests.codes.ok:
        print "the url is good"
    else:
        print "the url is bad"
except ConnectionError as e:
    print e
    print "the url is bad"
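
As an alternative to comparing status codes by hand, requests also provides `Response.raise_for_status()`, which raises `requests.exceptions.HTTPError` for 4xx and 5xx responses. The following is a minimal sketch of that approach, not part of the original answer; the helper names `check_response` and `url_is_good` and the 10-second timeout are assumptions made for illustration.

```python
import requests
from requests.exceptions import ConnectionError, HTTPError


def check_response(response):
    """Return True if the response status is not a 4xx/5xx error."""
    try:
        # raise_for_status() raises HTTPError for 4xx and 5xx status codes
        response.raise_for_status()
    except HTTPError:
        return False
    return True


def url_is_good(url):
    """Hypothetical helper: False for an unreachable host or an error status."""
    try:
        # timeout=10 is an assumed value, not taken from the question
        response = requests.get(url, allow_redirects=True, timeout=10)
    except ConnectionError:
        return False
    return check_response(response)
```

With this, both a bad hostname and a 404 path should end up on the "bad" side, although through different code paths. Note that `raise_for_status()` treats any status below 400 as acceptable, which is looser than requiring exactly 200.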

Edit:

import requests
from requests.exceptions import ConnectionError

def printFailedUrl(url, response):
    if isinstance(response, ConnectionError):
        print "The url " + url + " failed to connect with the exception " + str(response)
    else:
        print "The url " + url + " produced the failed response code " + str(response.status_code)

def testUrl(url):
    try:
        r = requests.get(url, allow_redirects=True)
        if r.status_code == requests.codes.ok:
            print "the url is good"
        else:
            printFailedUrl(url, r)
    except ConnectionError as e:
        printFailedUrl(url, e)

def main():
    testUrl("http://www.google.com") #'Good' Url 
    testUrl("http://www.google.com/doesnotexist.jpg") #'Bad' Url with 404 response
    testUrl("http://sdjgb") #'Bad' url with inaccessible host

main()

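In this case, a single function handles either the exception or the response object passed to it. This way you can report separately on a URL that returned a non-'good' (non-200) response versus an unreachable URL that raised an exception. Hope this has the information you need.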
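
Since the question is about a whole list of URLs, the same idea can be sketched as a function that returns a verdict per URL instead of printing. This is only an illustration under a few assumptions: the names `describe_url` and `check_urls`, the injectable `fetch` parameter, and the 10-second timeout are all hypothetical, and `RequestException` (the base class of requests' exceptions) is caught instead of only `ConnectionError` so that timeouts and malformed URLs are reported too.

```python
import requests
from requests.exceptions import RequestException


def describe_url(url, fetch=requests.get):
    """Return (url, verdict). `fetch` is injectable so the logic can be tried offline."""
    try:
        # timeout=10 is an assumed value, not taken from the question
        response = fetch(url, allow_redirects=True, timeout=10)
    except RequestException as exc:
        # Covers ConnectionError, Timeout, invalid URLs, and similar failures
        return (url, "failed to connect: %s" % exc)
    if response.status_code == requests.codes.ok:
        return (url, "good")
    return (url, "bad status %d" % response.status_code)


def check_urls(urls, fetch=requests.get):
    """Check every URL in the list and collect the verdicts in order."""
    return [describe_url(url, fetch) for url in urls]
```

Calling `check_urls` with the three URLs from `main()` above would yield one tuple per URL; the exact verdicts depend on the network and on how the remote servers respond.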

Answer 1: (score: 0)

What you want is to check r.status_code. Getting r.status_code for "http://www.google.com/thereisnothing.jpg" will give you 404. You can add a condition that treats only URLs returning a 200 code as "good".
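
One detail worth spelling out: "only 200 is good" is stricter than the `Response.ok` property that requests offers, which is true for any status below 400. A small sketch of the difference follows; the function names are hypothetical and chosen only for this illustration.

```python
import requests


def is_strictly_good(response):
    """True only for an exact HTTP 200, as this answer suggests."""
    return response.status_code == requests.codes.ok


def is_not_error(response):
    """True for any status below 400, via the Response.ok property."""
    return response.ok
```

For a 404 both checks return False. For something like a 204 (No Content), `is_not_error` is True while `is_strictly_good` is False, so the choice depends on how strict the download check should be.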