I'm trying to use requests to download a list of URLs and catch the exception if one of them is a bad URL. Here is my test code:
import requests
from requests.exceptions import ConnectionError

# good URL
url = "http://www.google.com"
# bad URL with a good host
#url = "http://www.google.com/thereisnothing.jpg"
# URL with a bad host
#url = "http://somethingpotato.com"

print(url)
try:
    r = requests.get(url, allow_redirects=True)
    print("the url is good")
except ConnectionError as e:
    print(e)
    print("the url is bad")
The problem is that if I pass in url = "http://www.google.com", everything works fine, since it is a good URL:
http://www.google.com
the url is good
But if I pass in url = "http://www.google.com/thereisnothing.jpg", I still get:
http://www.google.com/thereisnothing.jpg
the url is good
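So it's almost as if it isn't even looking at anything after the "/".
Just to see whether the error checking works at all, I passed in a bad hostname: url = "http://somethingpotato.com"
It kicked back the error message I expected: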
http://somethingpotato.com
HTTPConnectionPool(host='somethingpotato.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b6cd15b90>: Failed to establish a new connection: [Errno -2] Name or service not known',))
the url is bad
What am I missing to make requests catch bad URLs and not just bad hostnames?
Thanks
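Answer 0 (score: 2)
Requests does not raise an exception for a 404 response. The request itself succeeded; the server simply answered "not found". Instead, you need to filter those responses out by checking whether the status is OK (HTTP 200).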
import requests
from requests.exceptions import ConnectionError

# URL with a good host but a missing path (expect a 404)
url = "http://www.google.com/nothing"
# bad URL with a good host
#url = "http://www.google.com/thereisnothing.jpg"
# URL with a bad host
#url = "http://somethingpotato.com"

print(url)
try:
    r = requests.get(url, allow_redirects=True)
    # A 404 does not raise, so inspect the status code explicitly
    if r.status_code == requests.codes.ok:
        print("the url is good")
    else:
        print("the url is bad")
except ConnectionError as e:
    # Raised only when no connection could be made (e.g. unknown host)
    print(e)
    print("the url is bad")
Edit:
import requests
from requests.exceptions import ConnectionError

def printFailedUrl(url, response):
    # response is either the ConnectionError that was raised or the Response object
    if isinstance(response, ConnectionError):
        print("The url " + url + " failed to connect with the exception " + str(response))
    else:
        print("The url " + url + " produced the failed response code " + str(response.status_code))

def testUrl(url):
    try:
        r = requests.get(url, allow_redirects=True)
        if r.status_code == requests.codes.ok:
            print("the url is good")
        else:
            printFailedUrl(url, r)
    except ConnectionError as e:
        printFailedUrl(url, e)

def main():
    testUrl("http://www.google.com")  # 'good' URL
    testUrl("http://www.google.com/doesnotexist.jpg")  # 'bad' URL with a 404 response
    testUrl("http://sdjgb")  # 'bad' URL with an unreachable host

main()
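Here, a single function handles either the exception or the response that is passed to it. That way you can react differently when a URL returns a non-'good' (non-200) response versus an unreachable URL that raises an exception. Hope this has the information you need.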
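As a possible alternative to comparing status_code by hand, requests also offers Response.raise_for_status(), which raises requests.exceptions.HTTPError for 4xx and 5xx responses, so a 404 can be caught much like a connection failure. The following is only a sketch under some assumptions: the function name check_url, the returned (ok, detail) tuple, and the 10-second timeout are illustrative choices, not part of the original answer.

```python
def check_url(url, timeout=10):
    """Return (ok, detail) for a single URL (hypothetical helper)."""
    # Imported lazily so that merely defining the function does not need the library.
    import requests

    try:
        r = requests.get(url, allow_redirects=True, timeout=timeout)
        # Turns 4xx/5xx responses into requests.exceptions.HTTPError
        r.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # The server answered, but with an error status such as 404
        return False, "bad status: {}".format(e)
    except requests.exceptions.RequestException as e:
        # Base class of requests' exceptions: DNS failures, refused connections, timeouts
        return False, "request failed: {}".format(e)
    return True, "status {}".format(r.status_code)
```

Note that raise_for_status() only complains about 4xx and 5xx codes, so this is slightly more permissive than the strict "200 only" comparison above. HTTPError has to be caught before RequestException because it is a subclass of it.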
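Answer 1 (score: 0)
What you want is to check r.status_code. Getting r.status_code for "http://www.google.com/thereisnothing.jpg" will give you 404. You can add a condition so that only URLs that return a 200 code count as "good".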
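Since the question is about a list of URLs, the "only 200 counts as good" rule could be applied to the whole list along these lines. This is a rough sketch, not a definitive implementation: split_by_status and fetch_status_with_requests are hypothetical names, and the status lookup is passed in as a callable purely so the partitioning logic can be tried without network access.

```python
def split_by_status(urls, fetch_status):
    """Partition urls into (good, bad); only HTTP 200 counts as good.

    fetch_status is a callable that takes a URL and returns the integer
    status code, or None when the request could not be completed at all.
    """
    good, bad = [], []
    for url in urls:
        status = fetch_status(url)
        (good if status == 200 else bad).append(url)
    return good, bad


def fetch_status_with_requests(url, timeout=10):
    """One possible fetch_status built on requests (hypothetical helper)."""
    # Imported lazily; only needed when this helper is actually called.
    import requests

    try:
        return requests.get(url, allow_redirects=True, timeout=timeout).status_code
    except requests.exceptions.RequestException:
        # Unknown host, refused connection, timeout, and similar failures
        return None
```

With real URLs this would be called as split_by_status(urls, fetch_status_with_requests); a URL with a bad hostname and a URL that returns 404 both end up in the second list.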