Question

I am getting an "HTTP Error 500: Internal Server Error" response, but I still want to read the data in the error HTML.

With Python 2.6, I normally fetch a page using:

import urllib2
url = "http://google.com"
data = urllib2.urlopen(url)
data = data.read()

When I try this on the failing URL, I get a urllib2.HTTPError exception:

urllib2.HTTPError: HTTP Error 500: Internal Server Error

How can I fetch such error pages (with or without urllib2) when the server returns an Internal Server Error?

Note that with Python 3, the corresponding exception is urllib.error.HTTPError.

Answer 1

HTTPError is a file-like object. You can catch it and then read its contents.

try:
    resp = urllib2.urlopen(url)
    contents = resp.read()
except urllib2.HTTPError as error:
    # The exception wraps the error response, so its body can be read like a file.
    contents = error.read()
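
For Python 3, which the question mentions, the same pattern might look roughly like the sketch below. The local test server, the AlwaysFails handler, the ERROR_PAGE constant, and the fetch helper are all invented for illustration so the example runs without a network connection. The part that matters is the try/except around urllib.request.urlopen.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ERROR_PAGE = b"<html><body>Something broke</body></html>"


class AlwaysFails(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with a 500 and an HTML body."""

    def do_GET(self):
        self.send_response(500)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(ERROR_PAGE)))
        self.end_headers()
        self.wfile.write(ERROR_PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(url):
    """Return the response body, even when the server answers with an error status."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as error:
        # HTTPError is file-like, so the error page can be read from it directly.
        return error.read()


# Port 0 asks the OS for any free port; this server only exists for the demo.
server = HTTPServer(("127.0.0.1", 0), AlwaysFails)
threading.Thread(target=server.serve_forever, daemon=True).start()

contents = fetch("http://127.0.0.1:%d/" % server.server_port)

server.shutdown()
server.server_close()
```

Against a real site you would only need the fetch function; everything else is scaffolding to provoke a 500 locally.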

Answer 2

If you mean that you want to read the body of the 500 response:

# data and headers are whatever you would normally send with the request
request = urllib2.Request(url, data, headers)
try:
    resp = urllib2.urlopen(request)
    print resp.read()
except urllib2.HTTPError as error:
    print "ERROR: ", error.read()

In your case, you don't need to build a Request. Just do:

try:
    resp = urllib2.urlopen(url)
    print resp.read()
except urllib2.HTTPError as error:
    print "ERROR: ", error.read()

So you don't override urllib2.HTTPError; you just handle the exception.
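
Besides the body, the caught exception also carries the status code, the reason phrase, and the response headers. Here is a small Python 3 sketch of pulling those out. The describe_http_error helper and the hand-built error object are made up for illustration; in real code the exception would come from urlopen rather than being constructed by hand.

```python
import email.message
import io
import urllib.error


def describe_http_error(error):
    """Collect status, reason, content type, and body from an HTTPError."""
    return {
        "code": error.code,
        "reason": error.reason,
        "content_type": error.headers.get("Content-Type"),
        "body": error.read(),
    }


# Hand-built stand-in for an error raised by urlopen (illustration only).
headers = email.message.Message()
headers["Content-Type"] = "text/html"
fake_error = urllib.error.HTTPError(
    "http://example.invalid/",
    500,
    "Internal Server Error",
    headers,
    io.BytesIO(b"<h1>Oops</h1>"),
)

info = describe_http_error(fake_error)
fake_error.close()
```

Checking the code first lets you decide whether the body is worth parsing, for example treating a 404 differently from a 500.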

Answer 3

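import urllib2

alist = ['http://someurl.com']

def testUrl():
    # Collect one "URL reason" entry for every URL that fails to open.
    errList = []
    for URL in alist:
        try:
            urllib2.urlopen(URL)
        except urllib2.HTTPError as err:
            # The server answered with an error status; err.code is the status code.
            errList.append(URL + " " + str(err.code))
        except urllib2.URLError as err:
            # No HTTP response at all (DNS failure, refused connection, ...).
            errList.append(URL + " " + str(err.reason))
    return "\n".join(errList)

print testUrl()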

Overriding urllib2.HTTPError or urllib.error.HTTPError and reading the response HTML anyway
