Getting the URL when handling urllib2.URLError

Time: 2011-06-28 15:53:19

Tags: python exception-handling urllib2

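This pertains specifically to urllib2, but more generally to custom exception handling. How do I pass additional information to a calling function in another module via a raised exception? I assume I would re-raise using a custom exception class, but I'm not sure of the technical details.

Rather than pollute the sample code with what I've tried and failed at, I'll present it as a mostly blank slate. My end goal is to get the last line in the sample to work.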

#mymod.py
import urllib2

def openurl():
    req = urllib2.Request("http://duznotexist.com/")
    response = urllib2.urlopen(req)

#main.py
import urllib2
import mymod

try:
    mymod.openurl()
except urllib2.URLError as e:
    #how do I do this?
    print "Website (%s) could not be reached due to %s" % (e.url, e.reason)

2 Answers:

Answer 0 (score: 8)

You can add the information to the exception object and then re-raise it.

#mymod.py
import urllib2

def openurl():
    req = urllib2.Request("http://duznotexist.com/")
    try:
        response = urllib2.urlopen(req)
    except urllib2.URLError as e:
        # attach the requested URL to the exception object;
        # e.reason is already set by urllib2, so leave it intact
        e.url = req.get_full_url()
        raise  # bare raise keeps the original traceback; the calling function can catch it

#main.py
import urllib2
import mymod

try:
    mymod.openurl()
except urllib2.URLError as e:
    print "Website (%s) could not be reached due to %s" % (e.url, e.reason)
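
The question also mentions re-raising with a custom exception class. A minimal sketch of that variant follows, assuming Python 3 (where urllib2 was split into urllib.request and urllib.error). The names UrlOpenError and the opener parameter are hypothetical, introduced only for illustration; opener exists so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


class UrlOpenError(Exception):
    """Hypothetical wrapper exception that remembers which URL failed."""

    def __init__(self, url, reason):
        super().__init__("%s could not be reached: %s" % (url, reason))
        self.url = url
        self.reason = reason


def openurl(url, opener=urllib.request.urlopen):
    # `opener` is an illustrative hook, not part of any library API.
    try:
        return opener(url)
    except urllib.error.URLError as e:
        # Chain the original error so its traceback and details are kept.
        raise UrlOpenError(url, e.reason) from e
```

The caller then catches UrlOpenError instead of URLError and reads e.url and e.reason; the original error stays available as e.__cause__.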

Answer 1 (score: 0)

I don't think re-raising the exception is an appropriate way to solve this problem.

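As @Jonathan Vanasco said:

  If you're opening a.com and it 301-redirects to b.com, urlopen follows the redirect automatically, because an HTTPError carrying the redirect was raised. If b.com then causes the URLError, the code above marks a.com as nonexistent.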

My solution is to override redirect_request of urllib2.HTTPRedirectHandler:

import urllib2

class NewHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST"):
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k,v) for k,v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type")
                             )
            # reuse the req object
            # mind that req will be changed if redirection happens
            req.__init__(newurl,
                         headers=newheaders,
                         origin_req_host=req.get_origin_req_host(),
                         unverifiable=True)
            return req
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)

opener = urllib2.build_opener(NewHTTPRedirectHandler)
urllib2.install_opener(opener)
# mind that req will be changed if redirection happens
#req = urllib2.Request('http://127.0.0.1:5000')
req = urllib2.Request('http://www.google.com/')

try:
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    print 'error'
    print req.get_full_url()
else:
    print 'normal'
    print response.geturl()

Let's try redirecting to an unknown URL:

import os
from flask import Flask,redirect

app = Flask(__name__)

@app.route('/')
def hello():
    # return 'hello world'
    return redirect("http://a.com", code=302)

if __name__ == '__main__':
    port = int(os.environ.get('PORT', 5000))
    app.run(host='0.0.0.0', port=port)

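The results, first for the redirecting local server (http://127.0.0.1:5000) and then for http://www.google.com/: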

error
http://a.com/

normal
http://www.google.com/
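
A variation on the same idea, sketched here for Python 3 (urllib.request) as an assumption, records each redirect target instead of rewriting the request object in place. RecordingRedirectHandler, hops, and last_attempted_url are hypothetical names, not part of the standard library. The redirect decision itself is left to the stock HTTPRedirectHandler.redirect_request.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that remembers every redirect target it follows."""

    def __init__(self):
        super().__init__()
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock implementation decide whether to follow the redirect.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            self.hops.append(new_req.full_url)
        return new_req


def last_attempted_url(original_url, handler):
    # The URL that actually failed is the last redirect target, if there was one.
    return handler.hops[-1] if handler.hops else original_url


def open_tracking_redirects(url):
    # Use a fresh handler per request so hops from earlier requests do not leak in.
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    return opener.open(url), handler
```

If opener.open raises URLError, last_attempted_url(url, handler) gives the URL to report. To use this when an error occurs, create the handler outside the function that opens the URL so it is still reachable in the except block.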