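Python URL redirection - handling a 302 that moves to a 404

Time: 2014-01-13 12:20:43

Tags: python urllib2

I know redirects can be handled with either the requests or the urllib2 package. I don't want to use requests because it is not part of the standard library. Please help me use urllib2 to handle a URL that returns a 302 and then moves to a 404. I don't care about the 404; I only want to track whether the redirect is a 301 or a 302.

I referred to this doc, but it still throws a 404.

Here is my code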

import urllib2
class My_HTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

my_opener = urllib2.build_opener(My_HTTPRedirectHandler)
urllib2.install_opener(my_opener)
response = urllib2.urlopen("MY URL")
print response.read()

Here is the output I get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 438, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "<stdin>", line 3, in http_error_302
  File "/usr/lib/python2.7/urllib2.py", line 625, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found    

1 answer:

Answer 0: (score: 0)

It is better to use httplib with the HTTP/1.1 HEAD method. That way no response body is received.
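
A rough sketch of the HEAD approach could look like the following. It is not taken from the linked answer: the function name head_status and the timeout default are illustrative assumptions, and the try/except import is only there so the same code also runs on Python 3, where httplib is named http.client. Because httplib does not follow redirects, the 301 or 302 is reported directly and the 404 behind it is never requested.

```python
try:
    import httplib                          # Python 2
    from urlparse import urlsplit
except ImportError:
    import http.client as httplib           # Python 3 name of the same module
    from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send a single HEAD request; return (status code, Location header or None).

    head_status is a hypothetical helper name. Redirects are not followed,
    so a 301 or 302 is returned as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = httplib.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = httplib.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target from path and query string.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For a URL that answers with a redirect, the first element of the returned tuple is the redirect status (301 or 302) and the second is the target given in the Location header.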

What’s the best way to get an HTTP response code from a URL?
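
If staying with urllib2 is preferred, one possible sketch is to record each redirect code before it is followed and to catch the HTTPError raised by the final 404. The names RecordingRedirectHandler and trace_redirects are hypothetical, and the import fallback is an addition so the code also runs on Python 3, where the same classes live in urllib.request.

```python
try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3: same classes, new module


class RecordingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Remember every redirect (status code, target URL) before following it."""

    def __init__(self):
        self.seen = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Called for each redirect the default handler is about to follow.
        self.seen.append((code, newurl))
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


def trace_redirects(url, timeout=10):
    """Return (list of recorded redirects, final status code)."""
    handler = RecordingRedirectHandler()
    opener = urllib2.build_opener(handler)
    try:
        resp = opener.open(url, timeout=timeout)
        final_status = resp.getcode()
        resp.close()
    except urllib2.HTTPError as err:
        # The chain ended in an error status such as 404; keep the code
        # instead of letting the exception propagate.
        final_status = err.code
    return handler.seen, final_status
```

With this sketch the 404 no longer aborts the script: the redirect codes collected before the error remain available in the first element of the returned tuple.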