Python - Most efficient way to make a HEAD request in Python 3

Time: 2017-09-29 07:50:29

Tags: python python-3.x http http-headers python-requests

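I found this code, which seems reliable and efficient to me, but unfortunately it targets Python 2 and uses urllib2, while everyone says requests is faster. What is the Python 3 equivalent of the code below (or something more efficient or more reliable)?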

#!/usr/bin/env python
#-*- coding:utf-8 -*-

import sys
import urllib2

# This script uses HEAD requests (with fallback in case of 405)
# to follow the redirect path up to the real URL
# (c) 2012 Filippo Valsorda - FiloSottile
# Released under the GPL license

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

class HEADRedirectHandler(urllib2.HTTPRedirectHandler):
    """
    Subclass the HTTPRedirectHandler to make it use our
    HeadRequest also on the redirected URL
    """
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code in (301, 302, 303, 307):
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k,v) for k,v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type"))
            return HeadRequest(newurl,
                               headers=newheaders,
                               origin_req_host=req.get_origin_req_host(),
                               unverifiable=True)
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)

class HTTPMethodFallback(urllib2.BaseHandler):
    """
    Fallback to GET if HEAD is not allowed (405 HTTP error)
    """
    def http_error_405(self, req, fp, code, msg, headers):
        fp.read()
        fp.close()

        newheaders = dict((k,v) for k,v in req.headers.items()
                          if k.lower() not in ("content-length", "content-type"))
        return self.parent.open(urllib2.Request(req.get_full_url(),
                                         headers=newheaders,
                                         origin_req_host=req.get_origin_req_host(),
                                         unverifiable=True))

# Build our opener
opener = urllib2.OpenerDirector()
for handler in [urllib2.HTTPHandler, urllib2.HTTPDefaultErrorHandler,
                HTTPMethodFallback, HEADRedirectHandler,
                urllib2.HTTPErrorProcessor, urllib2.HTTPSHandler]:
    opener.add_handler(handler())

response = opener.open(HeadRequest(sys.argv[1]))

print(response.geturl())

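By the way, a HEAD request is not actually what I need. I only want to know whether a link is broken (on some sites, if you give them a broken link, they redirect you back to the site's main page, and I would like my code to detect that too), and a HEAD request is the most efficient solution I could think of. If you know a better way, I would appreciate that as well.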

1 answer:

Answer 0: (score: 1)

Take a look at requests: http://docs.python-requests.org/en/master/

To perform the request, you just need:

import requests

r=requests.head('http://www.example.com')

You can then access the response object as needed. For example, the status code:

print(r.status_code)
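
As a slightly fuller sketch: `requests.head` does not follow redirects unless `allow_redirects=True` is passed, and some servers answer HEAD with a 405. The helper below follows redirects and falls back to GET in that case, roughly mirroring the script in the question. The name `resolve_url`, the optional `session` parameter, and the 10-second timeout are assumptions made for illustration, not part of requests.

```python
def resolve_url(url, session=None, timeout=10):
    """Follow redirects with HEAD and return (status_code, final_url).

    `session` is a hypothetical hook: any object with the head()/get()
    interface of requests.Session. If omitted, a real Session is created.
    """
    if session is None:
        import requests  # third-party package: pip install requests
        session = requests.Session()

    # HEAD does not follow redirects by default in requests, so ask for it.
    response = session.head(url, allow_redirects=True, timeout=timeout)

    if response.status_code == 405:
        # The server does not allow HEAD: retry with GET. stream=True
        # defers the body download, and close() releases the connection.
        response = session.get(url, allow_redirects=True,
                               timeout=timeout, stream=True)
        response.close()

    # response.url is the URL reached after all redirects were followed.
    return response.status_code, response.url
```

Calling `resolve_url('http://www.example.com')` returns a `(status_code, final_url)` tuple. Passing your own `requests.Session` lets many link checks reuse the same connections.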

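Update: If you want to check whether a page actually works, you need to perform a GET request. I have seen cases where a HEAD request returned a 200 response while a GET request to the same URL returned 500.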
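
To tie this back to the broken-link check in the question, here is a minimal sketch of how the result of a GET might be classified. It treats an error status, or a silent redirect from a deeper page to the site root, as a broken link. The function names and the "landed on the root path" rule are assumptions for illustration; sites that redirect to some other landing page would need a different heuristic.

```python
from urllib.parse import urlsplit


def redirected_to_homepage(requested_url, final_url):
    """Return True if a request for a deeper page ended at the site root."""
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    # A URL counts as "root" here when it has no path (or just "/") and no query.
    asked_for_root = requested.path in ("", "/") and not requested.query
    landed_on_root = final.path in ("", "/") and not final.query
    return landed_on_root and not asked_for_root


def link_is_broken(requested_url, status_code, final_url):
    """Heuristic: an HTTP error status or a bounce to the homepage means broken."""
    return status_code >= 400 or redirected_to_homepage(requested_url, final_url)
```

With requests, the inputs would come from a normal GET: after `r = requests.get(url, timeout=10)`, pass `url`, `r.status_code`, and `r.url` (the URL reached after redirects) to `link_is_broken`.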