Question

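In a Python 2.x script, I am looking for a way to check whether an HTTPS page returns specific content (I will probably need to parse the page content to find out). The page also sits behind an htpasswd prompt, so a username and password are required to see the content. So I think I am looking for a module or other functionality that lets me hard-code the username and password, so that it can fetch the page and I can work with the output (that is, check whether a keyword is present that is the equivalent of a 404 page).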

I was looking at http://docs.python.org/2/library/httplib.html, but it does not seem to do what I am looking for.

Answer 1

You can do this with the httplib module, but there are easier ways that do not require driving the HTTP protocol by hand.

Using the requests library (an external module that you need to install first) is probably the simplest:

import requests

auth = ('someusername', 'somepassword')
response = requests.get(yoururl, auth=auth)
response.raise_for_status()

This raises an exception (requests.exceptions.HTTPError) if the response has a 4xx or 5xx status, such as 404 Not Found.

You can then parse the response body further using response.content (a byte string) or response.text (the body decoded to unicode).
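Once you have the decoded body, the keyword check from the question can be a plain substring test. The following is only a minimal sketch: the helper name contains_any, the marker phrases, and the sample HTML are hypothetical and should be replaced with whatever your own error page actually shows.

```python
def contains_any(text, markers):
    """Return True if any marker occurs in text, ignoring case."""
    lowered = text.lower()
    return any(marker.lower() in lowered for marker in markers)


# Hypothetical phrases that an error page on your site might show.
NOT_FOUND_MARKERS = [u'page not found', u'404']

# Stand-in for response.text; in real use pass the fetched body instead.
sample = u'<html><body><h1>Page Not Found</h1></body></html>'
if contains_any(sample, NOT_FOUND_MARKERS):
    print('looks like an error page')
```

A short marker such as '404' can also match unrelated text (a price, an ID), so a longer, more specific phrase is usually the safer choice.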

Using only the standard library, the same thing with the urllib2 module looks like:

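import base64
import urllib2

request = urllib2.Request(yoururl)
# Build the HTTP Basic credentials by hand; b64encode adds no line breaks.
authstring = base64.b64encode('{}:{}'.format('someusername', 'somepassword'))
request.add_header("Authorization", "Basic {}".format(authstring))
# urlopen() itself raises urllib2.HTTPError for 4xx and 5xx responses.
response = urllib2.urlopen(request)

if not 200 <= response.getcode() < 400:
    # Error response that urlopen() did not already raise for.
    raise RuntimeError('Unexpected status: {}'.format(response.getcode()))

content = response.read()
# getparam() takes only the parameter name and returns None when it is absent.
charset = response.info().getparam('charset') or 'utf8'
try:
    text = content.decode(charset)
except (UnicodeDecodeError, LookupError):
    # Wrong or unknown charset: fall back to ASCII, replacing bad bytes.
    text = content.decode('ascii', 'replace')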

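Here content is the byte string content of the response body, and text is the unicode value, as far as it could be decoded: if the declared charset does not fit, undecodable bytes are replaced.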
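The two hand-rolled steps in the urllib2 version, building the Authorization header and decoding the body, can be pulled into small helpers so they are easy to check without touching the network. This is a sketch only: the function names basic_auth_header and decode_body are made up for illustration, and the code is written to run unchanged on Python 2.7 and Python 3.

```python
import base64


def basic_auth_header(username, password):
    """Return the value for an HTTP Basic 'Authorization' header."""
    credentials = u'{}:{}'.format(username, password).encode('utf-8')
    # b64encode never inserts line breaks, so no strip() is needed.
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


def decode_body(content, charset=None):
    """Decode a byte string body, falling back when the charset is wrong."""
    try:
        return content.decode(charset or 'utf-8')
    except (UnicodeDecodeError, LookupError):
        # LookupError covers a charset name that Python does not know.
        return content.decode('ascii', 'replace')
```

With these in place, the request setup becomes request.add_header("Authorization", basic_auth_header(user, password)), and the body handling becomes text = decode_body(content, charset), where charset is whatever the Content-Type header declared (or None).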
