Question

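My goal is to parse HTML/XML data from a password-protected page and then, based on the data I need (a timestamp), send XML commands to another device. The page I am trying to access is served by a web server built into an IP device. Also, if this would be easier in another language, please let me know. I have very little programming experience (one C programming class).

I have tried requests with both Basic and Digest authentication. I still cannot get past authentication, which keeps me from going any further.

Here is my attempt: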

import requests
from requests.auth import HTTPDigestAuth

url='http://myUsername:myPassword@example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'))        
r.status_code

print(r.headers) 
print(r.status_code)

Output:

401 
CaseInsensitiveDict({'Content-Length': '0', 'WWW-Authenticate': 'Digest realm="the realm of device", nonce="23cde09025c589f05f153b81306928c8", qop="auth"', 'Server': 'Device server name'})

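I also tried the request with HTTPBasicAuth and got the same output. I have tried both with and without user:pass@ in the URL. When I enter that URL in my browser, though, it works.

I thought requests handled the header data for Digest/Basic auth, but maybe I need to include the headers myself?

I used Live HTTP Headers (Firefox) and got this: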

GET /cgi/metadata.cgi?template=html HTTP/1.1
Host: [Device IP]
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Digest realm="Device Realm", nonce="a2333eec4cce86f78016343c48382d21", qop="auth"
Server: Device Server
Content-Length: 0

Answer 1

The two requests are independent:

r = requests.get(url, auth=HTTPDigestAuth('user', 'pass')) 
response = requests.get(url) #XXX <-- DROP IT

The second request does not send any credentials. It is therefore no surprise that it receives a 401 Unauthorized HTTP response status.

To fix it:

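Use the same URL that you use in your browser. Drop digest-auth/auth/user/pass from the end of it; that is just an example from the requests docs.
Print r.status_code instead of response.status_code to see whether it succeeded.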

Why use the username/password both in the URL and in the auth parameter? Remove the username/password from the URL. To see the request that is sent and the response headers, you can enable logging/debugging:

import logging
import requests
from requests.auth import HTTPDigestAuth

# these two lines enable debugging at httplib level (requests->urllib3->httplib)
# you will see the REQUEST, including HEADERS and DATA,
# and RESPONSE with HEADERS but without DATA.
# the only thing missing will be the response.body which is not logged.
try:
    import httplib
except ImportError:
    import http.client as httplib

httplib.HTTPConnection.debuglevel = 1

logging.basicConfig(level=logging.DEBUG)  # you need to initialize logging,
                                          # otherwise you will not see anything from requests

# make request
url = 'https://example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'), timeout=10)
print(r.status_code)
print(r.headers)
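With the debug output enabled you should see two exchanges: the 401 challenge carrying WWW-Authenticate, then a retried request carrying an Authorization header. As a rough illustration of what goes into that header's response value when the challenge says qop="auth" and uses the default MD5 algorithm (RFC 2617), here is a minimal sketch. The names md5_hex and digest_response are made up for this example; HTTPDigestAuth does the equivalent work internally, so you should not need to write this yourself.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a string (illustrative helper)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, nonce, method, uri, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' value for qop=auth with MD5 (RFC 2617)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user within the realm
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request being made
    # realm and nonce come from the server's challenge; nc and cnonce are chosen by the client
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

If the Authorization header never shows up in the debug output, the credentials are not reaching the request at all. If it does show up and the device still answers 401, the username, password, or URI are the more likely suspects.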

Answer 2

import requests
from requests.auth import HTTPDigestAuth

url='https://example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'), verify=False,  stream=True)        


print(r.headers) 
print(r.status_code)

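Fixed by adding stream=True, since the page streams XML/HTML data. My next question is: how do I store/parse a constant stream of data?

I tried storing it in r.content, but that seems to run indefinitely (the same problem I had before).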
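r.content reads the entire body before returning, so it never returns if the device keeps the stream open. One possible approach is to consume the body in chunks and parse it incrementally. The sketch below assumes the device sends a single, never-ending XML document and that the value of interest sits in an element named timestamp; both are guesses about the device, and iter_elements is a made-up helper name. It uses only the standard library's XMLPullParser and takes any iterable of byte chunks, such as r.iter_content().

```python
import xml.etree.ElementTree as ET


def iter_elements(chunks, tag):
    """Yield each completed <tag> element from an iterable of byte chunks."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)  # chunks may split tags or text at any point
        for _event, elem in parser.read_events():
            if elem.tag == tag:  # 'tag' is an assumed element name
                yield elem


# Usage with requests (not executed here; it needs the device on the network):
#
#   r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'),
#                    stream=True, timeout=10)
#   for elem in iter_elements(r.iter_content(chunk_size=1024), 'timestamp'):
#       print(elem.text)
```

Because the document never ends, the parsed tree keeps growing; calling elem.clear() once an element has been handled is a common way to limit that. If the device instead sends many separate XML documents back to back, this parser will raise an error and the stream would have to be split on document boundaries first.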

HTTP Digest/Basic authentication with the Python requests module
