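My goal is to parse HTML/XML data from a password-protected page and then, depending on that data (a timestamp), send XML commands to another device. The page I am trying to access is served by a web server built into an IP device.
If this is easier to do in another language, please let me know.
I have very little programming experience (one C programming class).
I have tried requests with both Basic and Digest authentication. I still cannot get past authentication, which keeps me from going any further.
Here is my attempt: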
import requests
from requests.auth import HTTPDigestAuth
url='http://myUsername:myPassword@example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'))
r.status_code
print(r.headers)
print(r.status_code)
Output:
401
CaseInsensitiveDict({'Content-Length': '0', 'WWW-Authenticate': 'Digest realm="the realm of device", nonce="23cde09025c589f05f153b81306928c8", qop="auth"', 'Server': 'Device server name'})
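I also tried a BasicAuth request and got the same output. I have tried both with and without user:pass@ in the URL. It does work, though, when I enter that same URL into my browser.
I thought requests handled the header data for Digest/Basic auth, but maybe I need to include the headers myself?
I used Live HTTP Headers (Firefox) and got this: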
GET /cgi/metadata.cgi?template=html HTTP/1.1
Host: [Device IP]
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Digest realm="Device Realm", nonce="a2333eec4cce86f78016343c48382d21", qop="auth"
Server: Device Server
Content-Length: 0
Answer 0 (score: 3)
These two requests are independent:
r = requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
response = requests.get(url) #XXX <-- DROP IT
The second request does not send any credentials, so it is no surprise that it receives a 401 Unauthorized HTTP response status.
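For context, a digest exchange takes two steps: the server first answers 401 with a challenge (realm, nonce, qop), and the client then retries with an Authorization header derived from that challenge. HTTPDigestAuth does this for you, so no manual headers should be needed. The sketch below only illustrates the RFC 2617 calculation for qop="auth" with MD5; the function names are made up for illustration and are not part of requests.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a string (RFC 2617 default algorithm)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, nonce, method, uri, nc, cnonce, qop="auth"):
    """Compute the 'response' field of a Digest Authorization header.

    Hypothetical helper for illustration; requests does this internally.
    """
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

Because the nonce comes from the server's 401 reply, a request that carries no credentials can never produce this value, which is why it stops at 401.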
To fix it:
Use the same url that you use in your browser, and drop digest-auth/auth/user/pass at the end. That path is just an example from the requests documentation.
Print r.status_code instead of response.status_code to see whether it succeeded.
Why do you put the username/password both in the url and in the auth parameter? Drop the username/password from the url. To see the request that is sent and the response headers, you can enable logging/debugging:
import logging
import requests
from requests.auth import HTTPDigestAuth
# these two lines enable debugging at httplib level (requests->urllib3->httplib)
# you will see the REQUEST, including HEADERS and DATA,
# and RESPONSE with HEADERS but without DATA.
# the only thing missing will be the response.body which is not logged.
try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # Python 3
httplib.HTTPConnection.debuglevel = 1
logging.basicConfig(level=logging.DEBUG)  # you need to initialize logging,
# otherwise you will not see anything from requests
# make request
url = 'https://example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'),
                 timeout=10)
print(r.status_code)
print(r.headers)
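If the URL you start from already contains user:pass@ (for example, copied from the browser), one way to follow the advice above is to split the credentials off before calling requests. This is a minimal sketch using only the standard library; split_credentials is a hypothetical helper name, not something requests provides.

```python
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, username, password).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password


# Usage (not executed here, it needs the device on the network):
# clean_url, user, password = split_credentials(url)
# r = requests.get(clean_url, auth=HTTPDigestAuth(user, password), timeout=10)
```

Keeping the credentials only in the auth parameter leaves a single source of truth for them and keeps the password out of logged URLs.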
Answer 1 (score: 2)
import requests
from requests.auth import HTTPDigestAuth
url='https://example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'), verify=False, stream=True)
print(r.headers)
print(r.status_code)
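Fixed it by adding stream=True, because the page streams XML/HTML data. My next question is: how do I store/parse a constant stream of data?
I tried storing it in r.content, but that seems to run indefinitely (the same problem I had before).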
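r.content tries to read the whole body, which never completes for an endless stream. One possible approach is to read the response in chunks and cut the buffer into records as they become complete. The sketch below assumes each record ends with a known closing tag; the tag `</event>` and the function name iter_records are assumptions for illustration, since the device's actual format is not shown here.

```python
def iter_records(chunks, end_marker):
    """Yield complete records from an iterable of byte chunks.

    A record is everything up to and including end_marker. Incomplete
    data stays in the buffer until more chunks arrive.
    """
    buffer = b""
    for chunk in chunks:
        if not chunk:
            continue  # skip empty keep-alive chunks
        buffer += chunk
        while True:
            pos = buffer.find(end_marker)
            if pos == -1:
                break  # no complete record yet, wait for more data
            cut = pos + len(end_marker)
            yield buffer[:cut].strip()
            buffer = buffer[cut:]


# Usage (not executed here, it needs the device on the network):
# r = requests.get(url, auth=HTTPDigestAuth(user, password), stream=True)
# for record in iter_records(r.iter_content(chunk_size=1024), b"</event>"):
#     handle(record)  # e.g. parse with xml.etree.ElementTree.fromstring
```

Because the generator yields records one at a time, each timestamp can be processed as soon as it arrives instead of waiting for the stream to end.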