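I'm using Python and want to get the raw HTML of a web page that requires authentication.
This is similar to this question, but the answers there don't work for me.
The code I'm trying: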
import urllib, urllib2, cookielib
username = 'redacted'
password = 'redacted'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('https://redacted.net', login_data)#http://www.example.com/login.php
resp = opener.open('https://redacted.net')#http://www.example.com/hiddenpage.php
print resp.read()  # print the raw HTML of the page; the opener can be reused to view any page with your session cookie
Error:
Traceback (most recent call last):
File "C:/Users/Jacob/Desktop/School/Python_Scripts/session refresher/session_refresher.py", line 9, in <module>
opener.open('Redacted', login_data)#http://www.example.com/login.php
File "C:\Python27\lib\urllib2.py", line 437, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 475, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 401: Unauthorized
When I open the page in a browser, a window pops up asking me to authenticate.
Answer 0 (score: 3)
I've used requests for this before,
because its authentication is simpler than what urllib provides.
import requests
r = requests.get("https://redacted.net", auth=('username', 'password'))
print(r.text)
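For comparison, here is a minimal sketch of the same HTTP Basic authentication using only the standard library (Python 3's `urllib.request`, the successor to the question's `urllib2`). It assumes, as the browser popup suggests, that the server uses Basic authentication rather than a login form; the helper names `make_password_manager`, `make_basic_auth_opener` and `fetch_html` are hypothetical.

```python
import urllib.request


def make_password_manager(url, username, password):
    """Store credentials for `url` and every path below it."""
    # realm=None: offer these credentials whatever realm the server announces
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, username, password)
    return mgr


def make_basic_auth_opener(url, username, password):
    """Build an opener that retries with credentials after a 401 Basic challenge."""
    handler = urllib.request.HTTPBasicAuthHandler(
        make_password_manager(url, username, password)
    )
    return urllib.request.build_opener(handler)


def fetch_html(url, username, password):
    """Return the page body as text (requires network access)."""
    opener = make_basic_auth_opener(url, username, password)
    with opener.open(url) as resp:
        # Fall back to UTF-8 if the server declares no charset (an assumption)
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


# Usage, with the placeholder URL and credentials from the question:
# print(fetch_html("https://redacted.net", "username", "password"))
```

One design difference: this handler only sends the credentials after the server answers 401 with a `WWW-Authenticate: Basic` challenge, whereas `requests` with `auth=` attaches them to the first request.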
Answer 1 (score: 1)
Use requests
and supply your user/pass pair in the request:
import requests
requests.get('https://redacted.net', auth=('user', 'pass'))
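Passing a `(user, pass)` tuple as `auth` makes `requests` use HTTP Basic authentication: it sends an `Authorization: Basic ...` header whose value is the Base64 encoding of `user:pass`. The sketch below builds that header by hand with the standard library to show what goes over the wire. `basic_auth_header` and `make_request` are hypothetical helpers, and encoding the credentials as UTF-8 is an assumption (clients differ for non-ASCII input).

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic auth."""
    # "user:password" -> bytes (UTF-8 assumed) -> Base64 text
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def make_request(url, username, password):
    """Build (but do not send) a request that carries the credentials up front."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so Basic credentials should only be sent over HTTPS, as in the question's URL.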