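I currently use Selenium to log in to a web page and obtain the cookie I need to access the site. I then use that cookie to authenticate a batch of JSON RPC requests made with pycurl. The following code (together with the later pycurl JSON RPC requests) works perfectly: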
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(my_url)
driver.find_element_by_name('u').send_keys(username)
driver.find_element_by_name('p').send_keys(password)
button = driver.find_element_by_tag_name('button')
button.click()
driver.get_cookies()[0]
However, I am trying to remove external dependencies, in particular the webdriver (PhantomJS in my case), and do the job with pycurl instead. I tried the following:
from io import BytesIO

import pycurl
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, my_url)
c.setopt(c.TIMEOUT, 5)
c.setopt(c.HEADER, 1)
c.setopt(c.FOLLOWLOCATION, True) # Follow redirects
c.setopt(c.AUTOREFERER, True)
c.setopt(c.POSTREDIR, pycurl.REDIR_POST_ALL) # Follow redirects after post ...
c.setopt(c.POSTFIELDS, 'u='+ username + '&p=' + password + '&submit=Login')
c.setopt(c.COOKIEJAR, 'xcomfort.cookie')
c.setopt(c.VERBOSE, True)
c.setopt(c.WRITEFUNCTION, buffer.write)
c.perform()
c.close()
However, pycurl's verbose output is:
* Trying 83.201.233.76...
* Connected to somehost.dyndns.org (83.201.233.76) port 8080 (#0)
> POST /bcgui/index.html HTTP/1.1
Host: somehost.dyndns.org:8080
User-Agent: PycURL/7.43.0 libcurl/7.47.0 OpenSSL/1.0.2e zlib/1.2.8 c-ares/1.10.0 libssh2/1.6.0
Accept: */*
Content-Length: 36
Content-Type: application/x-www-form-urlencoded

* upload completely sent off: 36 out of 36 bytes
< HTTP/1.1 401 Unauthorized
< content-type: text/html; charset=UTF-8
< transfer-encoding: chunked
< cache-control: no-cache, no-store, must-revalidate, max-age=0
< date: Thu, 21 Apr 2016 08:19:49 GMT
< pragma: no-cache
< www-authenticate: None
* Added cookie JSESSIONID="ID3407DB686398770End" for domain somehost.dyndns.org, path /, expire 0
< set-cookie: JSESSIONID=ID3407DB686398770End; Path=/; HttpOnly
<
* Connection #0 to host somehost.dyndns.org left intact
As you can see, I get a 401 error here.
The page I am trying to log in to has the following login form:
<form method="post" action="/system/http/login">
<div id="login_dialog" class="ui-dialog ui-widget ui-widget-content ui-corner-all">
<table>
<tr>
<td><span class="ui-widget-header-1 ui-helper-clearfix ui-dialog-title">Smart Home Controller</span></td>
<td><img src="/system/http/img/eaton_logo.jpg" /></td>
</tr>
</table>
<div class="ui-dialog-titlebar ui-widget-header ui-corner-all ui-helper-clearfix" >
<span class="ui-dialog-title">Please login</span>
</div>
<div id="editor" class="ui-dialog-content" >
<div id="error_message" class="ui-state-error ui-helper-hidden"></div>
<div id="remaining_time" class="ui-state-error ui-helper-hidden">User is locked out for <span>0</span>.</div>
<table>
<tr>
<td class="r">Username:</td>
<td><input name="u"/></td>
</tr>
<tr>
<td class="r">Password:</td>
<td><input type="password" autocomplete="off" name="p"/></td>
</tr>
<tr>
<td class="r"> </td>
<td>
<input type="checkbox" name="r"/> Remember me
<input type="hidden" name="referer" value="/" />
</td>
</tr>
</table>
</div>
<div class="ui-dialog-buttonpane ui-widget-content ui-helper-clearfix">
<button type="submit">Login</button>
</div>
</div>
</form>
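For comparison, the form above posts to `/system/http/login` with the fields `u`, `p` and a hidden `referer`, whereas the verbose output shows the pycurl attempt posting to `/bcgui/index.html`. Below is a minimal sketch of the URL and body a browser would presumably submit for this form, using only the standard library. The helper name `build_login_request` and the `http://` scheme in the usage line are illustrative assumptions, and whether the server accepts this request as-is has not been verified.

```python
from urllib.parse import urlencode, urljoin

# Values copied from the login form shown above.
FORM_ACTION = "/system/http/login"
HIDDEN_FIELDS = {"referer": "/"}


def build_login_request(page_url, username, password):
    """Return (url, body) for a form-encoded POST of the login form.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    # Resolve the form's action against the page the form was served from.
    url = urljoin(page_url, FORM_ACTION)
    # The unchecked "Remember me" checkbox (name="r") is not submitted.
    fields = {"u": username, "p": password, **HIDDEN_FIELDS}
    # urlencode escapes characters such as '&' that plain string
    # concatenation would leave untouched.
    body = urlencode(fields)
    return url, body


if __name__ == "__main__":
    # The scheme is assumed; host and path come from the verbose output.
    print(build_login_request(
        "http://somehost.dyndns.org:8080/bcgui/index.html", "alice", "s3cret&x"))
```

The returned body is the kind of string that could be handed to `POSTFIELDS` in pycurl or to `data=` in requests.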
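I am completely stumped here. Selenium works beautifully, but pycurl keeps giving me a 401. Since I expected someone to tell me to use requests, I tried that as well: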
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
data = {'u': username, 'p': password }
session = requests.Session()
session.get(my_url)
response = session.post(my_url, json=data, headers=headers)
print(response)
print(requests.utils.dict_from_cookiejar(session.cookies))
However, this produces:
<Response [401]>
{'JSESSIONID': 'ID3410DB1909202780End'}
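This is essentially the same problem: the cookie contains a session ID, but the session is not authenticated and cannot be used for later requests.
Any pointers on where I might be going wrong? I would prefer the pycurl approach, since I use it for the JSON RPC calls in the rest of my code, but I am of course open to any ideas.
Update: Strangely, it seems to work if I use standard HTTP authentication and ignore the form on the page. I don't know why, because the browser never shows me a login prompt; there is only a web page where I fill in the username and password. Still, it works. The following code gives me an authorized session cookie: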
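Since the server sets a `JSESSIONID` even on a failed login, the presence of the cookie alone says nothing about authentication. A minimal sketch of a guard that only accepts the session ID when the response was not an error, using the standard library cookie parser. The helper name `authenticated_session_id` is hypothetical, and treating any status below 400 as "authenticated" is an assumption about this server.

```python
from http.cookies import SimpleCookie


def authenticated_session_id(status_code, set_cookie_header):
    """Return the JSESSIONID value, or None if the response was an error.

    Hypothetical helper; a non-error status is used as a stand-in for
    "the login succeeded", which is an assumption.
    """
    if status_code >= 400:
        # e.g. the 401 above: a cookie was set, but the session is anonymous.
        return None
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    morsel = cookie.get("JSESSIONID")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    header = "JSESSIONID=ID3410DB1909202780End; Path=/; HttpOnly"
    print(authenticated_session_id(401, header))
    print(authenticated_session_id(200, header))
```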
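import requests

# The snippet is the body of a function; the name get_session_id is illustrative.
def get_session_id(url, username, password):
    headers = {'User-Agent': 'Mozilla/5.0'}
    session = requests.Session()
    session.get(url)
    response = session.post(url, headers=headers, auth=(username, password))
    session_id = requests.utils.dict_from_cookiejar(session.cookies)['JSESSIONID']
    return session_id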
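A note on what the `auth=` version sends: passing a `(username, password)` tuple as `auth` makes requests use HTTP Basic authentication, so the credentials travel in an `Authorization` header rather than as form fields. The sketch below builds that header by hand with the standard library, assuming ASCII credentials. The helper name `basic_auth_header` is made up for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header line.

    Hypothetical helper; assumes ASCII-only credentials.
    """
    # Basic auth is base64("username:password"), not encryption.
    raw = f"{username}:{password}".encode("ascii")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))
```

With pycurl, a header line like this could presumably be supplied through `pycurl.HTTPHEADER`, or the credentials set with `pycurl.USERPWD` instead; neither has been tested against this controller.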