How to authenticate via a POST form using pycurl

Date: 2016-04-21 08:41:15

Tags: python selenium cookies urllib pycurl

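I currently use selenium to log in to a web page and obtain the cookie I need to access the site, which I then use to authenticate a batch of JSON RPC requests (also made with pycurl). The following code (and the subsequent pycurl JSON RPC requests) works perfectly: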

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(my_url)
driver.find_element_by_name('u').send_keys(username)
driver.find_element_by_name('p').send_keys(password)
button = driver.find_element_by_tag_name('button')
button.click()
driver.get_cookies()[0]
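
For context, here is a minimal sketch of how a cookie obtained this way can be turned into a value that pycurl accepts. It assumes the entries returned by `driver.get_cookies()` are dicts with `name` and `value` keys; the helper name `cookie_header` and the placeholder value are made up for illustration.

```python
def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    return "; ".join(
        "{}={}".format(c["name"], c["value"]) for c in selenium_cookies
    )


# A cookie shaped like the entries returned by driver.get_cookies().
# The value is a placeholder, not real session data.
example = [{"name": "JSESSIONID", "value": "PLACEHOLDER", "path": "/"}]
print(cookie_header(example))
```

The resulting string can then be passed to a pycurl handle with `c.setopt(pycurl.COOKIE, cookie_header(driver.get_cookies()))`.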

However, I am trying to remove external dependencies, in particular the webdriver (PhantomJS in my case), and use pycurl for the job. I have tried the following:

from io import BytesIO
import pycurl

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, my_url)
c.setopt(c.TIMEOUT, 5)
c.setopt(c.HEADER, 1)
c.setopt(c.FOLLOWLOCATION, True)  # Follow redirects
c.setopt(c.AUTOREFERER, True)
c.setopt(c.POSTREDIR, pycurl.REDIR_POST_ALL)  # Follow redirects after post ...
c.setopt(c.POSTFIELDS, 'u='+ username + '&p=' + password + '&submit=Login')
c.setopt(c.COOKIEJAR, 'xcomfort.cookie')
c.setopt(c.VERBOSE, True)
c.setopt(c.WRITEFUNCTION, buffer.write)

c.perform()
c.close()

However, pycurl's verbose output is:

*   Trying 83.201.233.76...
* Connected to somehost.dyndns.org (83.201.233.76) port 8080 (#0)
> POST /bcgui/index.html HTTP/1.1
> Host: somehost.dyndns.org:8080
> User-Agent: PycURL/7.43.0 libcurl/7.47.0 OpenSSL/1.0.2e zlib/1.2.8 c-ares/1.10.0 libssh2/1.6.0
> Accept: */*
> Content-Length: 36
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 36 out of 36 bytes
< HTTP/1.1 401 Unauthorized
< content-type: text/html; charset=UTF-8
< transfer-encoding: chunked
< cache-control: no-cache, no-store, must-revalidate, max-age=0
< date: Thu, 21 Apr 2016 08:19:49 GMT
< pragma: no-cache
< www-authenticate: None
* Added cookie JSESSIONID="ID3407DB686398770End" for domain somehost.dyndns.org, path /, expire 0
< set-cookie: JSESSIONID=ID3407DB686398770End; Path=/; HttpOnly
<
* Connection #0 to host somehost.dyndns.org left intact
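
Note that the server hands out a JSESSIONID even though the response is a 401. Because `c.HEADER` is set to 1, the response headers end up in `buffer` along with the body, so the cookie can also be read from there. A minimal sketch, assuming the header looks like the `set-cookie` line in the log above (the function name is hypothetical):

```python
import re


def session_id_from_response(raw):
    """Return the JSESSIONID value found in a raw HTTP response, or None."""
    # Header bytes are decoded as ISO-8859-1 so that any byte value is accepted.
    text = raw.decode("iso-8859-1")
    match = re.search(
        r"^set-cookie:\s*JSESSIONID=([^;\r\n]+)",
        text,
        re.IGNORECASE | re.MULTILINE,
    )
    return match.group(1) if match else None


# Sample response built from the header lines shown in the verbose log.
sample = (
    b"HTTP/1.1 401 Unauthorized\r\n"
    b"content-type: text/html; charset=UTF-8\r\n"
    b"set-cookie: JSESSIONID=ID3407DB686398770End; Path=/; HttpOnly\r\n"
    b"\r\n"
)
print(session_id_from_response(sample))
```

With the code above this would be called as `session_id_from_response(buffer.getvalue())`, before or after `c.close()`. Getting a session ID back this way says nothing about whether the session is authenticated.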

As you can see, I get a 401 error here.

The page I am trying to log in to has the following login form:

<form method="post" action="/system/http/login">
    <div id="login_dialog" class="ui-dialog ui-widget ui-widget-content ui-corner-all">

        <table>
            <tr>
                <td><span class="ui-widget-header-1 ui-helper-clearfix ui-dialog-title">Smart Home Controller</span></td>
                <td><img src="/system/http/img/eaton_logo.jpg" /></td>
            </tr>
        </table>

        <div class="ui-dialog-titlebar ui-widget-header ui-corner-all ui-helper-clearfix" >
            <span class="ui-dialog-title">Please login</span>
        </div>

        <div id="editor" class="ui-dialog-content" >
            <div id="error_message" class="ui-state-error ui-helper-hidden"></div>
            <div id="remaining_time" class="ui-state-error ui-helper-hidden">User is locked out for <span>0</span>.</div>
            <table>
                <tr>
                    <td class="r">Username:</td>
                    <td><input name="u"/></td>
                </tr>
                <tr>
                    <td class="r">Password:</td>
                    <td><input type="password" autocomplete="off" name="p"/></td>
                </tr>
                <tr>
                    <td class="r">&nbsp;</td>
                    <td>
                        <input type="checkbox" name="r"/> Remember me
                        <input type="hidden" name="referer" value="/" />
                    </td>
                </tr>
            </table>
        </div>

        <div class="ui-dialog-buttonpane ui-widget-content ui-helper-clearfix">
            <button type="submit">Login</button>
        </div>
    </div>
    </form>
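
One detail that stands out when comparing this form with the verbose log: the form's `action` is `/system/http/login`, whereas the logged POST went to `/bcgui/index.html`, and the form has no field named `submit`. Below is a sketch of a POST that targets the form's action with URL-encoded fields. It has not been verified against this device; the function names are hypothetical, and the assumption that the server wants exactly the `u`, `p` and `referer` fields comes only from the markup above.

```python
from io import BytesIO
from urllib.parse import urlencode, urljoin

FORM_ACTION = "/system/http/login"  # from the <form action=...> attribute


def build_login_request(page_url, username, password):
    """Return (url, body) for a form-encoded POST to the form's action."""
    url = urljoin(page_url, FORM_ACTION)
    # Field names u, p and referer are taken from the form's <input> elements.
    # urlencode escapes characters such as & or = inside the credentials.
    body = urlencode({"u": username, "p": password, "referer": "/"})
    return url, body


def post_login(page_url, username, password, cookie_file="xcomfort.cookie"):
    """POST the login form with pycurl; return (status_code, response_bytes)."""
    import pycurl  # imported here so build_login_request works without pycurl

    url, body = build_login_request(page_url, username, password)
    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.POSTFIELDS, body)        # implies a POST request
    c.setopt(pycurl.COOKIEJAR, cookie_file)  # cookies are written on close()
    c.setopt(pycurl.FOLLOWLOCATION, True)
    c.setopt(pycurl.WRITEDATA, buffer)
    try:
        c.perform()
        status = c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()
    return status, buffer.getvalue()
```

A call would look like `post_login(my_url, username, password)`. Whether the server then returns something other than 401 is exactly what remains to be tested.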

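I am completely stumped here. Selenium works beautifully, but pycurl keeps giving me a 401. Since I expect someone will tell me to use requests, I tried that as well: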

import requests
headers = {'User-Agent': 'Mozilla/5.0'}
data = {'u': username, 'p': password }

session = requests.Session()
session.get(my_url)

response = session.post(my_url, json=data, headers=headers)
print(response)
print(requests.utils.dict_from_cookiejar(session.cookies))

However, this produces:

<Response [401]>
{'JSESSIONID': 'ID3410DB1909202780End'}

This is essentially the same problem (the cookie contains a session ID, but it is not authenticated and cannot be used for later requests).
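
A possible factor in the requests attempt is that `json=data` sends a JSON document, while an HTML form submits URL-encoded fields. The sketch below uses only the standard library to show how the two bodies differ; the credentials are placeholders, and the JSON string is only approximately what requests would put on the wire.

```python
import json
from urllib.parse import urlencode

fields = {"u": "alice", "p": "secret"}  # placeholder credentials

# Roughly the body produced by json=fields (Content-Type: application/json).
json_body = json.dumps(fields)

# The body an HTML form produces
# (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(fields)

print(json_body)
print(form_body)
```

In requests, passing `data=data` instead of `json=data` sends the form-encoded variant. Whether that alone changes the 401 is not confirmed here, since the request also goes to `my_url` rather than to the form's action.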

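Any pointers as to where I might be going wrong? I prefer the pycurl approach because I use it for the JSON RPC in the rest of my code, but I am of course open to any ideas.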

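Update: Oddly enough, it seems to work if I use standard authentication and ignore the form on the page. I don't know why, since I never get a login prompt from the browser, only a web page with username and password fields to fill in. Still, it works. The following code gives me an authorized session cookie: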

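import requests

# The function wrapper and its name are added here so that `return` is valid.
def get_session_id(url, username, password):
    headers = {'User-Agent': 'Mozilla/5.0'}

    session = requests.Session()
    session.get(url)

    # auth=(username, password) makes requests send HTTP Basic credentials
    response = session.post(url, headers=headers, auth=(username, password))

    session_id = requests.utils.dict_from_cookiejar(session.cookies)['JSESSIONID']
    return session_id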
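
Since the rest of the code uses pycurl, the same idea can probably be expressed there too. In requests, `auth=(username, password)` means HTTP Basic authentication, and libcurl exposes that through `HTTPAUTH` and `USERPWD`. The following is an untested sketch under that assumption; the function names and the cookie file default are hypothetical, and `basic_auth_header` is included only to show what ends up in the `Authorization` header.

```python
import base64
from io import BytesIO


def basic_auth_header(username, password):
    """Return the Authorization header value used by HTTP Basic auth."""
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def fetch_session_cookie(url, username, password, cookie_file="xcomfort.cookie"):
    """POST with Basic credentials via pycurl; return the HTTP status code."""
    import pycurl  # imported here so basic_auth_header works without pycurl

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_BASIC)
    # libcurl splits USERPWD at the first colon, so a username that
    # itself contains a colon will not work with this option.
    c.setopt(pycurl.USERPWD, "{}:{}".format(username, password))
    c.setopt(pycurl.POSTFIELDS, "")          # empty POST, like session.post(url)
    c.setopt(pycurl.COOKIEJAR, cookie_file)  # cookies are written on close()
    c.setopt(pycurl.WRITEDATA, buffer)
    try:
        c.perform()
        return c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()
```

After a successful call, the JSESSIONID would be in `cookie_file`, which later handles can load with `c.setopt(pycurl.COOKIEFILE, cookie_file)`.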

0 answers:

No answers