Mechanize: how to submit a form

Date: 2015-05-26 11:52:27

Tags: python web-crawler mechanize

I can't log in by submitting the form. The error is: ValueError: unknown POST form encoding type 'x-www-form-encoded'

Here is my attempt:

import mechanize
import cookielib

browser = mechanize.Browser()

# Cookie Jar
cookiejar = cookielib.LWPCookieJar()
browser.set_cookiejar(cookiejar)

# Browser options
browser.set_handle_equiv(True)
#browser.set_handle_gzip(True)
browser.set_handle_redirect(True)
browser.set_handle_referer(True)
browser.set_handle_robots(False)

# Follow refresh 0 but don't hang on refresh > 0
browser.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Want debugging messages?
browser.set_debug_http(True)
browser.set_debug_redirects(True)
browser.set_debug_responses(True)

# User-Agent (this is cheating, ok?)
browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0')]

response = browser.open("https://usms.upc.biz/arsys/shared/login.jsp")
# loginForm
browser.form = list(browser.forms())[0]
user_control = browser.form.find_control("username")
if user_control.type == "text":
    user_control.value = "SNIP!"
passwd_control = browser.form.find_control("pwd")
if passwd_control.type == "password":
    passwd_control.value = "SNIP!"

# browser.method = "POST"
response = browser.submit()
print response.read()
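One possible workaround, sketched here under assumptions rather than taken from a confirmed answer: the traceback shows that mechanize's `_request_data` reads `self.enctype` on the selected form, so overwriting that attribute with the standard value `application/x-www-form-urlencoded` before calling `browser.submit()` should get past the `ValueError`. The helper name `force_urlencoded` is hypothetical, and the sketch assumes the form object exposes a plain, writable `enctype` attribute. A stand-in class replaces `browser.form` so the snippet runs without mechanize or network access, under Python 2 or 3.

```python
# Hypothetical helper. Assumes the form object (e.g. mechanize's HTMLForm)
# keeps its encoding type in a plain, writable `enctype` attribute.
VALID_POST_ENCTYPES = ("application/x-www-form-urlencoded", "multipart/form-data")


def force_urlencoded(form):
    """Replace an unrecognised enctype with the standard urlencoded one.

    Returns the original value so the caller can log what was changed.
    """
    original = form.enctype
    if original not in VALID_POST_ENCTYPES:
        form.enctype = "application/x-www-form-urlencoded"
    return original


# Stand-in for browser.form, carrying the page's misspelled enctype.
class FakeForm(object):
    enctype = "x-www-form-encoded"


form = FakeForm()
print(force_urlencoded(form))  # the value that was replaced
print(form.enctype)            # the value mechanize would now see
```

In the script above, the call would go after the password is filled in and before `response = browser.submit()`, as `force_urlencoded(browser.form)`. This only addresses the encoding error; mechanize does not execute JavaScript, so anything the page's `doLogin()` handler does before posting is not reproduced.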

The trace I get is:

send: 'GET /arsys/shared/login.jsp HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: usms.upc.biz\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Apache-Coyote/1.1
header: Set-Cookie: JSESSIONID=AED35D71AE533F4D23350653C91DD3A2; Path=/arsys
header: Cache-Control: no-cache
header: Set-Cookie: q=""; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
header: Content-Type: text/html;charset=UTF-8
header: Date: Tue, 26 May 2015 11:44:38 GMT
header: Connection: close
header: Set-Cookie: BIGipServerPool_USMS_https_usms.upc.biz=1038228652.20480.0000; path=/
Traceback (most recent call last):
  File "Downloads/usms.py", line 39, in <module>
    response = browser.submit()
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 541, in submit
    return self.open(self.click(*args, **kwds))
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 530, in click
    request = self.form.click(*args, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_form.py", line 2999, in click
    self._request_class)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_form.py", line 3199, in _click
    return self._switch_click(return_type, request_class)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_form.py", line 3269, in _switch_click
    req_data = self._request_data()
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_form.py", line 3257, in _request_data
    "unknown POST form encoding type '%s'" % self.enctype)
ValueError: unknown POST form encoding type 'x-www-form-encoded'
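To see why the submit fails, here is a simplified reconstruction of the branch that raises this error. It is modelled on the message in the traceback and is not mechanize's actual source. Only the exact strings `application/x-www-form-urlencoded` and `multipart/form-data` are accepted, so the page's `x-www-form-encoded` (missing both the `application/` prefix and the `url` in `urlencoded`) falls through to the `ValueError`. The function name `post_body` is hypothetical, the field values are placeholders, and multipart encoding is left out to keep the sketch short.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def post_body(enctype, fields):
    """Build a POST body the way a strict form library might.

    `fields` is a list of (name, value) pairs in form order.
    """
    if enctype == "application/x-www-form-urlencoded":
        return urlencode(fields)
    elif enctype == "multipart/form-data":
        # Multipart encoding is omitted from this sketch.
        raise NotImplementedError("multipart/form-data not covered here")
    else:
        # Same message format as in the traceback.
        raise ValueError("unknown POST form encoding type '%s'" % enctype)


fields = [("username", "alice"), ("pwd", "secret"), ("encpwd", "1")]
print(post_body("application/x-www-form-urlencoded", fields))
# → username=alice&pwd=secret&encpwd=1
try:
    post_body("x-www-form-encoded", fields)
except ValueError as exc:
    print(exc)
# → unknown POST form encoding type 'x-www-form-encoded'
```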

Here is the relevant page source:

<form name="loginForm" METHOD="post"
                            ACTION="/arsys/servlet/LoginServlet"
                                enctype="x-www-form-encoded">
                            <tbody>
                            <tr>
                                <td class="login" nowrap="nowrap" width="20px">&nbsp;</td>
                                <td class="login" colspan="2" nowrap="nowrap">
                                <em class="subhead">Please log in.</em>
                                </td>
                            </tr>
                            <tr>
                                <td class="login" nowrap="nowrap" width="20px" >&nbsp;</td>
                                <td class="login" nowrap="nowrap" id="LoginLabel-id">
                                    <b><label style="color:#FFFFFF;" for="username-id">User Name</label></b>
                                </td>
                                <td>
                                <input name="username" maxlength="254" id="username-id" value="" class="loginfield" size="30" type="text">
                                </td>
                            </tr>
                            <tr>
                                <td class="login" nowrap="nowrap" width="20px">&nbsp;</td>
                                <td class="login" id="PasswordLabel-id" nowrap="nowrap">
                                    <label style="color:#FFFFFF;" for="pwd-id">Password</label>
                                </td>
                                <td>
                                <input name="pwd" maxlength="61" id="pwd-id" class="loginfield" size="30" autocomplete="off" type="password">
                                </td>
                            </tr>
                            <tr>
                                <td class="Login" nowrap="nowrap" width="20px">&nbsp;</td>
                                <td class="Login" name="auth_label" nowrap="nowrap">
                                    <label style="color:#FFFFFF;" for="auth-id">Authentication</label>
                                </td>
                                <td><input type="text" NAME="auth" id="auth-id" maxlength="2048" class="loginfield" size="30"></td>
                            </tr>                           
                            <tr>
                                <td class="Login" nowrap="nowrap" width="20px">&nbsp;</td>
                                <td class="loginfield" nowrap="nowrap">&nbsp;</td>
                                <td>
                                    <input type="button" name="login" value="Log In" onClick="doLogin();"><!--;-->&nbsp;
                                    <input type="button" name="clear" value="Clear" onClick="clearLogin();"><!--;-->
                                </td>
                            </tr>
                            <tr>
                                <td class="Login" nowrap="nowrap">&nbsp;</td>
                                <td class="Login" nowrap="nowrap">&nbsp;</td>
                                <td>
                                    <input type="hidden" name="timezone" value="">
                                    <input type="hidden" name="encpwd" value="1">
                                    <input type="hidden" name="goto" value="" >
                                    <input type="hidden" name="server" value="" >
                                    <input type="hidden" name="ipoverride" value="0">
                                    <input type="hidden" name="initialState" value="-1">
                                    <input type="hidden" name="returnBack" value="">
                                </td>
                            </tr>

                    </tbody>
                    </form>
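Two details in this markup matter for mechanize: the `enctype` attribute is the non-standard `x-www-form-encoded` value from the error, and both "Log In" and "Clear" are `type="button"` inputs wired to JavaScript (`doLogin()`, `clearLogin()`), which mechanize does not run. The sketch below uses Python 3's standard-library `html.parser` (the Python 2 module is `HTMLParser`) to pull those details out of an abridged copy of the form. The class name `FormInspector` is hypothetical.

```python
from html.parser import HTMLParser  # Python 3 standard library


class FormInspector(HTMLParser):
    """Collect the <form> attributes and its <input> elements."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.form_attrs = {}
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so
        # METHOD="post" is reported as ("method", "post").
        attrs = dict(attrs)
        if tag == "form":
            self.form_attrs = attrs
        elif tag == "input":
            self.inputs.append(attrs)


# Abridged from the page source in the question.
PAGE = """
<form name="loginForm" METHOD="post"
      ACTION="/arsys/servlet/LoginServlet" enctype="x-www-form-encoded">
  <input name="username" type="text">
  <input name="pwd" type="password">
  <input type="text" NAME="auth">
  <input type="button" name="login" value="Log In" onClick="doLogin();">
  <input type="button" name="clear" value="Clear" onClick="clearLogin();">
  <input type="hidden" name="encpwd" value="1">
</form>
"""

inspector = FormInspector()
inspector.feed(PAGE)
print(inspector.form_attrs["enctype"])  # → x-www-form-encoded
for field in inspector.inputs:
    # Buttons carry an onclick handler that a non-JavaScript client never runs.
    print(field.get("type"), field.get("name"), field.get("onclick", ""))
```

Since mechanize skips `doLogin()`, whatever that function does before posting would have to be reproduced by hand. The hidden `encpwd` field hints that it may transform the password first, but that is only a guess from the field name.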

Any suggestions?

0 answers