Entering a username and password with mechanize

Posted: 2014-04-18 23:15:36

Tags: python screen-scraping mechanize

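How can I use mechanize to enter a username and password on this site?

I deleted and rewrote this post because the previous version contained too much extraneous information.

I have read in other posts that this might have something to do with JavaScript, but how can I tell? And what should I do with that information?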
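One way to tell whether JavaScript is involved is to inspect the raw HTML the server returns. If the login form and its input fields are already present in that HTML, mechanize can in principle fill them in without executing any script. The sketch below is an illustration in Python 3 using only the standard library, not code from the thread; the helper `list_forms` and the sample page (including its hidden `lt` field) are hypothetical. It lists every `<form>` and the names of the `<input>` fields inside it.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collects the action and input field names of every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form: {"action": ..., "inputs": [...]}
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "inputs": []}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append(attrs.get("name"))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return the forms present in static HTML (no JavaScript is executed)."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical sample page; in practice, pass in the HTML you actually fetched.
sample = """
<html><body>
<form id="fm1" action="/cas/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="lt" value="LT-123">
</form>
</body></html>
"""

for form in list_forms(sample):
    print(form["action"], form["inputs"])
```

If a form shows up in a real browser but `list_forms` finds nothing in the fetched HTML, the form is probably generated by JavaScript. If it is found, as in the sample, the problem most likely lies elsewhere.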

import mechanize
import cookielib
from pattern import web  # web.Element is used below to parse the result page

url = 'https://www.pin1.harvard.edu/cas/login?service=https%3A%2F%2Fwww.pin1.harvard.edu%2Fpin%2Fauthenticate%3F__authen_application%3DFAS_AC_AUTHENTICATOR'

# Handles all the browser details
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# br = mechanize.Browser(factory=mechanize.RobustFactory())

# Cookie jar, so session cookies persist across requests
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

br.open(url)

# Print the first form on the page (this loop raises the ParseError below)
for form in br.forms():
    print "Form name:", form.name
    print form
    break

# formname, searchname and term are placeholders for the real
# form name, field name and value
br.select_form(name=formname)
br[searchname] = term
res = br.submit()
content = res.read()
dom = web.Element(content)

Traceback:

ParseError: unexpected '/' char in declaration
---> 32 for form in br.forms():
     33     print "Form name:", form.name
     34     print form

Update: Following PACO's suggestion in "Python unable to retrieve form with urllib or mechanize", I added the code below, but I still get a traceback:

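import re

# t is the regex from the linked answer (its value is not shown here);
# everything up to the end of its first match is cut from the page
beg = re.search(t, res.read()).span()[1]
res.set_data(res.get_data()[beg:])
# The trimmed data lives in res; if response is a different object, the trim is lost
br.set_response(response)
br.select_form(nr=0)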


<ipython-input-25-bd1b73406b45> in <module>()
     28 br.set_response(response)
     29 
---> 30 br.select_form(nr=0)
     31 
     32 
ParseError: unexpected '-' char in declaration
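The trimming step above keeps only the part of the page that follows the first match of the pattern `t`, so that the parser never sees what comes before it. Here is that step as a small pure function, sketched in Python 3 with the standard library only; the name `trim_before_match` and the example pattern are hypothetical, since the post does not show the value of `t`. Unlike the snippet, it raises a clear `ValueError` when the pattern is missing, rather than an `AttributeError` from calling `.span()` on `None`.

```python
import re


def trim_before_match(html, pattern):
    """Return the part of html that follows the end of the first match of pattern.

    Equivalent to re.search(pattern, html).span()[1] followed by slicing,
    but raises ValueError instead of AttributeError when nothing matches.
    """
    match = re.search(pattern, html)
    if match is None:
        raise ValueError("pattern %r not found in the page" % pattern)
    return html[match.end():]  # match.end() == match.span()[1]


# Hypothetical pattern: drop everything up to and including the opening <html> tag.
page = '<!DOCTYPE html PUBLIC "bad >\n<html><body><form></form></body></html>'
print(trim_before_match(page, r"<html[^>]*>"))
```

Whether this helps depends entirely on the pattern: anything left before the form that the parser rejects will still cause a ParseError.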

1 answer

Answer (score: 1)

This is how I select the first form in my code:

br.select_form(nr=0)  # the first form on the page
# Form fields to populate (username and password hold your credentials)
br.form['username'] = username
br.form['password'] = password
# Submit the login form
br.submit()

Modify it to suit your needs; "nr=0" is probably what you are looking for.
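Setting `br.form['username']` and `br.form['password']` changes only those two fields. When a standard HTML form is submitted, the browser sends them together with all of the form's other fields, including hidden ones, which login pages often use for tokens. The Python 3 sketch below illustrates this for an `application/x-www-form-urlencoded` POST using only the standard library. It is not mechanize's implementation, and the helper `build_login_body` and the field names (including the hidden `lt` and `_eventId` values) are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_login_body(form_fields, username, password):
    """Return the URL-encoded body a form POST would send.

    form_fields: (name, value) pairs in form order, including hidden inputs.
    Only the username and password fields are overridden.
    """
    overrides = {"username": username, "password": password}
    pairs = [(name, overrides.get(name, value)) for name, value in form_fields]
    return urlencode(pairs)


# Hypothetical field list, as it might be scraped from a login form.
fields = [("username", ""), ("password", ""), ("lt", "LT-123"), ("_eventId", "submit")]
body = build_login_body(fields, "alice", "s3cret&more")
print(body)
print(parse_qs(body)["password"])  # special characters survive the round trip
```

The point of the sketch is that the hidden values must be sent back unchanged, which is why filling in the fields of the selected form (rather than building a request by hand) is usually the easier route.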

However, the real problem is the DOCTYPE. I tested the following, which strips it out:

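# Remove the DOCTYPE that the parser chokes on (the string must match the page exactly)
html = br.response().get_data().replace(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd >', '')
# Wrap the cleaned HTML in a new response and make it the browser's current page,
# then call br.select_form(nr=0) as shown above
response = mechanize.make_response(
    html, [("Content-Type", "text/html")],
    url, 200, "OK")
br.set_response(response)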

I took this directly from the Mechanize FAQ.
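The `replace` call in the answer only works if the page contains exactly that string. Note that the string as given has no closing quote after `xhtml1-strict.dtd`; presumably it mirrors the page's own malformed DOCTYPE, and an unterminated quoted string like that is a plausible reason for the declaration parser to fail. A somewhat more tolerant alternative, sketched below in Python 3 with only the standard library, removes the first `<!DOCTYPE ...>` declaration whatever its contents, up to the first `>`. The helper name `strip_doctype` is hypothetical. It assumes the declaration contains no `>` of its own, which holds for ordinary HTML and XHTML DOCTYPEs, and it accepts bytes as well as str, since response bodies are bytes in Python 3.

```python
import re

# Matches "<!DOCTYPE" up to the first ">", case-insensitively.
_DOCTYPE_STR = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)
_DOCTYPE_BYTES = re.compile(rb"<!DOCTYPE[^>]*>", re.IGNORECASE)


def strip_doctype(html):
    """Remove the first <!DOCTYPE ...> declaration; returns the same type it gets."""
    if isinstance(html, bytes):
        return _DOCTYPE_BYTES.sub(b"", html, count=1)
    return _DOCTYPE_STR.sub("", html, count=1)


# The malformed DOCTYPE from the answer, followed by a minimal page.
page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd >\n'
        '<html><body><form name="login"></form></body></html>')
print(repr(strip_doctype(page)))
```

The cleaned HTML can then be passed to `mechanize.make_response(...)` and `br.set_response(...)` exactly as in the answer, before selecting the form.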