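How do I enter a username and password on this site using mechanize?
I deleted and rewrote this post because my previous one had too much extra information.
I have read in other posts that this may have something to do with JavaScript, but how can I tell? And what should I do with that information?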
import mechanize
import cookielib
url = 'https://www.pin1.harvard.edu/cas/login?service=https%3A%2F%2Fwww.pin1.harvard.edu%2Fpin%2Fauthenticate%3F__authen_application%3DFAS_AC_AUTHENTICATOR'
#req = requests.get(url)
#dom = web.Element(req.text)
#Handles all the browser details
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
#self.browser = mechanize.Browser(factory=mechanize.RobustFactory())
#Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.open(url)
#Select First Form
#br.select_form(nr=1)
#br['username'] = '40839852'
#print list(br.forms())[0]
for form in br.forms():
    print "Form name:", form.name
    print form
    break
br.select_form(name= formname)
br[searchname] = term
res = br.submit()
content = res.read()
dom = web.Element(content)
TRACEBACK
ParseError: unexpected '/' char in declaration
---> 32 for form in br.forms():
33 print "Form name:", form.name
34 print form
Update: following PACO's suggestion I added the lines below, but I still get a traceback. (See: Python unable to retrieve form with urllib or mechanize)
beg = re.search(t, res.read()).span()[1]
res.set_data(res.get_data()[beg:])
br.set_response(response)
br.select_form(nr=0)
<ipython-input-25-bd1b73406b45> in <module>()
28 br.set_response(response)
29
---> 30 br.select_form(nr=0)
31
32
ParseError: unexpected '-' char in declaration
Answer 0 (score: 1)
This is how I select the first form in my code.
br.select_form(nr=0)
#Form fields to populate
br.form['username'] = username
br.form['password'] = password
#Submit the login form
br.submit()
Adapt this to your needs. "nr=0" is probably what you are looking for.
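To find out which index to pass as nr, and whether the login fields exist in the static HTML at all (mechanize does not execute JavaScript), it can help to list the forms and their field names first. The following is a rough sketch using only the standard library. The FormLister class, the list_form_fields helper and the sample page are made up for illustration; only the field names username and password come from the code above.

```python
# Python 2/3 compatible import of the standard-library HTML parser.
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser


class FormLister(HTMLParser):
    """Collect the named <input> fields of each <form>, in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []        # one list of field names per form
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append([])
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.forms[-1].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def list_form_fields(html):
    """Hypothetical helper: return a list of field-name lists, one per form."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms


# Made-up sample page, for illustration only (not the real login page).
sample = """
<html><body>
<form id="login" action="/cas/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Login">
</form>
</body></html>
"""

forms = list_form_fields(sample)
```

The position of a form in the returned list should line up with the nr argument of br.select_form. If the list comes back empty for the real page source, the form is probably built by JavaScript, and mechanize alone will not see it.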
The underlying problem, though, is the DOCTYPE. I tested the following, which strips it out.
html = br.response().get_data().replace('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd >', '')
response = mechanize.make_response(
    html, [("Content-Type", "text/html")],
    url, 200, "OK")
br.set_response(response)
I took this directly from the Mechanize FAQ.
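The replace() call above only works if the DOCTYPE string matches the page character for character. A slightly more forgiving variant, sketched here under the assumption that the page source is available as a text string, removes whatever DOCTYPE declaration is present with a regular expression. The strip_doctype helper and the sample page are hypothetical and not part of mechanize.

```python
import re

# Matches a DOCTYPE declaration up to the first '>' (case-insensitive),
# so it does not depend on the exact DOCTYPE text or its quoting.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def strip_doctype(html):
    """Hypothetical helper: return html with the first DOCTYPE removed."""
    return DOCTYPE_RE.sub("", html, count=1)


# Made-up sample page, for illustration only.
page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">\n'
        '<html><body></body></html>')
cleaned = strip_doctype(page)
```

The cleaned string can then be handed to mechanize.make_response and br.set_response exactly as in the snippet above. Note that the pattern stops at the first '>', so a DOCTYPE with an internal subset containing '>' would need a different approach.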