Mechanize (Python) - Trouble submitting a form

Date: 2014-03-08 23:24:18

Tags: python web-scraping mechanize

I'm trying to do something very simple with Python's Mechanize library. I want to go to http://careers.force.com/jobs/ts2__JobSearch, select "Ireland - Dublin" from the drop-down list, and then hit Enter.

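I wrote a very short Python script for this, but for some reason, when I run it, it returns the HTML of the default search page rather than the results page that is generated after selecting the location (Ireland - Dublin) and hitting Enter. I have no idea what is going wrong: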

import mechanize

link = "http://careers.force.com/jobs/ts2__JobSearch"

br = mechanize.Browser()
br.open(link)
br.select_form('j_id0:j_id1:atsForm')
br.form['j_id0:j_id1:atsForm:j_id38:1:searchCtrl'] = ["Ireland - Dublin"]

response = br.submit()

newsite = response.read()

1 Answer:

Answer 0 (score: 0)

In case you're still having this problem, or if not, in case anyone else runs into it in the future...

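I looked at the POST data your browser sends when you select something manually, and wrote you a function that gets you to the page you want by manually doing a POST with urllib.urlencode'd data. Cheers.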
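To make the idea concrete, here is a minimal sketch of what urlencoded POST data looks like. The two field names are taken from the question's form, but the ViewState value is a made-up placeholder (the real one is a long opaque string read from the page), and the try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

# Illustrative fields only: the ViewState value below is a placeholder,
# not something the real page would accept.
fields = [
    ("j_id0:j_id1:atsForm", "j_id0:j_id1:atsForm"),
    ("j_id0:j_id1:atsForm:j_id38:1:searchCtrl", "Ireland - Dublin"),
    ("com.salesforce.visualforce.ViewState", "PLACEHOLDER"),
]

# urlencode turns the pairs into a single "name=value&name=value" string,
# which is what gets sent as the body of the POST request.
postdata = urlencode(fields)
print(postdata)
```

Note how the colons in the field names come out as %3A and the spaces in the location as +.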

import mechanize, cookielib, urllib, re

def get_search(html,controls):
    #viewstate
    s=re.search('ViewState" value="', html).span()[1]
    e=re.search('"',html[s:]).span()[0]+s
    state=html[s:e]
    #viewstateversion
    s=re.search('ViewStateVersion', html).span()[1]
    s=s+re.search('value="', html[s:]).span()[1]
    e=re.search('"', html[s:]).span()[0]+s
    version=html[s:e]
    #viewstatemac
    s=re.search('ViewStateMAC',html).span()[1]
    s=s+re.search('value=\"',html[s:]).span()[1]
    e=re.search('"',html[s:]).span()[0]+s
    mac=html[s:e]
    return {
        controls[0]: controls[0],
        controls[1]: '',
        controls[2]: 'Ireland - Dublin',
        controls[3]: 'Search',
        'com.salesforce.visualforce.ViewState': state,
        'com.salesforce.visualforce.ViewStateVersion': version,
        'com.salesforce.visualforce.ViewStateMAC': mac,
    }

#Define variables and create a mechanize browser
link = "http://careers.force.com/jobs/ts2__JobSearch"
br = mechanize.Browser()
cj=cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.open(link)

#get the html data
html=br.response().read()

#get the control names from the correct form
br.select_form(nr=1)
controls=[control.name for control in br.form.controls]

#run function with html and control names list as parameters and run urllib.urlencode on what gets returned
postdata=urllib.urlencode(get_search(html, controls))

#go to the webpage again but this time also submit the encoded data
br.open(link, postdata)

#There Ya Go
print br.response().read()
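
If the index arithmetic in get_search is hard to follow, the three hidden fields can probably be pulled out with one small regex helper instead. This is only a sketch: hidden_value and viewstate_fields are names I made up, the sample HTML is invented for illustration, and it assumes each hidden input has its name (or id) attribute before its value attribute, which is what the searches above rely on too.

```python
import re

PREFIX = "com.salesforce.visualforce."


def hidden_value(html, field_name):
    """Return the value attribute of the first tag that mentions
    field_name as a quoted attribute, or None if there is no match."""
    # The closing quote after the name keeps "ViewState" from matching
    # "ViewStateVersion" or "ViewStateMAC".
    pattern = re.escape(field_name) + r'"[^>]*?\bvalue="([^"]*)"'
    match = re.search(pattern, html)
    return match.group(1) if match else None


def viewstate_fields(html):
    """Collect the three Visualforce hidden fields into a dict."""
    fields = {}
    for suffix in ("ViewState", "ViewStateVersion", "ViewStateMAC"):
        name = PREFIX + suffix
        value = hidden_value(html, name)
        if value is None:
            raise ValueError("hidden field not found: " + name)
        fields[name] = value
    return fields


# Invented sample markup, shaped like the hidden inputs the script looks for.
sample_html = (
    '<input type="hidden" id="com.salesforce.visualforce.ViewState" '
    'name="com.salesforce.visualforce.ViewState" value="abc123" />'
    '<input type="hidden" id="com.salesforce.visualforce.ViewStateVersion" '
    'name="com.salesforce.visualforce.ViewStateVersion" value="201403" />'
    '<input type="hidden" id="com.salesforce.visualforce.ViewStateMAC" '
    'name="com.salesforce.visualforce.ViewStateMAC" value="mac456" />'
)

print(viewstate_fields(sample_html))
```

The returned dict could then be merged with the form-control entries before calling urllib.urlencode, in place of the three manual searches.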