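So I need to scrape this website with Python, but I ran into a problem when trying to submit the form. The response I get is the same page as the form, not the result page you get after submitting it. I have tried the requests library, mechanize and urllib.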
The code using requests:
import requests

url = "http://www.justiceservices.gov.mt/courtservices/Judgements/search.aspx?func=selected"
payload = {'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_date_from': '',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_date_to': '',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$dd_court': 108,
           'ctl00$ContentPlaceHolderMain$search_selected_panel$dd_judiciary': '',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_litigant1': '',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_litigant2': '',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$tb_keywords': '',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$keywords': 'rb_keywords_matching_all',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$bt_search': 'Search',
           'ctl00$ContentPlaceHolderMain$search_selected_panel$result_count_panel$dd_result_count': 10}
headers = {'content-type': 'application/x-www-form-urlencoded'}
r = requests.post(url, payload, allow_redirects=True)
print(r.headers)
print(r.text)
Do I need to post additional data, or is my approach wrong for this type of form? The site uses (ASP.NET) Web Forms.
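On the "additional data" question: ASP.NET Web Forms pages usually carry hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION`, and a postback that omits them is often answered by simply re-rendering the form. A minimal sketch, assuming this search page follows that standard pattern and posts back to its own URL, is to GET the page first, copy every hidden input, then POST those values together with the visible fields in the same session. The helper names `extract_hidden_fields`, `build_postback_payload` and `submit_search` are hypothetical, and the sketch has not been tested against this particular site.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects the name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        # Web Forms stores __VIEWSTATE, __EVENTVALIDATION, etc. in hidden inputs
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden input fields found in the HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def build_postback_payload(html, form_fields):
    """Merge the page's hidden fields with the visible form values."""
    payload = extract_hidden_fields(html)
    payload.update(form_fields)  # visible values win on a name clash
    return payload


def submit_search(url, form_fields):
    """GET the form, then POST it back with its hidden state (assumed flow)."""
    import requests  # imported here so the parsing helpers work without requests

    with requests.Session() as session:  # keeps any ASP.NET session cookie
        page = session.get(url)
        page.raise_for_status()
        payload = build_postback_payload(page.text, form_fields)
        result = session.post(url, data=payload)
        result.raise_for_status()
        return result.text
```

With the `url` and `payload` from the question, the call would be `submit_search(url, payload)`. Keeping `bt_search` in the payload matters under this assumption, because Web Forms uses the submitted button name to decide which server-side click handler to run.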
Answer 0 (score: 0)
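If you look at the requests source, specifically https://github.com/kennethreitz/requests/blob/master/requests/api.py#L80, you will see that post ignores the args. I haven't had time to test this, but you may need to do this: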
r = requests.post(url, data=payload, allow_redirects=True)
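One way to check what form body requests will actually send, without contacting the server, is to build the request and prepare it instead of posting it. A minimal sketch using requests' `Request(...).prepare()`; the helper name `preview_post` is hypothetical.

```python
import requests


def preview_post(url, payload):
    """Build the POST that requests would send, without sending it."""
    return requests.Request("POST", url, data=payload).prepare()


url = "http://www.justiceservices.gov.mt/courtservices/Judgements/search.aspx?func=selected"
prepared = preview_post(url, {"ctl00$ContentPlaceHolderMain$search_selected_panel$dd_court": 108})
print(prepared.headers["Content-Type"])  # set automatically for dict data
print(prepared.body)  # URL-encoded form body, '$' escaped as %24
```

Comparing this body with what the browser sends (for example, in its developer tools) shows which fields are missing from the scripted request.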