我一直试图利用机械化来削减国会财务披露reports;表单提交成功,但我找不到任何搜索结果。我的脚本如下:
br = Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open('http://clerk.house.gov/public_disc/financial-search.aspx')
br.select_form(name='aspnetForm')
br.set_all_readonly(False)
br['filing_year'] = ['2008']
response = br.submit(name='search_btn')
html = response.read()
我是新手,并希望对此有任何更正/建议。谢谢!
答案 0 :(得分:0)
这是一个替代解决方案,在selenium
工具的帮助下涉及真正的浏览器。
from selenium import webdriver
from selenium.webdriver.support.select import Select
# initialize webdriver instance and visit url
url = "http://clerk.house.gov/public_disc/financial-search.aspx"
browser = webdriver.Firefox()
browser.get(url)
# find select tag and select 2008
select = Select(browser.find_element_by_id('ctl00_cphMain_txbFiling_year'))
select.select_by_value('2008')
# find "search" button and click it
button = browser.find_element_by_id('ctl00_cphMain_btnSearch')
button.click()
# display results
table = browser.find_element_by_id('search_results')
for row in table.find_elements_by_tag_name('tr')[1:-1]:
print [cell.text for cell in row.find_elements_by_tag_name('td')]
# close the browser
browser.close()
打印:
[u'ABERCROMBIE, HON.NEIL', u'HI01', u'2008', u'FD Amendment']
[u'ABERCROMBIE, HON.NEIL', u'HI01', u'2008', u'FD Original']
[u'ACKERMAN, HON.GARY L.', u'NY05', u'2008', u'FD Amendment']
[u'ACKERMAN, HON.GARY L.', u'NY05', u'2008', u'FD Amendment']
...