How to change multiple options on a web page using mechanize and bs4

Time: 2015-09-27 02:31:03

Tags: python web-scraping mechanize

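I need to scrape all the options available here. Using mechanize, I selected the first two controls (report type and language). That leaves three drop-down lists: the second depends on the first, and the third depends on the second. How can I handle this? My starting code for the first two fields is shown below.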

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()

url = "http://ceojk.nic.in/ElectionPDF/Main.aspx"
response = br.open(url)

# Choose the report type and language, then submit the form.
br.select_form(name="Form1")
control_1 = br.form.find_control("RadioButtonList1")
control_2 = br.form.find_control("RadioButtonList2")
submit = br.form.find_control("Button1")

br[control_1.name] = ["PS Wise Report"]
br[control_2.name] = ["English"]
response = br.submit()

# Print the value of every <option> on the resulting page.
soup = BeautifulSoup(response, 'lxml')
for item in soup.find_all('option'):
    print(item['value'])

1 Answer:

Answer 0: (score: 1)

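OK, debugging this was really exciting (you can't imagine how many things I tried and learned while trying to solve it).

Here is working code that mimics what the browser does, selecting the first district, AC, and PS step by step (it only passes the ["1"] value; you may want to improve on this, for example by reading the options and building an option name -> value map):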

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()

url = "http://ceojk.nic.in/ElectionPDF/Main.aspx"
response = br.open(url)

br.select_form(name="Form1")
br["RadioButtonList1"] = ["PS Wise Report"]
br["RadioButtonList2"] = ["English"]
br.submit()

# getting ACs
br.select_form(name="Form1")
br["DistlistP"] = ["1"]
br.submit(name="BtnPs")

# getting PSes
br.select_form(name="Form1")
br["AclistP"] = ["1"]
br.submit(name="BtnPs")

# getting report
br.select_form(name="Form1")
br["PslistP"] = ["1"]
response = br.submit(name="BtnPs")

soup = BeautifulSoup(response, "lxml")  # name the parser explicitly, as in the question
print(soup.find(id="Pnlfile"))

In the end, it prints the HTML of the "File" block that appears on the right-hand side of the page in the browser.
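
As for the suggested improvement of reading the options and building an option name -> value map, here is a minimal sketch of one way it might look. It is not part of the tested code above: it uses only the standard library's html.parser (Python 3) so it can run on its own, and the names OptionCollector and option_map are made up for illustration. It assumes each option carries an explicit value attribute; options without one are skipped.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect {visible label: value} for the <option>s of one <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.options = {}
        self._in_select = False  # True while inside the wanted <select>
        self._value = None       # value of the <option> currently open
        self._label = []         # text pieces of the <option> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._in_select = attrs.get("name") == self.select_name
        elif tag == "option" and self._in_select:
            self._flush()  # a previous <option> may lack its closing tag
            self._value = attrs.get("value")
            self._label = []

    def handle_data(self, data):
        if self._value is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select":
            self._flush()
            self._in_select = False

    def _flush(self):
        # Store the option that is currently open, if any.
        if self._value is not None:
            self.options["".join(self._label).strip()] = self._value
            self._value = None


def option_map(html, select_name):
    """Return {label: value} for the <select> with the given name attribute."""
    parser = OptionCollector(select_name)
    parser.feed(html)
    parser.close()
    return parser.options
```

In the session above, something like option_map(br.response().read().decode("utf-8", "replace"), "DistlistP") after the first submit should give the district labels and their values, which could then replace the hard-coded ["1"]. The UTF-8 encoding is an assumption about the page, and the same idea applies to AclistP and PslistP after each later submit.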