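I'm collecting gas prices from this website (http://gasbuddy.com/). Basically, I want to write a Python script that enters a zip code into the search box at the top of the page and then scrapes the results from the next page. I'm stuck on the first step, entering the zip code I want into the form. This is what I have so far: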
from mechanize import Browser

baseURL = "http://www.gasbuddy.com/"
zipcode = "20010"

br = Browser()
br.open(baseURL)
# The zip-code search box is in the first form on the page
forms = list(br.forms())
print(forms[0])
# Select that form and fill its text box with the zip code
br.select_form(nr=0)
br["ctl00$Content$GBZS$txtZip"] = zipcode
page = br.submit()
content = page.read()
# Show the URL the submission actually landed on
print(br.geturl())
Unfortunately, when I submit the form, br.geturl() shows that I did not end up on the page I want (the URL should look like "http://www.washingtondcgasprices.com/index.aspx?area=Washington%20-%20NE&area=Washington%20-%20NW&area=Washington%20-%20SE&area=Washington%20-%20SW").
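The target URL above passes each area as a repeated area query parameter. If you already know the area names for a zip code (an assumption: nothing here shows how the site maps zip codes to areas, or that it serves this page to a direct GET), a minimal Python 3 sketch using only the standard library could build that URL and skip the form. The names results_url and RESULTS_PAGE are hypothetical.

```python
from urllib.parse import quote, urlencode

# Results page taken from the URL in the question
RESULTS_PAGE = "http://www.washingtondcgasprices.com/index.aspx"


def results_url(areas):
    """Build a results URL with one repeated `area` parameter per area name."""
    # A list of pairs lets the same key appear several times;
    # quote_via=quote encodes spaces as %20 (the default quote_plus would use '+').
    query = urlencode([("area", a) for a in areas], quote_via=quote)
    return RESULTS_PAGE + "?" + query


dc_areas = ["Washington - NE", "Washington - NW", "Washington - SE", "Washington - SW"]
print(results_url(dc_areas))
```

For the four Washington areas this reproduces the URL quoted in the question.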
Any guidance would be appreciated. Thanks!
Answer 0 (score: 1)
You can use Selenium, which drives a real browser and therefore runs the page's JavaScript:
from selenium import webdriver
from selenium.webdriver.common.by import By

baseURL = "http://www.gasbuddy.com/"
zipcode = "20010"

browser = webdriver.Firefox()
browser.get(baseURL)
# Type the zip code into the search box
browser.find_element(By.ID, "ctl00_Content_GBZS_txtZip").send_keys(zipcode)
# Click the search button; the real browser executes any JavaScript it triggers
browser.find_element(By.ID, "ctl00_Content_GBZS_btnSearch").click()
print(browser.current_url)
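If you want to stick with mechanize, you may need to tweak your Browser settings a bit. But I still suspect it's the JavaScript that is killing you there, since mechanize does not execute JavaScript. The solution then is to "read the javascript yourself and simulate with mechanize what it would be doing".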
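One way to simulate what the JavaScript would be doing is to rebuild the POST body yourself. The ctl00$... control names suggest an ASP.NET WebForms page (an assumption), where a postback sends the form's hidden fields (such as __VIEWSTATE) plus __EVENTTARGET and __EVENTARGUMENT, which __doPostBack fills in, and, for a real submit button, that button's name and value. This Python 3 sketch uses only the standard library to collect those fields from HTML. The helper names, the sample markup, and the button name ctl00$Content$GBZS$btnSearch (inferred from the button's id) are all hypothetical, not taken from GasBuddy's actual page.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode


class FirstFormFields(HTMLParser):
    """Collect name -> value for hidden/text <input>s inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
        elif tag == "input" and self._in_form:
            name = a.get("name")
            kind = (a.get("type") or "text").lower()
            # Submit/image buttons are only sent when clicked, so skip them here
            if name and kind in ("hidden", "text"):
                self.fields[name] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_postback(html, typed_values, event_target="", button=None):
    """Return urlencoded POST data approximating an ASP.NET WebForms postback."""
    parser = FirstFormFields()
    parser.feed(html)
    data = dict(parser.fields)  # hidden state such as __VIEWSTATE
    data.update(typed_values)   # what the user would type into text boxes
    # __doPostBack(target, argument) sets these two hidden fields in the browser
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = ""
    if button is not None:
        name, value = button
        data[name] = value      # a clicked submit button sends its own name=value
    return urlencode(data)


# Hypothetical markup for illustration only, not GasBuddy's real page
SAMPLE_HTML = """
<form method="post" action="default.aspx" id="aspnetForm">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$Content$GBZS$txtZip" />
  <input type="submit" name="ctl00$Content$GBZS$btnSearch" value="Search" />
</form>
"""

payload = build_postback(
    SAMPLE_HTML,
    {"ctl00$Content$GBZS$txtZip": "20010"},
    button=("ctl00$Content$GBZS$btnSearch", "Search"),
)
print(parse_qs(payload, keep_blank_values=True))
```

To send it with mechanize you could try something like br.open(action_url, payload.encode("ascii")), where action_url is the form's absolute action URL (a hypothetical name). Check the real page's hidden fields and button markup first: any field the site's JavaScript adds or changes has to be reproduced as well.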