I am trying to submit the form at http://apps.fas.usda.gov/esrquery/esrq.aspx from Python using the following code:
import urllib
from bs4 import BeautifulSoup
import mechanize
import datetime
today = datetime.date.today().strftime("%m/%d/%Y")
url = 'http://apps.fas.usda.gov/esrquery/esrq.aspx'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
viewstate = soup.find('input', {'id' : '__VIEWSTATE'})['value']
eventval = soup.find('input', {'id' : '__EVENTVALIDATION'})['value']
br = mechanize.Browser(factory=mechanize.RobustFactory())
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open(url)
# fill form
br.select_form("aspnetForm")
br.form.set_all_readonly(False)
br.form['__EVENTTARGET'] = ''
br.form['__EVENTARGUMENT'] = ''
br.form['__LASTFOCUS'] = ''
br.form['__VIEWSTATE'] = viewstate
br.form['__VIEWSTATEGENERATOR'] = '41AA5B91'
br.form['__EVENTVALIDATION'] = eventval
br.form['ctl00$MainContent$lbCommodity'] = ['401']
br.form['ctl00$MainContent$lbCountry'] = ['0:0']
br.form['ctl00$MainContent$ddlReportFormat'] = ['10']
br.find_control('ctl00$MainContent$cbxSumGrand').items[0].selected = True
br.find_control('ctl00$MainContent$cbxSumKnown').items[0].selected = False
br.form['ctl00$MainContent$rblOutputType'] = ['2']
br.form['ctl00$MainContent$tbStartDate'] = '01/01/1999'
br.form['ctl00$MainContent$ibtnStart'] = ''
br.form['ctl00$MainContent$tbEndDate'] = today
br.form['ctl00$MainContent$ibtnEnd'] = ''
br.form['ctl00$MainContent$rblColumnSelection'] = ['regular']
response = br.submit()
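The response I get back is basically just the site's HTML, with the form filled in as expected. However, I was expecting an Excel file (since I set OutputType to 2).
I think I'm missing something in the submission. Can someone explain what I'm missing?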
Answer 0 (score: 0)
You're close, but you need to do a bit more after submitting. In this case, just add:
doc = response.read()
ofile = '<your path>'
# write in binary mode so the file contents are saved byte for byte
with open(ofile, 'wb') as f:
    f.write(doc)
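I can't test this against your site right now, so I'm just assuming all of your form settings are correct. I only have Python 3 at work, and mechanize only works on 2.x. In any case, this is generally how you would retrieve this kind of output.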
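For what it's worth, here is a rough Python 3 sketch of the same idea. The helper names `looks_like_html` and `save_response` are made up for illustration, and the sample bytes only stand in for whatever `response.read()` returns. It writes the body in binary mode and first checks, heuristically, whether the server sent back an HTML page rather than a file:

```python
import os
import tempfile


def looks_like_html(data: bytes) -> bool:
    """Heuristic only: does the body start like an HTML page?"""
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def save_response(data: bytes, path: str) -> int:
    """Write the raw body to path in binary mode; return the bytes written."""
    with open(path, "wb") as f:
        return f.write(data)


if __name__ == "__main__":
    # Stand-in for the bytes returned by response.read() (assumption).
    body = b"\xd0\xcf\x11\xe0 placeholder spreadsheet bytes"
    if looks_like_html(body):
        print("Got the page back instead of a file")
    else:
        out = os.path.join(tempfile.gettempdir(), "esr_output.xls")
        print(save_response(body, out), "bytes written to", out)
```

If the check reports HTML, the problem is in the submission itself rather than in how the response is saved.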