I want to send a POST request to this site:
http://web1.ncaa.org/stats/StatsSrv/careersearch
The form on the right has four drop-down lists. When I run the code below, the "school" stubbornly refuses to be selected. There is a hidden input that may be causing the problem, but I haven't been able to work around it. The JavaScript on the page doesn't seem to have any effect, but I could be wrong. Any help is appreciated:
#!/usr/bin/python
import urllib
import urllib2
url = 'http://web1.ncaa.org/stats/StatsSrv/careersearch'
values = {'searchOrg' : '30123','academicYear' : '2011','searchSport' : 'MBA','searchDiv' : '1'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
Answer 0 (score: 2)
As you suspected, you are missing a hidden field: doWhat = 'teamSearch'
(it is submitted along with the form on the right side).
The request works for me with these values:
values = {'doWhat':'teamSearch', 'searchOrg' : '30123','academicYear' : '2011','searchSport' : 'MBA','searchDiv' : '1'}
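For reference, here is the full request with the hidden field included. This is a sketch using Python 3's urllib (the question uses Python 2's urllib/urllib2); the actual network call is left commented out so the snippet stands alone:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = 'http://web1.ncaa.org/stats/StatsSrv/careersearch'
values = {
    'doWhat': 'teamSearch',   # the hidden field that selects the right-hand form
    'searchOrg': '30123',
    'academicYear': '2011',
    'searchSport': 'MBA',
    'searchDiv': '1',
}
data = urlencode(values).encode('ascii')  # POST bodies must be bytes in Python 3
req = Request(url, data)                  # providing data makes this a POST
print(urlencode(values))
# response = urlopen(req)                 # uncomment to actually send the request
# the_page = response.read()
```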
Answer 1 (score: 0)
I used mechanize:
import mechanize
from BeautifulSoup import BeautifulSoup
mech = mechanize.Browser()
mech.set_handle_robots(False)
response = mech.open('http://web1.ncaa.org/stats/StatsSrv/careersearch')
mech.select_form(nr=2)
mech.form['searchOrg'] = ['30123']
mech.form['academicYear'] = ['2011']
mech.form['searchSport'] = ['MBA']
mech.form['searchDiv'] = ['1']
mech.submit()
soup = BeautifulSoup(mech.response().read())
I know mechanize wants searchOrg, academicYear, searchSport, and searchDiv as lists, which is why each value above is wrapped in one. You should certainly heed robots.txt (this code bypasses the check with set_handle_robots(False)).
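Once mech.response().read() returns the HTML, the results still have to be scraped out of the soup. As a stdlib-only sketch, here is how table cells could be collected with html.parser; the sample markup below is made up, since the real page's structure is an assumption:

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the stripped text of every <td> cell in a page."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.cells.append(data.strip())

# Dummy HTML standing in for the career-search results page
sample = '<table><tr><td>School</td><td>Wins</td></tr></table>'
parser = CellCollector()
parser.feed(sample)
print(parser.cells)  # → ['School', 'Wins']
```

With the real response you would feed mech.response().read() (decoded to str) to the parser instead of the sample string.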