I need to run a search on a website:
url = r'http://www.cpso.on.ca/docsearch/'
It's an .aspx page (I only started down this road yesterday, so sorry for the noob questions).
Using BeautifulSoup, I can get __VIEWSTATE and __EVENTVALIDATION like this:
viewstate = soup.find('input', {'id' : '__VIEWSTATE'})['value']
ev = soup.find('input', {'id' : '__EVENTVALIDATION'})['value']
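For anyone without BeautifulSoup 3 handy, the same hidden-field lookup can be sketched with only the standard library's html.parser. The sample markup and its values below are made up for illustration; the real page's __VIEWSTATE and __EVENTVALIDATION are long opaque base64 blobs.

```python
# Stand-in for the soup.find(...) lookups above, using only the stdlib.
from html.parser import HTMLParser

class HiddenInputScraper(HTMLParser):
    """Collect the value of every <input> tag, keyed by its id attribute."""
    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            attrs = dict(attrs)
            if 'id' in attrs:
                self.values[attrs['id']] = attrs.get('value', '')

# Made-up sample of the hidden inputs an .aspx page emits.
sample = '''
<form>
  <input type="hidden" id="__VIEWSTATE" value="dDwtMTA3O...==" />
  <input type="hidden" id="__EVENTVALIDATION" value="/wEWBgK...==" />
</form>
'''

scraper = HiddenInputScraper()
scraper.feed(sample)
viewstate = scraper.values['__VIEWSTATE']
ev = scraper.values['__EVENTVALIDATION']
```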
and the headers can be set like this (note the real header names are 'User-Agent' and 'Accept', not the CGI-style 'HTTP_USER_AGENT'/'HTTP_ACCEPT'):
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'}
If you go to the page, the only values I really want to pass are the first and last name...
LN = "smith"
FN = "a"
data = {"__VIEWSTATE":viewstate,"__EVENTVALIDATION":ev,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName":LN,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtFirstName":FN}
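As a sanity check on what urllib.urlencode will make of that dict, here is a quick sketch using Python 3's urllib.parse (the Python 2 call behaves the same way). The viewstate/eventvalidation strings are placeholders, not real values:

```python
# Show the POST body that urlencode builds from the data dict above.
from urllib.parse import urlencode

data = {
    "__VIEWSTATE": "dDwtMTA3O...",          # placeholder
    "__EVENTVALIDATION": "/wEWBgK...",      # placeholder
    "ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName": "smith",
    "ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtFirstName": "a",
}
body = urlencode(data)
# The '$' in the ASP.NET control names comes out percent-encoded as %24,
# which is exactly what the server expects from a urlencoded form post.
```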
So putting it all together looks like this:
import urllib
import urllib2
import urlparse
import BeautifulSoup
url = r'http://www.cpso.on.ca/docsearch/'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup.BeautifulSoup(html)
viewstate = soup.find('input', {'id' : '__VIEWSTATE'})['value']
ev = soup.find('input', {'id' : '__EVENTVALIDATION'})['value']
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'}
LN = "smith"
FN = "a"
data = {"__VIEWSTATE":viewstate,"__EVENTVALIDATION":ev,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName":LN,
"ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtFirstName":FN}
data = urllib.urlencode(data)
request = urllib2.Request(url,data,headers)
response = urllib2.urlopen(request)
newsoup = BeautifulSoup.BeautifulSoup(response.read())
for i in newsoup:
print i
The problem is it doesn't seem to give me any results... not sure if I need to supply a value for every textbox on the form or what... maybe I'm just not doing it right. Anyway, I'm just hoping somebody can set me straight. I thought I had it, but I expected to see a list of doctors with contact info.
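One likely gap in the POST above: ASP.NET postbacks usually require every hidden field the form emits (not just __VIEWSTATE and __EVENTVALIDATION), plus the name/value of the submit button that was "clicked". This sketch merges all of those into one payload; the __EVENTTARGET/__EVENTARGUMENT fields and the btnSubmit control name are hypothetical, so check the page source for the real names:

```python
# Guess at the fix: post back every hidden field plus the submit button.
from urllib.parse import urlencode

hidden_fields = {          # in real code: scraped from the form's hidden inputs
    "__VIEWSTATE": "dDwtMTA3O...",      # placeholder
    "__EVENTVALIDATION": "/wEWBgK...",  # placeholder
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
}
user_fields = {
    "ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName": "smith",
    "ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtFirstName": "a",
    # Hypothetical submit-button name; many ASP.NET forms won't post back
    # without the button's own name/value pair in the body.
    "ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$btnSubmit": "Search",
}
data = dict(hidden_fields)
data.update(user_fields)
body = urlencode(data)
```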
I'd really appreciate any insight. I've used BeautifulSoup before, but I think my problem is just with sending the request and supplying the right amount of information in the data section.
Thanks!
Answer 0 (score: 5):
Took @pguardiario's suggestion and went the mechanize route... much simpler:
import mechanize
url = r'http://www.cpso.on.ca/docsearch/'
request = mechanize.Request(url)
response = mechanize.urlopen(request)
forms = mechanize.ParseResponse(response, backwards_compat=False)
response.close()
form = forms[0]
form['ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtLastName']='Smith'
form['ctl00$ContentPlaceHolder1$MainContentControl1$ctl00$txtPostalCode']='K1H'
print mechanize.urlopen(form.click()).read()
I've still got a long way to go, but this gets me quite a bit further.