I am new to Python and am trying to run some searches through an .aspx form. When I execute this code, I get an error. I am using Python 3.4.2.
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup
# NOTE: this dict is defined but never passed to the opener below
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.indiapost.gov.in',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://www.indiapost.gov.in/pin/',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
class MyOpener(urllib.request.FancyURLopener):
    # User-Agent string sent by this opener
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
url = 'http://legistar.council.nyc.gov/Legislation.aspx'
# first HTTP request without form data
f = myopener.open(url)
soup = BeautifulSoup(f, 'html.parser')  # name the parser explicitly
#vstate = soup.select("#__VSTATE")[0]['value']
viewstate = soup.select("#__VIEWSTATE")[0]['value']
eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']
formFields = (
    (r'__VSTATE', r''),
    (r'__VIEWSTATE', viewstate),
    (r'__EVENTVALIDATION', eventvalidation),
    (r'ctl00_RadScriptManager1_HiddenField', ''),
    (r'ctl00_tabTop_ClientState', ''),
    (r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
    (r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
    (r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'),  # file number
    (r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'),  # legislative text
    (r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'),  # attachments
    (r'ctl00$ContentPlaceHolder1$txtSearch', 'york'),  # search text
    (r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'),  # years to include
    (r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'),  # types to include
    (r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation')  # search button itself
)
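# urlencode() returns a str; encode it to bytes, the form urllib.request documents for POST data
encodedFields = urllib.parse.urlencode(formFields).encode('ascii')
# second HTTP request with form data
f = myopener.open(url, encodedFields)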
# Ideally we would use BeautifulSoup once again to retrieve the results
# (instead of writing out the whole HTML file). Besides, since the results
# are split across multiple pages, more HTTP requests are needed.
try:
    # the with-block closes the file even if writing fails
    with open('tmp.html', 'wb') as fout:
        fout.write(f.read())
except OSError:
    print('Could not write output file\n')
This script does not return any results.
How can I get the script to submit the search form and return the results?
Answer 0 (score: 0)
As Andrei mentioned in the comments, you will need to import urllib, but you may run into other problems with the code because you are hard-coding __VIEWSTATE and __EVENTVALIDATION.
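To avoid hard-coded values, the hidden fields can be read from the page each time before posting back. The following is only a minimal sketch: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages, and HiddenFieldParser, extract_hidden_fields and the sample_page markup (including its field values) are made up for illustration, not taken from the real site.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical stand-in for the HTML returned by the first GET request
sample_page = """
<form method="post" action="Legislation.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgK" />
  <input type="text" name="ctl00$ContentPlaceHolder1$txtSearch" />
</form>
"""

hidden = extract_hidden_fields(sample_page)
print(hidden["__VIEWSTATE"])
```

Collecting every hidden input, rather than naming fields one by one, means the script keeps working if the page adds further state fields, though whether the server accepts the resulting post still has to be checked against the live form.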
Hui Zheng gave a very good explanation that helped me figure it out, so I will just link to his answer rather than try to explain it myself.
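As a rough illustration of the import point, here is how the POST could be assembled in Python 3 with explicit urllib.parse and urllib.request imports and a bytes body. This is a sketch under assumptions: build_search_request is a hypothetical helper, the __VIEWSTATE and __EVENTVALIDATION values are placeholders, and only a subset of the question's form fields is included, which the real server may not accept.

```python
import urllib.parse
import urllib.request


def build_search_request(url, viewstate, eventvalidation, search_text):
    """Build (but do not send) a POST request for the search form."""
    form_fields = [
        ("__VIEWSTATE", viewstate),
        ("__EVENTVALIDATION", eventvalidation),
        ("ctl00$ContentPlaceHolder1$txtSearch", search_text),
        ("ctl00$ContentPlaceHolder1$btnSearch", "Search Legislation"),
    ]
    # urlencode() returns str; urlopen() requires the POST body as bytes
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "Mozilla/5.0",
    }
    # Supplying data makes this a POST request
    return urllib.request.Request(url, data=body, headers=headers)


request = build_search_request(
    "http://legistar.council.nyc.gov/Legislation.aspx",
    viewstate="dDwtMTIz",        # placeholder, scrape the real value first
    eventvalidation="/wEWAgK",   # placeholder, scrape the real value first
    search_text="york",
)
print(request.get_method())

# Sending is left commented out so the sketch runs offline:
# with urllib.request.urlopen(request) as response:
#     html = response.read()
```

Passing the headers through Request is one way to make sure they are actually sent; in the question's script the headers dict is defined but never used.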