我正在尝试废弃此页面: http://www.nitt.edu/prm/nitreg/ShowRes.aspx
以下是代码:
import urllib
from bs4 import BeautifulSoup
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Origin': 'http://www.indiapost.gov.in',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': 'http://www.nitt.edu/prm/nitreg/ShowRes.aspx',
'Accept-Encoding': 'gzip,deflate,sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
class MyOpener(urllib.FancyURLopener):
version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'
myopener = MyOpener()
url = 'http://www.nitt.edu/prm/nitreg/ShowRes.aspx'
# first HTTP request without form data
f = myopener.open(url)
soup = BeautifulSoup(f)
# parse and retrieve two vital form values
viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
eventvalidation = soup.findAll("input", {"type": "hidden", "name": "__EVENTVALIDATION"})
print viewstate[0]['value']
formData = (
('__EVENTVALIDATION', eventvalidation),
('__VIEWSTATE', viewstate),
('__VIEWSTATEENCRYPTED',''),
('TextBox1', '106110006'),
('Button1', 'Show'),
)
encodedFields = urllib.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
try:
# actually we'd better use BeautifulSoup once again to
# retrieve results(instead of writing out the whole HTML file)
# Besides, since the result is split into multipages,
# we need send more HTTP requests
fout = open('tmp.html', 'w')
except:
print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
我一直收到服务器错误: 来源错误:
在执行当前Web请求期间生成了未处理的异常。可以使用下面的异常堆栈跟踪来识别有关异常的起源和位置的信息。
堆栈追踪:
[FormatException: Invalid character in a Base-64 string.]
System.Convert.FromBase64String(String s) +0
System.Web.UI.LosFormatter.Deserialize(String input) +25
System.Web.UI.Page.LoadPageStateFromPersistenceMedium() +101
[HttpException (0x80004005): Invalid_Viewstate
Client IP: 10.0.0.166
Port: 51915
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17
ViewState: [<input name="__VIEWSTATE" type="hidden" value="dDwtMTM3NzI1MDM3O3Q8O2w8aTwxPjs+O2w8dDw7bDxpPDE+O2k8Mj47PjtsPHQ8cDxwPGw8VmlzaWJsZTs+O2w8bzxmPjs+Pjs+O2w8aTwxPjtpPDM+Oz47bDx0PDtsPGk8Mz47PjtsPHQ8O2w8aTwwPjs+O2w8dDw7bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+Oz4+Oz4+O3Q8cDxwPGw8VmlzaWJsZTs+O2w8bzxmPjs+Pjs+Ozs+Oz4+O3Q8O2w8aTw5PjtpPDExPjs+O2w8dDxwPHA8bDxWaXNpYmxlOz47bDxvPGY+Oz4+Oz47Oz47dDx0PHA8cDxsPFZpc2libGU7PjtsPG88Zj47Pj47Pjs7Pjs7Pjs+Pjs+Pjs+Pjs+zHrNhAd1tTLXbBUyAJRtS6omUc0="/>]
Http-Referer:
Path: /prm/nitreg/ShowRes.aspx.]
System.Web.UI.Page.LoadPageStateFromPersistenceMedium() +447
System.Web.UI.Page.LoadPageViewState() +18
System.Web.UI.Page.ProcessRequestMain() +447
Base-64字符串中的字符无效。 有什么问题?
答案 0 :(得分:1)
您正在使用ViewState输入对象,而不是值。
ViewState: [<input name="__VIEWSTATE" type="hidden" value="dDwtMTM3NzI1MDM3O3Q8O2w8aTwxPjs+O2w8dDw7bDxpPDE+O2k8Mj47PjtsPHQ8cDxwPGw8VmlzaWJsZTs+O2w8bzxmPjs+Pjs+O2w8aTwxPjtpPDM+Oz47bDx0PDtsPGk8Mz47PjtsPHQ8O2w8aTwwPjs+O2w8dDw7bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+Oz4+Oz4+O3Q8cDxwPGw8VmlzaWJsZTs+O2w8bzxmPjs+Pjs+Ozs+Oz4+O3Q8O2w8aTw5PjtpPDExPjs+O2w8dDxwPHA8bDxWaXNpYmxlOz47bDxvPGY+Oz4+Oz47Oz47dDx0PHA8cDxsPFZpc2libGU7PjtsPG88Zj47Pj47Pjs7Pjs7Pjs+Pjs+Pjs+Pjs+zHrNhAd1tTLXbBUyAJRtS6omUc0="/>]
您的formData
应该是:
formData = (
('__EVENTVALIDATION', eventvalidation[0]['value']),
('__VIEWSTATE', viewstate[0]['value']),
('__VIEWSTATEENCRYPTED',''),
('TextBox1', '106110006'),
('Button1', 'Show'),
)
请注意,您的eventvalidation值存在同样的问题,我也修复了它。
编辑:
该页面中不存在__EVENTVALIDATION。您只需从__EVENTVALIDATION
删除formData
。
formData = (
('__VIEWSTATE', viewstate[0]['value']),
('__VIEWSTATEENCRYPTED',''),
('TextBox1', '106110006'),
('Button1', 'Show'),
)