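I am using Fiddler to trace HTTP requests.
This lets me fill in a form automatically with urllib.
It worked fine in the Jupyter notebook I was using, so I handed it to a colleague to try. It does not work on his computer.
I am completely new to this, so maybe I have made a simple mistake. I think it might have something to do with the Cookie header?
I am filling a first name, surname and postcode into an online form.
The request: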
import urllib.request as urllib2
req = urllib2.Request("https://carlowcoco.checktheregister.ie/publicpages/Results.aspx")
Adding the headers:
req.add_header("Connection", "keep-alive")
req.add_header("Cache-Control", "max-age=0")
req.add_header("Origin", "https://carlowcoco.checktheregister.ie")
req.add_header("Upgrade-Insecure-Requests", "1")
req.add_header("Content-Type", "application/x-www-form-urlencoded")
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 OPR/62.0.3331.116")
req.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
req.add_header("Referer", "https://carlowcoco.checktheregister.ie/publicpages/ereg.aspx?CID=4&uiLang=en-GB")
req.add_header("Accept-Encoding", "gzip, deflate, br")
req.add_header("Accept-Language", "en-US,en;q=0.9")
req.add_header("Cookie", "_ga=GA1.2.1485303330.1563803355; _fbp=fb.1.1563803355623.389471504; _gid=GA1.2.1242949638.1567500110; ASP.NET_SessionId_eReg=wbyf1iuvothtmdr0zxq4ypnv; _gat=1")
Sending the data:
firstname='john'
surname='smith'  # must match the {surname} placeholder used in the f-string below
zipcode='abc123'
# this is where we add the name, surname and zipcode
body = f"__LASTFOCUS=&__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwULLTExODA1MzM2NzFkZI2Y9Vj1N4c71dOJShLXen0Q8nT0&__VIEWSTATEGENERATOR=1627BCCD&__PREVIOUSPAGE=o3Y5pVByrKh5ylQa3zb19RrpXCBCTakCQLkYw24qRyH07uZC4V8-00fT-aZjmROM9Gnkny1RyjaEBGfxfBR95RnY9Dn0zJEhObiGTquHfVvYnOZx0&__EVENTVALIDATION=%2FwEWBwKFwaWxBQLp48u6DgK95LDpBAK62djbDgLthcGDBQL0mu%2BYCwK83r2cAZJf50Jf%2F9CI7cXegRb5oL0hvtD1&ctl00%24MainContent%24TextBoxPostcode={zipcode}&ctl00%24MainContent%24TextBoxFirstName={firstname}&ctl00%24MainContent%24TextBoxSurname={surname}&ctl00%24MainContent%24FormSubmit=Submit"
# convert to bytes object
body = body.encode('utf-8')
# send request and save to response
response = urllib2.urlopen(req, body)
# read the response body (urllib returns it as bytes, not str)
page = response.read()
Rather than returning a URL or an HTTP error, this returns HTML containing the text <b>An ERROR has occurred. Please try again. If the issue persists, please try again later.</b>
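So why does this work on my computer but not for my colleague?
Also, is there a better way to do this? The headers look messy. I feel there may be a tidier way to fill in the form automatically.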
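Answer 0 (score: 1)
As mentioned above, the problem is the session ID in the cookie. Your colleague needs to replace it with his own session ID for this to work. You should be able to obtain a fresh one.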
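
One way to get a fresh session ID on every run, instead of pasting one from Fiddler, might look like the sketch below. It is untested against the real site and rests on several assumptions: that the form is served from the URL in your Referer header, that the server sets the ASP.NET_SessionId_eReg cookie on a plain GET of that page, and that the hidden fields (__VIEWSTATE, __EVENTVALIDATION and so on) can be read from that page rather than hard-coded. The helper names (HiddenFieldParser, extract_hidden_fields, build_body, search_register) are made up for illustration. Only standard-library modules are used.

```python
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode

# Assumption: the form lives at the Referer URL from the question.
FORM_URL = "https://carlowcoco.checktheregister.ie/publicpages/ereg.aspx?CID=4&uiLang=en-GB"
RESULTS_URL = "https://carlowcoco.checktheregister.ie/publicpages/Results.aspx"


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_body(hidden_fields, postcode, firstname, surname):
    """URL-encode the hidden fields plus the three visible inputs."""
    fields = dict(hidden_fields)
    fields.update({
        "ctl00$MainContent$TextBoxPostcode": postcode,
        "ctl00$MainContent$TextBoxFirstName": firstname,
        "ctl00$MainContent$TextBoxSurname": surname,
        "ctl00$MainContent$FormSubmit": "Submit",
    })
    # urlencode handles the percent-escaping that was done by hand before
    return urlencode(fields).encode("utf-8")


def search_register(postcode, firstname, surname):
    """GET the form (to receive cookies and hidden fields), then POST it."""
    # The opener keeps cookies from the GET and resends them on the POST
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    with opener.open(FORM_URL) as resp:
        form_html = resp.read().decode("utf-8", errors="replace")

    body = build_body(extract_hidden_fields(form_html), postcode, firstname, surname)
    req = urllib.request.Request(
        RESULTS_URL,
        data=body,
        headers={
            "Referer": FORM_URL,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with opener.open(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling search_register('abc123', 'john', 'smith') would then return the results page as a string on any machine, since each call starts its own session. The Accept-Encoding header is left out on purpose: urllib does not decompress gzip responses for you, so asking for them would make the body harder to read. If the site turns out to require more of the original headers, they can be added to the headers dict.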