urllib request does not work on a different computer

Time: 2019-09-03 13:34:16

Tags: python http urllib fiddler

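I am using Fiddler to trace HTTP requests.

This lets me fill in a form automatically with urllib.

It runs fine in the Jupyter notebook I am working in, so I handed it to a colleague to try. It does not work on his computer.

I am completely new to this, so I may have made a simple mistake. I think it might have something to do with the Cookie header?

I am filling in a first name, surname, and postcode in an online form.

The request: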

import urllib.request  as urllib2

req = urllib2.Request("https://carlowcoco.checktheregister.ie/publicpages/Results.aspx")

Adding the headers:

req.add_header("Connection", "keep-alive")
req.add_header("Cache-Control", "max-age=0")
req.add_header("Origin", "https://carlowcoco.checktheregister.ie")
req.add_header("Upgrade-Insecure-Requests", "1")
req.add_header("Content-Type", "application/x-www-form-urlencoded")
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 OPR/62.0.3331.116")
req.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
req.add_header("Referer", "https://carlowcoco.checktheregister.ie/publicpages/ereg.aspx?CID=4&uiLang=en-GB")
req.add_header("Accept-Encoding", "gzip, deflate, br")
req.add_header("Accept-Language", "en-US,en;q=0.9")
req.add_header("Cookie", "_ga=GA1.2.1485303330.1563803355; _fbp=fb.1.1563803355623.389471504; _gid=GA1.2.1242949638.1567500110; ASP.NET_SessionId_eReg=wbyf1iuvothtmdr0zxq4ypnv; _gat=1")

Sending the information:

firstname='john'
surname='smith'  # must match the {surname} placeholder used in the body below
zipcode='abc123'

# this is where we add the name, surname and zipcode
body = f"__LASTFOCUS=&__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwULLTExODA1MzM2NzFkZI2Y9Vj1N4c71dOJShLXen0Q8nT0&__VIEWSTATEGENERATOR=1627BCCD&__PREVIOUSPAGE=o3Y5pVByrKh5ylQa3zb19RrpXCBCTakCQLkYw24qRyH07uZC4V8-00fT-aZjmROM9Gnkny1RyjaEBGfxfBR95RnY9Dn0zJEhObiGTquHfVvYnOZx0&__EVENTVALIDATION=%2FwEWBwKFwaWxBQLp48u6DgK95LDpBAK62djbDgLthcGDBQL0mu%2BYCwK83r2cAZJf50Jf%2F9CI7cXegRb5oL0hvtD1&ctl00%24MainContent%24TextBoxPostcode={zipcode}&ctl00%24MainContent%24TextBoxFirstName={firstname}&ctl00%24MainContent%24TextBoxSurname={surname}&ctl00%24MainContent%24FormSubmit=Submit"

# convert to bytes object
body = body.encode('utf-8')

# send request and save to response
response = urllib2.urlopen(req, body)

# read the response body (read() returns bytes, not a string)
page = response.read()

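Rather than raising a URL or HTTP error, this returns HTML containing the text <b>An ERROR has occurred. Please try again. If the issue persists, please try again later.</b>

So why does this work on my computer but not on my colleague's?

Also, is there a better way to do this? The headers look messy. I feel there may be a cleaner way to fill in the form automatically.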

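1 Answer:

Answer 0: (score: 1)

As noted above, the problem is the session ID in the cookie (the ASP.NET_SessionId_eReg value in the Cookie header). Your colleague needs to replace it with their own session ID for the request to work. You should be able to obtain a new one.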
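One way to avoid hard-coding a session ID at all is to let Python request a fresh one. The sketch below is untested against the live site and rests on several assumptions: that the form is served from the Referer URL in the question, that a plain GET of that page sets the session cookie, and that the hidden fields (__VIEWSTATE, __EVENTVALIDATION and so on) found on that page are the ones the server expects back. The names FORM_URL, RESULTS_URL, HiddenFieldParser, extract_hidden_fields, build_body and lookup are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumption: the form page is the Referer URL captured in the question.
FORM_URL = "https://carlowcoco.checktheregister.ie/publicpages/ereg.aspx?CID=4&uiLang=en-GB"
RESULTS_URL = "https://carlowcoco.checktheregister.ie/publicpages/Results.aspx"


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return the hidden form fields of a page as a dict."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_body(hidden_fields, postcode, firstname, surname):
    """URL-encode the hidden fields plus the visible inputs into POST bytes."""
    fields = dict(hidden_fields)
    # Field names are taken from the request body captured in the question.
    fields.update({
        "ctl00$MainContent$TextBoxPostcode": postcode,
        "ctl00$MainContent$TextBoxFirstName": firstname,
        "ctl00$MainContent$TextBoxSurname": surname,
        "ctl00$MainContent$FormSubmit": "Submit",
    })
    return urllib.parse.urlencode(fields).encode("utf-8")


def lookup(postcode, firstname, surname):
    """Fetch the form for a fresh session, then submit it with the same cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # The GET stores whatever cookies the server sets in the jar.
    with opener.open(FORM_URL) as resp:
        form_html = resp.read().decode("utf-8", errors="replace")
    body = build_body(extract_hidden_fields(form_html), postcode, firstname, surname)
    req = urllib.request.Request(RESULTS_URL, data=body, headers={"Referer": FORM_URL})
    # Passing data makes this a POST; the opener attaches the stored cookies.
    with opener.open(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling lookup("abc123", "john", "smith") would then replace the hand-built headers and body. Since urlencode escapes the field names and values itself, the values are passed unescaped, and because no Accept-Encoding header is sent the response should not come back compressed. If the server also checks headers such as User-Agent or Origin, they can be added to the headers dict in the same way.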