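I am trying to use Python requests to send a POST request and retrieve search results from this ASP.NET website. Even though I first use a GET request to obtain the __RequestVerificationToken and include it in my headers, I get a response like this:
{"Token":"Y2VgsmEAAwA","Link":"/search/Y2VgsmEAAwA/"}
This is not a valid link. It points to the total search results, without the arrival date or the area that I included in my POST request. What am I missing? How do I scrape a site like this, which generates a (session?) ID for the URL?
Thank you all very much!
My Python script: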
import json
import requests
from bs4 import BeautifulSoup
r = requests.Session()
# GET request
gr = r.get("http://www.feline.dk")
bsObj = BeautifulSoup(gr.text,"html.parser")
auth_string = bsObj.find("input", {"name": "__RequestVerificationToken"})['value']
#print(auth_string)
#print(gr.url)
# POST request
search_request = {
"Geography.Geography":"Danmark",
"Geography.GeographyLong=":"Danmark (Ferieområde)",
"Geography.Id":"da509992-0830-44bd-869d-0270ba74ff62",
"Geography.SuggestionId": "",
"Period.Arrival":"16-1-2016",
"Period.Duration":7,
"Period.ArrivalCorrection":"false",
"Price.MinPrice":None,
"Price.MaxPrice":None,
"Price.MinDiscountPercentage":None,
"Accommodation.MinPersonNumber":None,
"Accommodation.MinBedrooms":None,
"Accommodation.NumberOfPets":None,
"Accommodation.MaxDistanceWater":None,
"Accommodation.MaxDistanceShopping":None,
"Facilities.SwimmingPool":"false",
"Facilities.Whirlpool":"false",
"Facilities.Sauna":"false",
"Facilities.InternetAccess":"false",
"Facilities.SatelliteCableTV":"false",
"Facilities.FireplaceStove":"false",
"Facilities.Dishwasher":"false",
"Facilities.WashingMachine":"false",
"Facilities.TumblerDryer":"false",
"update":"true"
}
payload = {
"searchRequestJson": json.dumps(search_request),
}
header ={
"Accept":"application/json, text/html, */*; q=0.01",
"Accept-Encoding":"gzip, deflate",
"Accept-Language":"da-DK,da;q=0.8,en-US;q=0.6,en;q=0.4",
"Connection":"keep-alive",
"Content-Length":"720",
"Content-Type":"application/x-www-form-urlencoded; charset=UTF-8",
"Cookie":"ASP.NET_SessionId=ebkmy3bzorzm2145iwj3bxnq; __RequestVerificationToken=" + auth_string + "; aid=382a95aab250435192664e80f4d44e0f; cid=google-dk; popout=hidden; __utmt=1; __utma=1.637664197.1451565630.1451638089.1451643956.3; __utmb=1.7.10.1451643956; __utmc=1; __utmz=1.1451565630.1.1.utmgclid=CMWOra2PhsoCFQkMcwod4KALDQ|utmccn=(not%20set)|utmcmd=(not%20set)|utmctr=(not%20provided); BNI_Feline.Web.FelineHolidays=0000000000000000000000009b84f30a00000000",
"Host":"www.feline.dk",
"Origin":"http://www.feline.dk",
#"Referer":"http://www.feline.dk/search/Y2WZNDPglgHHXpe2uUwFu0r-JzExMYi6yif5KNswMDBwMDAAAA/",
"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
"X-Requested-With":"XMLHttpRequest"
}
gr = r.post(
url = 'http://www.feline.dk/search',
data = payload,
headers = header
)
#print(gr.url)
bsObj = BeautifulSoup(gr.text,"html.parser")
print(bsObj)
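Answer 0 (score: 1)
After several attempts, I found that your search request is in the wrong format (it needs to be URL-encoded, not JSON) and that the hard-coded Cookie header was overriding the cookies held by the session (just let the session do the work).
I simplified the code and got the desired result: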
import requests
from bs4 import BeautifulSoup
r = requests.Session()
# GET request
gr = r.get("http://www.feline.dk")
bsObj = BeautifulSoup(gr.text,"html.parser")
auth_string = bsObj.find("input", {"name": "__RequestVerificationToken"})['value']
# POST request
search_request = "Geography.Geography=Hou&Geography.GeographyLong=Hou%2C+Danmark+(Ferieomr%C3%A5de)&Geography.Id=847fcbc5-0795-4396-9318-01e638f3b0f6&Geography.SuggestionId=&Period.Arrival=&Period.Duration=7&Period.ArrivalCorrection=False&Price.MinPrice=&Price.MaxPrice=&Price.MinDiscountPercentage=&Accommodation.MinPersonNumber=&Accommodation.MinBedrooms=&Accommodation.NumberOfPets=&Accommodation.MaxDistanceWater=&Accommodation.MaxDistanceShopping=&Facilities.SwimmingPool=false&Facilities.Whirlpool=false&Facilities.Sauna=false&Facilities.InternetAccess=false&Facilities.SatelliteCableTV=false&Facilities.FireplaceStove=false&Facilities.Dishwasher=false&Facilities.WashingMachine=false&Facilities.TumblerDryer=false"
gr = r.post(
url = 'http://www.feline.dk/search/',
data = search_request,
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
)
print(gr.url)
Result:
http://www.feline.dk/search/Y2U5erq-ZSr7NOfJEozPLD5v-MZkw8DAwMHAAAA/
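The hand-written form string in the answer above can also be produced from a dictionary. The following is a minimal sketch using the standard library's `urllib.parse.urlencode`; the helper name `build_search_body` is hypothetical, and the field list is abridged from the answer's string, so the real request would need the remaining fields as well.

```python
from urllib.parse import urlencode


def build_search_body(fields):
    """Encode form fields as application/x-www-form-urlencoded.

    None is sent as an empty value (e.g. "Price.MinPrice="), which matches
    the empty fields in the hand-written string.
    """
    cleaned = {key: ("" if value is None else value) for key, value in fields.items()}
    return urlencode(cleaned)


# Abridged example fields; names and values are taken from the answer's string.
fields = {
    "Geography.Geography": "Hou",
    "Geography.GeographyLong": "Hou, Danmark (Ferieområde)",
    "Period.Duration": 7,
    "Price.MinPrice": None,
    "Facilities.Sauna": "false",
}

body = build_search_body(fields)
print(body)
```

The resulting string can be passed as `data=` in the same way as the hand-written one. `urlencode` percent-encodes the parentheses as `%28` and `%29`, which decode to the same value as the literal parentheses in the original string.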
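Answer 1 (score: 0)
Thanks to Kantium for the answer. In my case, I found that the RequestVerificationToken was actually generated in a JS script inside the page.
1- Call the first page, which generates the code. In my case it returned something like this in the HTML: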
<script>
Sys.Net.WebRequestManager.add_invokingRequest(function (sender, networkRequestEventArgs) {
var request = networkRequestEventArgs.get_webRequest();
var headers = request.get_headers();
headers['RequestVerificationToken'] = '546bd932b91b4cdba97335574a263e47';
});
$.ajaxSetup({
beforeSend: function (xhr) {
xhr.setRequestHeader("RequestVerificationToken", '546bd932b91b4cdba97335574a263e47');
},
complete: function (result) {
console.log(result);
},
});
</script>
2- Grab the RequestVerificationToken code, then add it to your request together with the cookies from set-cookie.
let resp_setcookie = response.headers["set-cookie"];
let rege = new RegExp(/(?:RequestVerificationToken", ')(\S*)'/);
let token = rege.exec(response.body)[1];
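For readers working from the question's Python code, the same extraction can be sketched with the standard `re` module. This is only an illustration: the function name `extract_token` is hypothetical, the pattern assumes the token appears in a `setRequestHeader("RequestVerificationToken", '...')` call as in the sample script above, and a different page may need a different pattern.

```python
import re

# Assumes the token is the second argument of setRequestHeader(...),
# as in the sample <script> block; quotes may be single or double.
TOKEN_RE = re.compile(
    r"""setRequestHeader\(\s*["']RequestVerificationToken["']\s*,\s*["']([^"']+)["']"""
)


def extract_token(html):
    """Return the verification token embedded in the page, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


# Sample markup copied from the script shown in step 1.
sample_html = """
<script>
$.ajaxSetup({
    beforeSend: function (xhr) {
        xhr.setRequestHeader("RequestVerificationToken", '546bd932b91b4cdba97335574a263e47');
    }
});
</script>
"""

token = extract_token(sample_html)
print(token)
```

Returning `None` instead of raising lets the caller decide what to do when the page layout changes and the pattern no longer matches.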
I actually store the cookie and the token in global variables, and later, in my Node.js request, I add them to the request object:
headers.Cookie = gCookies.cookie;
headers.RequestVerificationToken = gCookies.token;
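A rough Python counterpart of this header assembly is sketched below, using the standard library's `http.cookies.SimpleCookie` to reduce `Set-Cookie` values to a `Cookie` header. The function name `build_headers` and the cookie value `abc123` are made up for illustration. With a `requests.Session`, the cookie part is normally handled automatically, so only the token header would need to be added by hand.

```python
from http.cookies import SimpleCookie


def build_headers(set_cookie_values, token):
    """Build request headers from Set-Cookie values and the verification token.

    Only each cookie's name=value pair is kept; attributes such as
    Path or HttpOnly are not sent back to the server.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    cookie_header = "; ".join(
        f"{name}={morsel.value}" for name, morsel in jar.items()
    )
    return {"Cookie": cookie_header, "RequestVerificationToken": token}


headers = build_headers(
    ["ASP.NET_SessionId=abc123; path=/; HttpOnly"],  # made-up example value
    "546bd932b91b4cdba97335574a263e47",  # token from the sample script
)
print(headers)
```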
That way the final request carries both the cookie and the token.
Remember that you can monitor the requests being sent by using:
require("request-debug")(requestpromise);
Good luck!