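I am sending a GET request with a query string, but requests is stripping the query string. I tried encoding it with urllib.parse.urlencode() instead, but the result is the same as with requests. Any help would be appreciated.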
import requests
url = 'https://www.booking.com/searchresults.html'
params = {
    'checkin_monthday': '19',
    'checkin_year': '2018',
    'checkout_month': '4',
    'checkout_monthday': '20',
    'checkout_year': '2018',
    'class_interval': 1,
    'dest_id': -1022488,
    'dest_type': 'city',
    'dtdisc': 0,
    'from_sf': 1,
    'group_adults': 2,
    'group_children': 0,
    'inac': 0,
    'index_postcard': 0,
    'label': 'gen1',
    'label_click': 'undef',
    'no_rooms': 1,
    'offset': 0,
    'postcard': 0,
    'raw_dest_type': 'city',
    'room1': 'A,A',
    'sb_price_type': 'total',
    'sb_travel_purpose': 'business',
    'src': 'index',
    'src_elem': 'sb',
    'ss': 'Pokhara',
    'ss_all': 0,
    'ssb': 'empty',
    'sshis': 0,
    'ssne': 'Pokhara',
    'ssne_untouched': 'Pokhara',
}
# Alternative attempt: build the query string by hand (same result)
# import urllib.parse
# formatted_query_string = urllib.parse.urlencode(params)
# url = url + '?' + formatted_query_string
r = requests.get(url, params=params)
print(r.url)
# output
# https://www.booking.com/searchresults.html?dest_id=-1022488;est_type=city;ss=Pokhara
Answer 0 (score: 0)
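Your code is fine, and there is no need to use urllib. You get a "stripped" URL because r.url is not the initial URL you were expecting to see.
If you check the requests source code, you will find r.url described as:
Final URL location of Response
So r.url is not the URL you requested; it is the URL you were (eventually) redirected to. You can check this with a simple test: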
from requests import Request, Session
url = 'https://www.booking.com/searchresults.html' # your url
params = { # I intentionally shortened this dict for testing purposes
    'checkin_monthday': '19',
    'checkin_year': '2018',
    'checkout_monthday': '20',
    'checkout_year': '2018',
}
req = Request('GET', url, params=params)
prepped = req.prepare()
print(prepped.url) # you send this URL ...
s = Session()
resp = s.send(prepped)
print(resp.url) # ... but you are redirected to this URL (same as your r.url)
Output:
https://www.booking.com/searchresults.html?checkin_year=2018&checkin_monthday=19&checkout_monthday=20&checkout_year=2018
https://www.booking.com/
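What happens here:
Your code sends a request to the initial URL, and the server responds with HTTP/1.1 301 Moved Permanently. The redirect location can be found in the response headers: Location: https://www.booking.com/searchresults.html?dest_id=-1022488;est_type=city;ss=Pokhara
requests receives this response and follows it to that location. The location URL is then stored in the r.url attribute, which is why the initial URL and the final URL differ.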
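If you want to watch the redirect happen without depending on booking.com, here is a minimal sketch against a throwaway local server. The handler class, the /search and /final paths, and the rewritten query string are all made up for illustration. Only r.history, r.url, the Location header, and allow_redirects are real requests features.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server that redirects and rewrites the query."""

    def do_GET(self):
        if self.path.startswith("/search"):
            # Reply with a 301 whose Location carries a different query string.
            self.send_response(301)
            self.send_header("Location", "/final?ss=Pokhara")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port for the local test server.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for localhost

params = {"checkin_year": "2018", "ss": "Pokhara"}

# Default behaviour: the redirect is followed, so r.url is the final URL.
r = session.get(base + "/search", params=params)
first = r.history[0]              # the intermediate 301 response
print(first.status_code)          # status of the first hop
print(first.url)                  # the URL that was actually requested
print(first.headers["Location"])  # where the server sent us
print(r.url)                      # the final URL after the redirect

# With allow_redirects=False the 301 itself is returned, query string intact.
r2 = session.get(base + "/search", params=params, allow_redirects=False)
print(r2.status_code, r2.url)

server.shutdown()
server.server_close()
```

Against the real site, the same two checks (looking at r.history, or passing allow_redirects=False) should show whether the query string was sent as intended and then replaced by the server's redirect.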