Question

First, the code:

import requests
from bs4 import BeautifulSoup

url = 'https://stackoverflow.com/questions/tagged/python'
payload = {'pageSize': '5'}
r = requests.get(url, params=payload)
content = r.text

soup = BeautifulSoup(content, 'html.parser')
questions = soup.select('div#questions h3')

print(r.url)
print(len(questions))

Output

https://stackoverflow.com/questions/tagged/python?pageSize=5
50

Expected output

https://stackoverflow.com/questions/tagged/python?pageSize=5
5

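When I make the request above, stackoverflow.com seems to half-ignore the pageSize parameter. I say half-ignore because r.text does contain 'pageSize=5" />', which suggests the server is aware of the parameter, yet it returns 50 questions. If you go directly to https://stackoverflow.com/questions/tagged/python?pageSize=5 in a browser, only 5 questions are returned.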

Is there a way to make stackoverflow.com respect URL parameters sent through an HTTP request?

Answer 1

The problem is your User-Agent. By default, the headers that requests sends look like this:

{'User-Agent': 'python-requests/2.19.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
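
If you want to check what your own installation sends, here is a minimal sketch that prints the default headers without making any network request. The version number in the User-Agent will differ from the 2.19.1 shown above depending on which release of requests you have installed.

```python
import requests

# Headers that requests attaches when you do not supply your own.
default_headers = requests.utils.default_headers()
print(dict(default_headers))

# The User-Agent value is "python-requests/" followed by the installed version.
print(default_headers['User-Agent'])
```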

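Note that your User-Agent is 'python-requests'. StackOverflow ignores the query parameter because it can tell the request is not coming from a real browser. To work around this, you can simply pass empty headers when making the request, like this: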

requests.get(url, headers='')
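
A more explicit variant, assuming the server only looks at the User-Agent, is to pass a headers dict with a browser-like value instead of an empty string. The User-Agent string below is an illustrative placeholder of my own, not something the site is known to require, and I have not verified how Stack Overflow currently responds. The sketch only builds the request so you can inspect it, and the last comment shows how you would send it.

```python
import requests

url = 'https://stackoverflow.com/questions/tagged/python'
payload = {'pageSize': '5'}

# Assumption: an illustrative browser-like User-Agent; substitute your own.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)'}

session = requests.Session()
request = requests.Request('GET', url, params=payload, headers=headers)

# prepare_request merges the session defaults with the headers given here;
# the explicit User-Agent overrides the python-requests default.
prepared = session.prepare_request(request)

print(prepared.url)
print(prepared.headers['User-Agent'])

# To actually send it: r = session.send(prepared)
```

Preparing the request first lets you confirm the final URL and headers before anything goes over the network.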
