Question

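I'm trying to get a JSON response from a specific link (see the Python code below) using Python's requests module. When I test the link in RESTer for Firefox (or simply paste it into the browser's address bar), it returns the expected information: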

fetchJSON_comment98({"productAttr":null,"productCommentSummary":{"skuId":100020974898,"averageScore":5,"defaultGoodCount":0,"defaultGoodCountStr":"10��+","commentCount":0,"commentCountStr":"10��+","goodCount":0,"goodCountStr":"2.1��+","goodRate":0.97,"goodRateShow":97,"generalCount":0,"generalCountStr":"200+","generalRate":0.02,"generalRateShow":2,"poorCoun ... (truncated)

Headers:

Date: Wed, 09 Jun 2021 09:25:31 GMT
Content-Type: text/html;charset=GBK
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Set-Cookie: JSESSIONID=502398ABD60D51F774B1E90EEF32F818.s1; Path=/ jwotest_product=99; Domain=club.jd.com; Expires=Wed, 16 Jun 2021 09:25:30 GMT; Path=/
Server: jfe
Strict-Transport-Security: max-age=7776000

Firefox's Network Inspector shows the same response.

But when I run the following code with Python 3.7:

from requests import Session

url = "https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=100020974898&score=0&sortType=6&page=0&pageSize=10&isShadowSku=0&fold=1"

headers = {"Host": "club.jd.com",
           "Pragma": "no-cache",
           "Cache-Control": "no-cache",
           "User-Agent": "Mozilla/5.0"}

s = Session()
resp = s.get(url=url, headers=headers)
print(resp.text)

I get an HTTP 200 response with an empty response body and the following response headers:

Date: Wed, 09 Jun 2021 09:33:02 GMT
Content-Type: text/html;charset=GBK
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Content-Encoding: gzip
Server: jfe
Strict-Transport-Security: max-age=7776000

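I have tried adding cookies with a CookieJar, copying them from the browser's response, and crafting my own, but none of that worked. I also tried many of the solutions listed on Stack Overflow, without success...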

Please help: what am I doing wrong?

Answer 1

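The problem is the User-Agent header. The server appears to return an empty body for the bare "Mozilla/5.0" value; change the header to whatever your browser sends and the code works. You can read more about the User-Agent header format here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent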

from requests import Session

url = "https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=100020974898&score=0&sortType=6&page=0&pageSize=10&isShadowSku=0&fold=1"

headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"}

s = Session()
resp = s.get(url=url, headers=headers)
print(resp.text)
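
Once the body is no longer empty, note that it is JSONP rather than plain JSON: the payload is wrapped in a call to the callback named in the URL (fetchJSON_comment98). Below is a minimal sketch of one way to unwrap it, using only the standard library. The helper name parse_jsonp and the shortened sample string are assumptions made for illustration; they are not part of requests or of the original answer.

```python
import json
import re

# Hypothetical helper (not part of requests): strips a JSONP wrapper such as
# fetchJSON_comment98({...}); and returns the parsed JSON payload.
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(text):
    # An empty body is the symptom described in the question, so fail loudly.
    if not text.strip():
        raise ValueError("empty response body (check the User-Agent header)")
    match = _JSONP_RE.match(text)
    if match is None:
        raise ValueError("response is not in callback(...) form")
    return json.loads(match.group(1))


# Shortened sample modeled on the response quoted in the question.
sample = (
    'fetchJSON_comment98({"productAttr":null,"productCommentSummary":'
    '{"skuId":100020974898,"averageScore":5,"goodRate":0.97}});'
)
data = parse_jsonp(sample)
print(data["productCommentSummary"]["goodRate"])
```

With the code above, this would be called as parse_jsonp(resp.text). Dropping the callback parameter from the URL might make the server return plain JSON instead, but that is untested here.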

Response body is always empty when getting a JSON response from a web server, but the response code is 200