Which headers am I missing to scrape NBA Stats data?

Time: 2020-01-23 21:04:10

Tags: html json web-scraping powerbi host

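A few days ago, in Power BI, I was able to create a web query that pulled JSON data from NBA Player Stats without using any headers. As of today, the query no longer works; I receive the following error message: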

DataSource.Error: The underlying connection was closed. An unexpected error occurred on a receive.
Details: https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=&Weight=

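On a related note, I used to be able to pull JSON data from NBA Team Stats using https://stats.nba.com/ as the Referer header, but now it gives me the same error message as shown above. To work around these errors, I tried sending the following headers: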

Host: stats.nba.com
Connection: keep-alive
Accept: application/json
x-nba-stats-token: true
User-Agent: Chrome/79.0.3945.130
x-nba-stats-origin: stats
Referer: https://stats.nba.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

When I submit the query with the headers above, it returns the following error message:

Unable to connect

We encountered an error while trying to connect.

Details: "The 'Host' header must be modified using the appropriate property or method.
Parameter name: name"

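I have run out of ideas on how to get the query to run correctly. I am really new to web scraping and HTML; I have been trying to teach myself. Any help would be greatly appreciated.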

1 answer:

Answer 0 (score: 1)

All headers of the GET request:

Host: stats.nba.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json, text/plain, */*
x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==
DNT: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US;q=0.9,en;q=0.7

URL:

https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=

Required headers:

Accept: application/json, text/plain, */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
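
As a minimal sketch of the same request outside Power BI, the Python below builds it with only the headers listed as required. The names build_request and fetch_json are hypothetical helpers, not part of any NBA API, and there is no guarantee the server still accepts this header set. Host is left out on purpose: the HTTP client fills it in from the URL, so it does not need to be listed by hand.

```python
import json
import urllib.request

# URL copied from the answer above.
URL = (
    "https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo="
    "&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location="
    "&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N"
    "&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N"
    "&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season"
    "&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision="
)

# Only the headers the answer lists as required.
# Accept-Encoding is omitted because urllib does not decompress responses itself.
REQUIRED_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
    ),
    "x-nba-stats-origin": "stats",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Referer": "https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1",
}


def build_request(url=URL, headers=REQUIRED_HEADERS):
    """Build the GET request without sending it (hypothetical helper)."""
    return urllib.request.Request(url, headers=dict(headers), method="GET")


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json(build_request()) would send the request. If the server drops the connection, adding the two headers listed below as possibly needed is the next thing to try.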

Not sure whether these are needed:

x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==

Possible problems

  1. You are detected as a bot and blocked.

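  2. The X-NewRelic-ID header is a token (possibly with a timeout). It may be assigned based on parameters such as the IP address and User-Agent.
    With a GET request to https://stats.nba.com/ you can obtain a fresh X-NewRelic-ID from the HTML response. This is the part of the HTML that contains the xpid token: <script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQECWF5UChAHUlNTBwgBVw==",licenseKey:"09f0cb5c68",applicationID:"76210961"};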
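
Pulling the xpid value out of that HTML could look roughly like the Python sketch below. It assumes the page still embeds the token in the loader_config form quoted above; extract_xpid is a hypothetical helper name, and the sample string is the fragment from this answer rather than a live response.

```python
import re

# Matches xpid:"<token>" inside the NREUM loader_config object.
XPID_PATTERN = re.compile(r'xpid\s*:\s*"([^"]+)"')


def extract_xpid(html):
    """Return the xpid token found in the HTML, or None if it is absent."""
    match = XPID_PATTERN.search(html)
    return match.group(1) if match else None


# HTML fragment quoted in the answer above.
SAMPLE_HTML = (
    '<script type="text/javascript">(window.NREUM||(NREUM={})).loader_config='
    '{xpid:"VQECWF5UChAHUlNTBwgBVw==",licenseKey:"09f0cb5c68",'
    'applicationID:"76210961"};'
)

print(extract_xpid(SAMPLE_HTML))  # prints VQECWF5UChAHUlNTBwgBVw==
```

The extracted value would then be sent as the X-NewRelic-ID header on the stats request.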