我对Python相当陌生。我正在尝试通过https://stats.nba.com/players/drives/
抓取NBA Drives数据我使用Chrome Devtools查找API URL。然后,我使用了请求包来获取JSON字符串。
原始代码:
import requests
headers = {"User-Agent": "Mozilla/5.0..."}
url = " https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Drives&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
r = requests.get(url, headers = headers)
d = r.json()
但是,这不再起作用。由于某些原因,下面的URL链接请求在NBA服务器上超时。因此,我需要找到一种获取此信息的新方法。
我正在浏览Chrome devtools,发现所需的JSON字符串存储在“网络XHR响应”选项卡中。有什么办法可以将其抓取到python中。请参见下图。
答案 0 :(得分:2)
我测试了带有其他标头的url(我在DevTool
中看到了此请求),看来它需要标头Referer
才能正常工作
import requests
headers = {
'User-Agent': 'Mozilla/5.0',
#'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0',
'Referer': 'https://stats.nba.com/players/drives/',
#'Accept': 'application/json, text/plain, */*',
}
url = 'https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Drives&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight='
r = requests.get(url, headers=headers)
data = r.json()
print(data)
顺便说一句:相同,但具有作为字典的参数,因此更容易设置不同的值
import requests
headers = {
'User-Agent': 'Mozilla/5.0',
#'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0',
'Referer': 'https://stats.nba.com/players/drives/',
#'Accept': 'application/json, text/plain, */*',
}
url = 'https://stats.nba.com/stats/leaguedashptstats'
params = {
'College': '',
'Conference': '',
'Country': '',
'DateFrom': '',
'DateTo': '',
'Division': '',
'DraftPick': '',
'DraftYear': '',
'GameScope': '',
'Height': '',
'LastNGames': '0',
'LeagueID': '00',
'Location': '',
'Month': '0',
'OpponentTeamID': '0',
'Outcome': '',
'PORound': '0',
'PerMode': 'PerGame',
'PlayerExperience': '',
'PlayerOrTeam': 'Player',
'PlayerPosition': '',
'PtMeasureType': 'Drives',
'Season': '2019-20',
'SeasonSegment': '',
'SeasonType': 'Regular Season',
'StarterBench': '',
'TeamID': '0',
'VsConference': '',
'VsDivision': '',
'Weight': '',
}
r = requests.get(url, headers=headers, params=params)
#print(r.request.url)
data = r.json()
print(data)