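The link I am trying to scrape is: http://stats.nba.com/stats/playerdashptshots?DateFrom=&DateTo=&GameSegment=&LastNGames=6&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PerMode=PerGame&Period=0&PlayerID=2544&Season=2017-18&SeasonSegment=&SeasonType=Playoffs&TeamID=0&VsConference=&VsDivision= , specifically the result set named ClosestDefender10ftPlusShooting.
Ever since I specified urlopen(request, timeout=3) I have been getting timeout errors, and I am now confused about how to resolve the error and actually get the data. When I do not specify a timeout, the request apparently just hangs indefinitely.
Is there a way to break up my request and make it more manageable, or is this simply not possible with this endpoint? Or is there a better module to use in this case? I am also running this in Jupyter, for what it's worth (not sure whether that makes a difference).
The code is below: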
import json
from urllib.request import urlopen, Request
url = 'http://stats.nba.com/stats/playerdashptshots?DateFrom=&DateTo=&GameSegment=&LastNGames=6&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PerMode=PerGame&Period=0&PlayerID=2544&Season=2017-18&SeasonSegment=&SeasonType=Playoffs&TeamID=0&VsConference=&VsDivision='
headers = {
'Cookie': 'ak_bmsc=98B4FD680504382D1BE219CE963DE520ADDE6D86FF260000CD220B5B87B91E5E~ploZrwUVSrpu3HO/7DratALkZS/cK+SOZ9zMvNJNvJ6u/dYH50zISBdr3kK2S6ifBH/zXh9Z8oBFFeq1so2FGYfl29Zob9z065l/0caXBy5CNT3gOCn3OojgRPe7j1LLDThGl7eYQju8bl+1dO24vr5r9U+YngrmtlXpUPX+IT6Z7YoJPXP9YHmx1FMCyr7FOKmTyJL7js91F1pGVKGEOE/plhHEB4P7sq3B0uRzWWWcc=; s_cc=true; s_fid=5CA44E6CD67BA096-0F573C9E17BE0B09; s_sq=%5B%5BB%5D%5D; bm_sv=682C33C8686155B97E1B1692275AF96F~HweJkyeagLOu7iHDyl4xgtUAYOpT0NW49tH2OZpG93uH9+RTvrfGREItatT/72/WL3cY/k2VeYr/tDO1feFxAvO+Xe8fzIw2JOH4A/0lRXqp709dcJb53l9AytLTOgoHaQ4UG7rjPBPyMSoFFeJ3tg==',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
}
request = Request(url, headers=headers)
# Give up after 3 seconds instead of waiting indefinitely
with urlopen(request, timeout=3) as f:
    data = f.read().decode('utf8')
Answer 0 (score: 1)
With urllib, I found that the script got stuck, so I tried requests with the approach below. The data now comes through correctly.
import requests
url = 'http://stats.nba.com/stats/playerdashptshots?DateFrom=&DateTo=&GameSegment=&LastNGames=6&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PerMode=PerGame&Period=0&PlayerID=2544&Season=2017-18&SeasonSegment=&SeasonType=Playoffs&TeamID=0&VsConference=&VsDivision='
res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
# Find the requested result set, then print its header row and data rows
for item in res.json()['resultSets']:
    if item['name'] == "ClosestDefender10ftPlusShooting":
        print(item['headers'])
        for row in item['rowSet']:
            print(row)
Result:
['PLAYER_ID', 'PLAYER_NAME_LAST_FIRST', 'SORT_ORDER', 'GP', 'G', 'CLOSE_DEF_DIST_RANGE', 'FGA_FREQUENCY', 'FGM', 'FGA', 'FG_PCT', 'EFG_PCT', 'FG2A_FREQUENCY', 'FG2M', 'FG2A', 'FG2_PCT', 'FG3A_FREQUENCY', 'FG3M', 'FG3A', 'FG3_PCT']
[2544, 'James, LeBron', 1, 6, 2, '0-2 Feet - Very Tight', 0.014, 0.0, 0.33, 0.0, 0.0, 0.007, 0.0, 0.17, 0.0, 0.007, 0.0, 0.17, 0.0]
[2544, 'James, LeBron', 2, 6, 6, '2-4 Feet - Tight', 0.144, 0.83, 3.5, 0.238, 0.31, 0.082, 0.33, 2.0, 0.167, 0.062, 0.5, 1.5, 0.333]
[2544, 'James, LeBron', 3, 6, 5, '4-6 Feet - Open', 0.192, 2.17, 4.67, 0.464, 0.571, 0.103, 1.17, 2.5, 0.467, 0.089, 1.0, 2.17, 0.462]
[2544, 'James, LeBron', 4, 6, 6, '6+ Feet - Wide Open', 0.11, 1.33, 2.67, 0.5, 0.656, 0.041, 0.5, 1.0, 0.5, 0.068, 0.83, 1.67, 0.5]
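As a possible follow-up, here is a minimal sketch that passes the query string as a params dict, sets an explicit timeout, and pairs each row with its headers. The helper names fetch_payload and result_set_as_dicts, the constants URL and PARAMS, and the timeout values are illustrative assumptions, not part of the code above, and whether the endpoint responds within those limits is not something this sketch can guarantee.

```python
# Endpoint and query parameters copied from the URL in the question.
# Empty strings are kept so those keys are still sent with empty values.
URL = "http://stats.nba.com/stats/playerdashptshots"
PARAMS = {
    "DateFrom": "", "DateTo": "", "GameSegment": "", "LastNGames": 6,
    "LeagueID": "00", "Location": "", "Month": 0, "OpponentTeamID": 0,
    "Outcome": "", "PerMode": "PerGame", "Period": 0, "PlayerID": 2544,
    "Season": "2017-18", "SeasonSegment": "", "SeasonType": "Playoffs",
    "TeamID": 0, "VsConference": "", "VsDivision": "",
}


def fetch_payload(timeout=(5, 30)):
    """Download the JSON payload (hypothetical helper).

    timeout is (connect seconds, read seconds); both values are guesses.
    """
    # Third-party package; imported here so the parsing helper below
    # can also be used on JSON that was obtained some other way.
    import requests

    res = requests.get(
        URL,
        params=PARAMS,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=timeout,
    )
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return res.json()


def result_set_as_dicts(payload, name):
    """Return the rows of the named result set as header -> value dicts.

    Returns an empty list when no result set has that name.
    """
    for item in payload["resultSets"]:
        if item["name"] == name:
            return [dict(zip(item["headers"], row)) for row in item["rowSet"]]
    return []
```

Under these assumptions, result_set_as_dicts(fetch_payload(), "ClosestDefender10ftPlusShooting") would give one dict per defender-distance range, so a value can be read by column name (for example row["FG_PCT"]) rather than by position.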