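When I run this script, IDLE just hangs. Normally it would at least give me an error. Other scripts run fine, so I know it isn't IDLE. I think my code is correct, but maybe I've missed something. This isn't everything I plan to scrape from the site; I just want to get this part working first so I can finish the rest later.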
import csv
import requests
import os
##HOME TEAM
req = requests.get('http://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=10%2F17%2F2017&DateTo=04%2F11%2F2018&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=Home&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=')
data = req.json()
my_data = []
pk = data['resultSets']
for item in data:
    team = item.get['rowSet']
    for item in team:
        Team_Id = item[0]
        Team_Name = item[1]
    my_data.append([Team_Id, Team_Name])
headers = ["Team_Id", "Team_Name"]
with open("NBA_Home_Team.csv", "a", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(my_data)
f.close()
##os.system("taskkill /f /im pythonw.exe")
Answer (score: 1)
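It looks like the script hangs because the server never responds. You can confirm this by killing the process and checking the stack trace: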
Traceback (most recent call last):
    req = requests.get('http://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=10%2F17%2F2017&DateTo=04%2F11%2F2018&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=Home&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=')
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 502, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 612, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
    httplib_response = conn.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1121, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 394, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)  <-- we're stuck here
KeyboardInterrupt
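The hang can be reproduced without touching stats.nba.com. The following is a minimal sketch, not part of the original diagnosis: it swaps requests for the standard library's http.client (the Python 3 name of the httplib module at the bottom of the traceback) and talks to a hypothetical local server that accepts the connection but never replies. start_silent_server and request_with_timeout are illustrative names. Because a timeout is set, getresponse() gives up with socket.timeout instead of blocking forever while waiting for the status line.

```python
import http.client
import socket
import threading

# Accepted connections are kept here so they stay open (and silent);
# if they were garbage-collected the client would see a reset instead.
_silent_connections = []


def start_silent_server():
    """Start a hypothetical server that accepts one TCP connection and
    never sends an HTTP response, mimicking the unresponsive endpoint."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)

    def accept_once():
        try:
            conn, _ = server.accept()
            _silent_connections.append(conn)  # keep it open, never reply
        except OSError:
            pass  # server socket was closed before a client connected

    threading.Thread(target=accept_once, daemon=True).start()
    return server, server.getsockname()[1]


def request_with_timeout(port, timeout):
    """Send a GET and report whether a status line arrived within `timeout` seconds."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", "/stats/leaguedashteamstats")
        conn.getresponse()  # waits for the status line, like _read_status above
        return "got response"
    except socket.timeout:
        return "timed out"
    finally:
        conn.close()
```

With requests, the same protection is the timeout argument, e.g. requests.get(url, timeout=10), which raises requests.exceptions.Timeout instead of leaving IDLE hanging.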
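I tried opening the URL in a browser and it worked fine: I got a response within a second. So I started adjusting the request in the code to look like it came from a real browser. My first idea was to send a valid User-Agent header, and with the following code I got a response immediately: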
data = requests.get(
'http://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=10%2F17%2F2017&DateTo=04%2F11%2F2018&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=Home&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=',
headers={'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X) AppleWebKit/602.1.38 (KHTML, like Gecko) Version/10.0 Mobile/14A300 Safari/602.1'},
).json()
Presumably the server has some kind of anti-bot defense that simply doesn't answer requests lacking a valid User-Agent.
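The same request can also be written with the query string split into a dictionary, which is easier to read and edit. This is a sketch assuming you pass it as requests.get(BASE_URL, params=PARAMS, headers=HEADERS); requests encodes a params dict with urllib.parse.urlencode, so the hypothetical helper build_url rebuilds the URL offline to show it matches the original one. BASE_URL, PARAMS and HEADERS are illustrative names.

```python
from urllib.parse import urlencode

BASE_URL = "http://stats.nba.com/stats/leaguedashteamstats"

# Browser-like User-Agent from the answer; without one the server may not reply.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X) "
        "AppleWebKit/602.1.38 (KHTML, like Gecko) Version/10.0 "
        "Mobile/14A300 Safari/602.1"
    )
}

# Query parameters from the original URL, decoded ("%2F" -> "/", "+" -> " ").
PARAMS = {
    "Conference": "", "DateFrom": "10/17/2017", "DateTo": "04/11/2018",
    "Division": "", "GameScope": "", "GameSegment": "", "LastNGames": "0",
    "LeagueID": "00", "Location": "Home", "MeasureType": "Base", "Month": "0",
    "OpponentTeamID": "0", "Outcome": "", "PORound": "0", "PaceAdjust": "N",
    "PerMode": "Totals", "Period": "0", "PlayerExperience": "",
    "PlayerPosition": "", "PlusMinus": "N", "Rank": "N", "Season": "2017-18",
    "SeasonSegment": "", "SeasonType": "Regular Season", "ShotClockRange": "",
    "StarterBench": "", "TeamID": "0", "VsConference": "", "VsDivision": "",
}


def build_url(params):
    """Rebuild the full URL from `params` (offline check, no request is sent)."""
    return BASE_URL + "?" + urlencode(params)
```

The live call would then be requests.get(BASE_URL, params=PARAMS, headers=HEADERS, timeout=10).json().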
A few other notes on your code:
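for item in data:
Iterate over pk instead of data. Looping over the dict data yields its keys (strings such as 'resultSets'), not the result sets themselves.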
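To see why the loop has to run over pk, here is a sketch on a made-up, heavily trimmed dictionary with an assumed stats.nba.com response shape (the values are invented). top_level_items and result_set_names are hypothetical helpers.

```python
# Made-up, trimmed stand-in for the JSON returned by req.json().
sample = {
    "resource": "leaguedashteamstats",
    "resultSets": [
        {
            "name": "LeagueDashTeamStats",
            "headers": ["TEAM_ID", "TEAM_NAME"],
            "rowSet": [[1001, "Team A"], [1002, "Team B"]],
        }
    ],
}


def top_level_items(data):
    """What `for item in data` sees: the dict's keys, which are plain strings."""
    return [item for item in data]


def result_set_names(data):
    """Iterating over data['resultSets'] (the question's pk) yields the result-set dicts."""
    pk = data["resultSets"]
    return [item["name"] for item in pk]
```

Since the keys are strings, the original item.get['rowSet'] would fail on them no matter how it was spelled.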
team = item.get['rowSet']
Use either item['rowSet'] or item.get('rowSet'), but don't mix the two. item.get is a method, so it can't be indexed with []; doing so raises a TypeError.
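A small sketch of the three variants on a made-up result set. The helper names and the [] default passed to get() are illustrative choices, not from the original code.

```python
# Made-up result set with the same keys the question reads.
result_set = {"name": "LeagueDashTeamStats", "rowSet": [[1001, "Team A"]]}


def rows_by_subscript(rs):
    return rs["rowSet"]           # raises KeyError if the key is missing


def rows_by_get(rs):
    return rs.get("rowSet", [])   # returns the default [] if the key is missing


def rows_like_question(rs):
    return rs.get["rowSet"]       # indexes the method object itself -> TypeError
```

Both working forms return the same list when the key exists; they differ only in how a missing key is handled.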
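my_data.append([Team_Id, Team_Name])
This line should be indented to the same level as the line above it, i.e. inside the inner for loop. Otherwise only the last team from that loop gets appended.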
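Putting the User-Agent header and the three notes together gives something like the following. It is a sketch, not a tested scraper: the function names are hypothetical, a timeout is added as an extra safeguard, the inner loop variable is renamed to row so it no longer shadows item, and requests is imported inside fetch_team_stats only so the parsing and CSV helpers can be tried without network access. The file is still opened in "a" mode as in the question, so each run appends another header row.

```python
import csv

USER_AGENT = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X) "
    "AppleWebKit/602.1.38 (KHTML, like Gecko) Version/10.0 "
    "Mobile/14A300 Safari/602.1"
)


def fetch_team_stats(url, timeout=10):
    """Download the JSON with a browser-like User-Agent and a timeout."""
    import requests  # third-party; imported here so the helpers below work without it
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=timeout)
    resp.raise_for_status()  # turn HTTP error statuses into exceptions
    return resp.json()


def extract_teams(data):
    """Collect [Team_Id, Team_Name] pairs from every result set."""
    my_data = []
    pk = data["resultSets"]
    for item in pk:                   # fix 1: iterate over pk, not data
        team = item["rowSet"]         # fix 2: plain subscript, no .get[...]
        for row in team:
            Team_Id = row[0]
            Team_Name = row[1]
            my_data.append([Team_Id, Team_Name])  # fix 3: inside the inner loop
    return my_data


def write_csv(rows, path):
    """Append a header row plus `rows` to the CSV file at `path`."""
    headers = ["Team_Id", "Team_Name"]
    with open(path, "a", newline="") as f:  # "a" appends, as in the question
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    # no f.close() needed: leaving the with block closes the file
```

Usage, with url set to the long URL from the question: write_csv(extract_teams(fetch_team_stats(url)), "NBA_Home_Team.csv").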