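I am trying to pull stats tables from NHL.com and convert them to CSV for later use in Excel. I can pull the tables, but I am having trouble converting them to CSV. I found plenty of questions about converting JSON to CSV, but none of the solutions worked for me. Some of them use pandas, which for some reason keeps giving me a traceback error. Below is my code up to the point before the CSV conversion.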
import requests
import lxml.html
from pprint import pprint
from sys import exit
import json
import csv
import datetime
import dateutil.relativedelta
now = datetime.datetime.now()
one_month_ago = now + dateutil.relativedelta.relativedelta(months=-15)
today_date = now.strftime('%Y-%m-%d')
one_month_ago_date = one_month_ago.strftime('%Y-%m-%d')
url = 'http://www.nhl.com/stats/rest/individual/skaters/basic/game/skatersummary?cayenneExp=gameDate%3E=%22'+one_month_ago_date+'T04:00:00.000Z%22%20and%20gameDate%3C=%22'+today_date+'T03:59:59.999Z%22%20and%20gameLocationCode=%22H%22%20and%20gameTypeId=%222%22&factCayenneExp=shots%3E=1&sort=[{%22property%22:%22points%22,%22direction%22:%22DESC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]'
resp = requests.get(url).text
resp = json.loads(resp)
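Any help is greatly appreciated!
Edit: The CSV conversion methods I tried include the top-rated answer to "How can I convert JSON to CSV?". I had trouble pasting and formatting it here, so I have only provided the link.
This is the output I get when I try to use pandas.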
Traceback (most recent call last):
  File "NHL Data Scrape.py", line 1, in <module>
    from pandas.io.json import json_normalize
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\__init__.py", line 13, in <module>
    __import__(dependency)
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\__init__.py", line 142, in <module>
    from . import add_newdocs
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
    from .type_check import *
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\__init__.py", line 35, in <module>
    from . import _internal # for freeze programs
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\_internal.py", line 18, in <module>
    from .numerictypes import object_
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\numerictypes.py", line 962, in <module>
    _register_types()
  File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\numerictypes.py", line 958, in _register_types
    numbers.Integral.register(integer)
AttributeError: module 'numbers' has no attribute 'Integral'
------------------
(program exited with code: 1)
Press any key to continue . . .
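The last line of the traceback shows numpy failing because the `numbers` module it imported has no `Integral` attribute. The standard-library `numbers` module does define `Integral`, so one possible cause (an assumption, not something confirmed in this thread) is that a different file named numbers.py, for example one sitting next to the script, is being imported instead. A minimal sketch for checking which file Python actually loads; `inspect_module` is a hypothetical helper name:

```python
import importlib


def inspect_module(name, attribute):
    """Return where a module was loaded from and whether it has an attribute."""
    module = importlib.import_module(name)
    # Built-in modules may have no __file__, so fall back to a placeholder.
    location = getattr(module, "__file__", None) or "(built-in)"
    return location, hasattr(module, attribute)


location, has_integral = inspect_module("numbers", "Integral")
print("numbers loaded from:", location)
print("has Integral:", has_integral)
```

If the printed path points somewhere outside the Python installation's lib folder, renaming that file would be the thing to try before reinstalling pandas or numpy.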
Answer 0 (score: 0)
You can use json_normalize() from pandas.io.json, for example:
In []:
from pandas.io.json import json_normalize
...
resp = requests.get(url).json()
json_normalize(resp, 'data')
Out[]:
assists faceoffWinPctg gameWinningGoals gamesPlayed goals otGoals ...
0 31 0.0967 2 41 20 1 ...
1 27 0.0000 3 38 22 0 ...
2 35 0.5249 4 41 14 2 ...
3 34 0.4866 3 41 14 1 ...
...
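The session above stops at a DataFrame, while the question asks for a CSV file. A minimal sketch of the remaining step, assuming pandas 1.0 or later (where the function is also exposed as `pd.json_normalize`) and using a small made-up payload in place of the live API response:

```python
import io

import pandas as pd

# Sample payload shaped like the response described above: a dict whose
# 'data' key holds a list of per-player dicts. Values are illustrative only.
payload = {
    "data": [
        {"assists": 31, "gamesPlayed": 41, "goals": 20},
        {"assists": 27, "gamesPlayed": 38, "goals": 22},
    ]
}

# Flatten the list under 'data' into a DataFrame, one row per player.
df = pd.json_normalize(payload, record_path="data")

# Write to an in-memory buffer so the sketch leaves no file behind;
# pass a path such as 'nhl_players.csv' to to_csv() to write a real file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

`index=False` leaves out the 0, 1, 2, ... row labels, which are usually not wanted in a spreadsheet.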
Answer 1 (score: 0)
You can use Python's built-in csv.DictWriter:
import csv
import requests

resp = requests.get(url).json()  # parse the response body as JSON
# resp['data'] is a list of dicts, one per player.
# The keys of the first dict are used as the CSV header.
# newline='' avoids blank lines between rows on Windows.
with open('nhl_players.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=list(resp['data'][0].keys()))
    w.writeheader()
    w.writerows(resp['data'])
The output CSV file is here: https://www.dropbox.com/s/1mmprenx0eniflg/nhl_players.csv?dl=0
Hope this helps.
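Taking the header from the first row works as long as every row has the same keys. By default, csv.DictWriter raises ValueError when a row contains a key that is not in the header. A minimal sketch of a more defensive variant builds the header from all rows and fills gaps with an empty string. The rows here are made up for illustration, and whether the real API ever returns uneven rows is an assumption:

```python
import csv
import io

# Illustrative rows; the second one lacks 'otGoals' on purpose.
rows = [
    {"assists": 31, "goals": 20, "otGoals": 1},
    {"assists": 27, "goals": 22},
]

# Collect the union of keys across all rows, keeping first-seen order.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

# newline="" keeps the writer's own line endings untouched, as the csv
# module documentation recommends for file objects.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing the StringIO buffer with `open('nhl_players.csv', 'w', newline='')` writes the same content to disk.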