I'm new to Python, so could anyone help me?
I have the following content in a JSON file (i.e. file12.json):
{
"TimeSeries": {
"Row": [
{
"CLOSE": 41.85,
"TIMESTAMP": "2016-09-22T00:00:00+00:00"
},
{
"CLOSE": 41.37,
"TIMESTAMP": "2016-09-23T00:00:00+00:00"
},
{
"CLOSE": 40.88,
"TIMESTAMP": "2016-09-26T00:00:00+00:00"
},
{
"CLOSE": 40.98,
"TIMESTAMP": "2016-09-27T00:00:00+00:00"
},
{
"CLOSE": 44.33,
"TIMESTAMP": "2016-12-21T00:00:00+00:00"
}
]
}
}
I'm trying to create a structured DataFrame like this:
CLOSE TIMESTAMP
0 41.85 2016-09-22T00:00:00+00:00
1 41.37 2016-09-23T00:00:00+00:00
2 40.88 2016-09-26T00:00:00+00:00
3 40.98 2016-09-27T00:00:00+00:00
If I wanted to do the same thing with a CSV file, I would just use read_csv, but read_json produces a different output.
I used this code...
file = pd.read_json('file12.json')
print file
...but the format isn't what I want. I get the following:
TimeSeries
Row [{u'CLOSE': 41.85, u'TIMESTAMP': u'2016-09-22T...
...i.e. everything ends up in a single cell instead of a formatted table.
Can anyone help me? Please :-)
Answer 0 (score: 3)
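In McKinney's Python for Data Analysis, he says:
"How you convert a JSON object or list of objects to a DataFrame or some other data structure for analysis will be up to you."
Try this (this is untested code, YMMV):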
import json
import pandas as pd
with open('file12.json') as json_data:
obj = json.load(json_data)
frame = pd.DataFrame(obj['TimeSeries']['Row'], columns=['CLOSE', 'TIMESTAMP'])
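The same idea can be tried without a file on disk. The following is a minimal sketch, assuming the JSON from the question is held in a string (the name `raw` is made up for illustration); it parses the text with `json.loads`, picks out the nested list under `TimeSeries` / `Row`, and hands that list of dicts to `pd.DataFrame`.

```python
import json
import pandas as pd

# Hypothetical in-memory copy of file12.json, so the sketch runs without the file.
raw = """
{"TimeSeries": {"Row": [
    {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
    {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"},
    {"CLOSE": 40.88, "TIMESTAMP": "2016-09-26T00:00:00+00:00"},
    {"CLOSE": 40.98, "TIMESTAMP": "2016-09-27T00:00:00+00:00"},
    {"CLOSE": 44.33, "TIMESTAMP": "2016-12-21T00:00:00+00:00"}
]}}
"""

obj = json.loads(raw)              # plain Python dict
rows = obj["TimeSeries"]["Row"]    # list of {"CLOSE": ..., "TIMESTAMP": ...} dicts
frame = pd.DataFrame(rows, columns=["CLOSE", "TIMESTAMP"])  # one row per dict
print(frame)
```

Built this way, the TIMESTAMP column holds the original strings; nothing is converted to dates unless you ask for it.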
Answer 1 (score: 2)
The JSON string for the rows value part:
In [454]: txt1="""[
...: {
...: "CLOSE": 41.85,
...: "TIMESTAMP": "2016-09-22T00:00:00+00:00"
...: },
...: {
...: "CLOSE": 41.37,
...: "TIMESTAMP": "2016-09-23T00:00:00+00:00"
...: },
...: {
...: "CLOSE": 40.88,
...: "TIMESTAMP": "2016-09-26T00:00:00+00:00"
...: },
...: {
...: "CLOSE": 40.98,
...: "TIMESTAMP": "2016-09-27T00:00:00+00:00"
...: },
...: {
...: "CLOSE": 44.33,
...: "TIMESTAMP": "2016-12-21T00:00:00+00:00"
...: }
...: ]"""
Parsed into a list:
In [449]: json.loads(txt1)
Out[449]:
[{'CLOSE': 41.85, 'TIMESTAMP': '2016-09-22T00:00:00+00:00'},
{'CLOSE': 41.37, 'TIMESTAMP': '2016-09-23T00:00:00+00:00'},
{'CLOSE': 40.88, 'TIMESTAMP': '2016-09-26T00:00:00+00:00'},
{'CLOSE': 40.98, 'TIMESTAMP': '2016-09-27T00:00:00+00:00'},
{'CLOSE': 44.33, 'TIMESTAMP': '2016-12-21T00:00:00+00:00'}]
And loaded into pandas (the dates are interpreted as datetime64 type, since convert_dates=True is the default):
In [451]: df=pd.read_json(txt1)
In [452]: df
Out[452]:
CLOSE TIMESTAMP
0 41.85 2016-09-22
1 41.37 2016-09-23
2 40.88 2016-09-26
3 40.98 2016-09-27
4 44.33 2016-12-21
In [453]: df.dtypes
Out[453]:
CLOSE float64
TIMESTAMP datetime64[ns]
dtype: object
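To see what the `convert_dates` default is doing, here is a small sketch that switches it off. It assumes a shortened two-row version of the list (the name `txt_rows` is made up for illustration) and wraps the text in `io.StringIO`, because recent pandas versions warn when a literal JSON string is passed to `read_json`.

```python
from io import StringIO
import pandas as pd

# Hypothetical shortened copy of the list-of-rows JSON shown above.
txt_rows = """[
    {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
    {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"}
]"""

# convert_dates=False asks read_json to leave date-like columns as text.
df_raw = pd.read_json(StringIO(txt_rows), convert_dates=False)
print(df_raw)
print(df_raw.dtypes)
```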
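But as @Alex shows, you get finer control over the conversion by first parsing with json.loads and then loading just the relevant part of that dictionary. obj['TimeSeries']['Row'] is exactly this list.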
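As one illustration of that finer control, the sketch below parses first, builds the frame from the nested list, and then converts the timestamp column explicitly with `pd.to_datetime`. The string name `raw` and the choice of `utc=True` are assumptions made for the example, not something the answers above prescribe.

```python
import json
import pandas as pd

# Hypothetical shortened copy of the full JSON document from the question.
raw = """
{"TimeSeries": {"Row": [
    {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
    {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"}
]}}
"""

obj = json.loads(raw)
frame = pd.DataFrame(obj["TimeSeries"]["Row"])

# Convert explicitly; utc=True keeps the +00:00 offset as a UTC-aware dtype.
frame["TIMESTAMP"] = pd.to_datetime(frame["TIMESTAMP"], utc=True)
print(frame.dtypes)
```

Doing the conversion as a separate step makes it easy to swap in a different format or time zone handling, or to skip the conversion entirely.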
You can even do a JSON round trip to strip off the outer layers:
In [455]: dd = json.loads(txt)
In [456]: dd
Out[456]:
{'TimeSeries': {'Row': [{'CLOSE': 41.85,
'TIMESTAMP': '2016-09-22T00:00:00+00:00'},
{'CLOSE': 41.37, 'TIMESTAMP': '2016-09-23T00:00:00+00:00'},
{'CLOSE': 40.88, 'TIMESTAMP': '2016-09-26T00:00:00+00:00'},
{'CLOSE': 40.98, 'TIMESTAMP': '2016-09-27T00:00:00+00:00'},
{'CLOSE': 44.33, 'TIMESTAMP': '2016-12-21T00:00:00+00:00'}]}}
In [457]: pd.read_json(json.dumps(dd['TimeSeries']['Row']))
Out[457]:
CLOSE TIMESTAMP
0 41.85 2016-09-22
1 41.37 2016-09-23
2 40.88 2016-09-26
3 40.98 2016-09-27
4 44.33 2016-12-21
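An alternative that avoids the dumps/read_json round trip is `pd.json_normalize`, which can walk to the nested list through its `record_path` argument. This is a sketch of a swapped-in technique, not what the answer above uses; it assumes pandas 1.0 or later, and the dictionary `dd` here is a shortened stand-in for the parsed document.

```python
import pandas as pd

# Hypothetical shortened stand-in for the parsed dictionary dd shown above.
dd = {"TimeSeries": {"Row": [
    {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
    {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"},
]}}

# record_path names the keys to follow down to the list of records.
flat = pd.json_normalize(dd, record_path=["TimeSeries", "Row"])
print(flat)
```

Unlike `read_json`, this leaves TIMESTAMP as strings, so any date conversion has to be applied afterwards.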