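I'm trying to convert a JSON file into a pandas DataFrame so that I can drop the data I don't need and end up with a CSV containing only the IDs. The data looks like this: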
{
"data": [
{
"message": "Uneeded message",
"created_time": "2017-04-02T17:20:37+0000",
"id": "723456782912449_1008262099345654"
},
{
"message": "Uneeded message",
"created_time": "2017-03-28T06:26:28+0000",
"id": "771345678912449_1003934567871010"
},
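I haven't worked with JSON before, but this is the code I'm using to load the data:
import pandas as pd
import json
with open('fileName.json', encoding="utf8") as f:
    # strict=False allows control characters (such as raw newlines) inside JSON strings
    w = json.loads(f.read(), strict=False)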
The final output should just be a CSV with a single ID column.
Answer 0 (score: 2)
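I think you need json_normalize:
# json_normalize is exposed at the top level of pandas since 1.0;
# the older pandas.io.json import path is deprecated.
from pandas import json_normalize
import json
with open('file.json') as data_file:
    d = json.load(data_file)
print(d)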
{
"data": [{
"message": "Uneeded message",
"created_time": "2017-04-02T17:20:37+0000",
"id": "723456782912449_1008262099345654"
}, {
"message": "Uneeded message",
"created_time": "2017-03-28T06:26:28+0000",
"id": "771345678912449_1003934567871010"
}]
}
df = json_normalize(d, 'data')
print (df)
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
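The question also asks for a CSV holding only the IDs, which the output above does not yet produce. A minimal sketch of that last step follows, assuming pandas 1.0 or later (for `pd.json_normalize`) and using the sample records from the question as an inline dict in place of a real file:

```python
import pandas as pd

# Sample payload copied from the question; a real run would get this from json.load.
d = {
    "data": [
        {"message": "Uneeded message",
         "created_time": "2017-04-02T17:20:37+0000",
         "id": "723456782912449_1008262099345654"},
        {"message": "Uneeded message",
         "created_time": "2017-03-28T06:26:28+0000",
         "id": "771345678912449_1003934567871010"},
    ]
}

df = pd.json_normalize(d, "data")

# Double brackets keep the result a one-column DataFrame rather than a Series.
ids = df[["id"]]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = ids.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path, for example `ids.to_csv('ids.csv', index=False)`, where `ids.csv` is only a placeholder name.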
Answer 1 (score: 1)
Use json.loads.
Setup:
json_str = """{
"data": [
{
"message": "Uneeded message",
"created_time": "2017-04-02T17:20:37+0000",
"id": "723456782912449_1008262099345654"
},
{
"message": "Uneeded message",
"created_time": "2017-03-28T06:26:28+0000",
"id": "771345678912449_1003934567871010"
}]}"""
Solution:
import json
import pandas as pd
pd.DataFrame(json.loads(json_str)['data'])
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
Or read the JSON from a file:
with open('neutraluk1.json') as f:
    print(pd.DataFrame(json.load(f)['data']))
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
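For the file-to-file case, one way to go straight from the JSON file to an ID-only CSV is to pass `columns=['id']` when building the DataFrame, so the other keys are never kept. The sketch below is only illustrative: the file names `posts.json` and `ids.csv` are hypothetical, and it writes a stand-in input file into a temporary directory so that it runs on its own.

```python
import json
import os
import tempfile

import pandas as pd

# Stand-in records with the same shape as the data in the question.
records = [
    {"message": "Uneeded message",
     "created_time": "2017-04-02T17:20:37+0000",
     "id": "723456782912449_1008262099345654"},
    {"message": "Uneeded message",
     "created_time": "2017-03-28T06:26:28+0000",
     "id": "771345678912449_1003934567871010"},
]

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "posts.json")  # hypothetical input file name
    dst = os.path.join(tmp, "ids.csv")     # hypothetical output file name

    # Create the input file so the example is self-contained.
    with open(src, "w", encoding="utf8") as f:
        json.dump({"data": records}, f)

    with open(src, encoding="utf8") as f:
        payload = json.load(f)

    # columns=["id"] keeps only that key from each record.
    pd.DataFrame(payload["data"], columns=["id"]).to_csv(dst, index=False)

    # Read the result back to show what was written.
    with open(dst, encoding="utf8") as f:
        lines = f.read().splitlines()

print(lines)
```

With real data, only the `json.load` and `to_csv` lines are needed, pointed at the actual input and output paths.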