JSON file to Pandas df

Time: 2017-04-05 10:26:05

Tags: python json pandas dataframe

I'm trying to convert a JSON file to a pandas df so I can drop the data I don't need and end up with a CSV of just the IDs. The data looks like this:

{
  "data": [
    {
      "message": "Uneeded message",
      "created_time": "2017-04-02T17:20:37+0000",
      "id": "723456782912449_1008262099345654"
    },
    {
      "message": "Uneeded message",
      "created_time": "2017-03-28T06:26:28+0000",
      "id": "771345678912449_1003934567871010"
    }
  ]
}

I haven't worked with JSON before, but the code I'm using to load this data is:

import pandas as pd
import json

with open('fileName.json', encoding="utf8" ) as f:
    w = json.loads(f.read(), strict=False)

The end result should just be a CSV with a single column of IDs.
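A note on the strict=False argument in the loading code: it tells Python's json module to accept raw control characters (such as a literal newline) inside JSON strings, which the default strict mode rejects. A minimal sketch, using a made-up sample string rather than the real file:

```python
import json

# Made-up sample: the string value contains a raw newline character,
# which the JSON spec does not allow unescaped.
raw = '{"message": "line one\nline two"}'

try:
    json.loads(raw)  # default strict=True rejects the control character
except json.JSONDecodeError as exc:
    print("strict mode failed:", exc.msg)

# strict=False lets the decoder accept the raw newline.
parsed = json.loads(raw, strict=False)
print(repr(parsed["message"]))
```

When reading from a file, json.load(f, strict=False) should behave the same as json.loads(f.read(), strict=False), since json.load forwards extra keyword arguments to the decoder.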

2 Answers:

Answer 0 (score: 2)

I think you need json_normalize:

from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
import json

with open('file.json') as data_file:    
    d = json.load(data_file)

print (d)
{
    "data": [{
        "message": "Uneeded message",
        "created_time": "2017-04-02T17:20:37+0000",
        "id": "723456782912449_1008262099345654"
    }, {
        "message": "Uneeded message",
        "created_time": "2017-03-28T06:26:28+0000",
        "id": "771345678912449_1003934567871010"
    }]
}

df = json_normalize(d, 'data')
print (df)
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
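The frame above still has all three columns, while the question wants a CSV containing only the IDs. A minimal sketch of that last step, assuming pandas >= 1.0 (for pd.json_normalize) and the same "data" layout; the inline dict stands in for the result of json.load. Selecting df[["id"]] keeps a one-column DataFrame, and calling to_csv without a path returns the CSV text instead of writing a file:

```python
import pandas as pd

# Stand-in for json.load(data_file): same shape as the question's file.
d = {
    "data": [
        {"message": "Uneeded message",
         "created_time": "2017-04-02T17:20:37+0000",
         "id": "723456782912449_1008262099345654"},
        {"message": "Uneeded message",
         "created_time": "2017-03-28T06:26:28+0000",
         "id": "771345678912449_1003934567871010"},
    ]
}

df = pd.json_normalize(d, "data")  # one row per record in d["data"]

# Double brackets keep a DataFrame (so the CSV gets an "id" header);
# index=False drops the row-index column from the output.
csv_text = df[["id"]].to_csv(index=False)
print(csv_text)
```

Passing a file name as the first argument to to_csv writes the same content to disk instead.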

Answer 1 (score: 1)

Use json.loads

Setup

json_str = """{
 "data": [
        {
          "message": "Uneeded message",
          "created_time": "2017-04-02T17:20:37+0000",
          "id": "723456782912449_1008262099345654"
        },
        {
          "message": "Uneeded message",
          "created_time": "2017-03-28T06:26:28+0000",
          "id": "771345678912449_1003934567871010"
        }]}"""

Solution

import json
import pandas as pd

pd.DataFrame(json.loads(json_str)['data'])

               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
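If the other fields are never needed, they can also be dropped while the frame is built: when pd.DataFrame receives a list of dicts, its columns argument selects which keys become columns. A short sketch reusing a trimmed copy of the json_str setup:

```python
import json

import pandas as pd

# Trimmed copy of the setup string above.
json_str = """{"data": [
    {"message": "Uneeded message", "created_time": "2017-04-02T17:20:37+0000",
     "id": "723456782912449_1008262099345654"},
    {"message": "Uneeded message", "created_time": "2017-03-28T06:26:28+0000",
     "id": "771345678912449_1003934567871010"}]}"""

records = json.loads(json_str)["data"]

# Only the "id" key of each record is kept; "message" and "created_time" are skipped.
ids = pd.DataFrame(records, columns=["id"])
print(ids)
```

A requested key missing from some record would show up as NaN in that row rather than raising an error.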

Or, loading the JSON from a file:

with open('neutraluk1.json') as f:
    print(pd.DataFrame(json.load(f)['data']))

               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
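To go from the file all the way to the CSV the question asks for, the file-based version can write straight to disk, using the columns parameter of DataFrame.to_csv to write only the id column. A minimal sketch; the names posts.json and ids.csv are hypothetical, and the script first writes a small sample JSON file into a temporary directory so it runs on its own:

```python
import csv
import json
import os
import tempfile

import pandas as pd

sample = {"data": [
    {"message": "Uneeded message", "created_time": "2017-04-02T17:20:37+0000",
     "id": "723456782912449_1008262099345654"},
    {"message": "Uneeded message", "created_time": "2017-03-28T06:26:28+0000",
     "id": "771345678912449_1003934567871010"},
]}

workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "posts.json")  # hypothetical input file name
csv_path = os.path.join(workdir, "ids.csv")      # hypothetical output file name

# Create the sample input file (stands in for the real JSON export).
with open(json_path, "w", encoding="utf8") as f:
    json.dump(sample, f)

# Load every field as in the answer above, then write only the "id" column.
with open(json_path, encoding="utf8") as f:
    df = pd.DataFrame(json.load(f)["data"])
df.to_csv(csv_path, columns=["id"], index=False)

# Read the CSV back to check what was written.
with open(csv_path, newline="", encoding="utf8") as f:
    rows = list(csv.reader(f))
print(rows)
```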