Converting JSON files with a different structure to a DataFrame in Python 3.6

Time: 2017-07-15 17:43:03

Tags: python dataframe

Please let me know how to convert a JSON file in the following format to a DataFrame.

JSON file data:

{
"series_id":"STEO",
"f":"A",
"data":[["2018",5.8400705041],["2017",3.5671511014],["2016",2.3014617486],["2015",2.4989178082],["2014",2.2089452055]]
}

I tried the following code:

sourcePath = r'D:\source\STEO.txt'
data = pd.read_json(sourcePath, lines=True)

I need the following output from this JSON:

series_id   f   date    value
STEO        A   2018    5.840070504
STEO        A   2017    3.567151101
STEO        A   2016    2.301461749
STEO        A   2015    2.498917808
STEO        A   2014    2.208945206

2 Answers:

Answer 0: (score: 1)

One way to do it:

Read the JSON:

import pandas as pd
df = pd.read_json('input.txt')
print(df)

Output:

                   data  f series_id
0  [2018, 5.8400705041]  A      STEO
1  [2017, 3.5671511014]  A      STEO
2  [2016, 2.3014617486]  A      STEO
3  [2015, 2.4989178082]  A      STEO
4  [2014, 2.2089452055]  A      STEO

Split the list column:

# splitting into multiple columns for list
# https://stackoverflow.com/a/35491399/5916727
df[['Date','Value']] = pd.DataFrame([item for item in df.data])
# removing initial data column now
del df['data']
print(df)
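
To match the layout requested in the question, the new columns can be renamed and reordered after the split. The following is a minimal sketch rather than a definitive version: it assumes the JSON is inlined through io.StringIO instead of being read from input.txt, purely so the snippet runs on its own.

```python
import io
import pandas as pd

# Inlined copy of the JSON from the question (assumption: replaces input.txt).
raw = '''
{
"series_id":"STEO",
"f":"A",
"data":[["2018",5.8400705041],["2017",3.5671511014],["2016",2.3014617486],["2015",2.4989178082],["2014",2.2089452055]]
}
'''

df = pd.read_json(io.StringIO(raw))

# Split each [date, value] list into two columns, then drop the list column.
df[['Date', 'Value']] = pd.DataFrame([item for item in df['data']])
del df['data']

# Rename and reorder to the layout asked for in the question.
df = df.rename(columns={'Date': 'date', 'Value': 'value'})
df = df[['series_id', 'f', 'date', 'value']]
print(df)
```

The date column still holds the strings from the JSON at this point; cast it with astype(int) if integers are needed.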

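Answer 1: (score: 1)

You can use read_json, then remove the data column with pop and build the new columns with the DataFrame constructor from that column's values converted to a list: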

df = pd.read_json('file.json')
df[['date','value']] = pd.DataFrame(df.pop('data').values.tolist())
#if necessary convert to int
df['date'] = df['date'].astype(int)
print (df)
   f series_id  date     value
0  A      STEO  2018  5.840071
1  A      STEO  2017  3.567151
2  A      STEO  2016  2.301462
3  A      STEO  2015  2.498918
4  A      STEO  2014  2.208945

Another solution:

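You can use json_normalize, then rename the columns and, if necessary, reorder them with reindex:

import json
import pandas as pd

# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
with open('file.json') as data_file:
    d = json.load(data_file)

d_cols = {0:'date', 1:'value'}
names_cols = ['series_id','f','date','value']
# reindex(columns=...) replaces the older reindex_axis(..., axis=1)
df = pd.json_normalize(d, 'data', ['f', 'series_id']) \
       .rename(columns=d_cols) \
       .reindex(columns=names_cols)
df['date'] = df['date'].astype(int)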
print (df)
  series_id  f  date     value
0      STEO  A  2018  5.840071
1      STEO  A  2017  3.567151
2      STEO  A  2016  2.301462
3      STEO  A  2015  2.498918
4      STEO  A  2014  2.208945
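
For a file this small, the same flattening can also be sketched in plain Python before anything is handed to pandas. This is only an illustration, not part of the original answer: the JSON text is inlined so the snippet runs without file.json, and flatten_series is a hypothetical helper name.

```python
import json

# Inlined copy of the JSON from the question (assumption: replaces file.json).
raw = '''
{
"series_id":"STEO",
"f":"A",
"data":[["2018",5.8400705041],["2017",3.5671511014],["2016",2.3014617486],["2015",2.4989178082],["2014",2.2089452055]]
}
'''

def flatten_series(doc):
    """Return one flat dict per [date, value] pair in doc['data'].

    Hypothetical helper: the top-level fields are repeated on every row.
    """
    return [
        {"series_id": doc["series_id"], "f": doc["f"],
         "date": int(date), "value": value}
        for date, value in doc["data"]
    ]

rows = flatten_series(json.loads(raw))
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) if a DataFrame is still wanted.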