Importing JSON files into a pandas DataFrame

Time: 2019-08-08 03:35:31

Tags: python pandas

I have several JSON files that look like this:

data = {"75575": 
            {"name": "Dummy name 1", 
             "season": "", 
             "ep": "", 
             "channel": "Dummy channel 1", 
             "Schedule": ["2017-05-11", "2019-04-30", "", "", "2019-08-01", "2019-08-31", "2017-05-11", "2019-04-30", "", ""]}, 
        "115324": 
            {"name": "Dummy name 2", 
             "season": "", 
             "ep": "", 
             "channel": "Dummy channel 2", 
             "Schedule": ["2017-05-09", "2019-05-31", "2017-05-09", "2019-05-31", "", "", "", "", "2019-09-01", "2019-09-30"]},}

I tried json_normalize(data), but it produced a frame of [1 rows x 10 columns], so I am using the following workaround:

import pandas as pd

df = pd.DataFrame()

for k, v in data.items():
    x = pd.Series(["Dummy genre",k, v.get("name"), v.get("season"), v.get("ep"),
                   v.get("channel"), *v.get("Schedule")],
                  index=("Genre", "ID", "Name", "Season", "Episode", "Channel",
                         "Start date 1", "End date 1", "Start date 2", "End date 2", "Start date 3", "End date 3",
                         "Start date 4", "End date 4", "Start date 5", "End date 5"))
    df = pd.concat([df, x.to_frame().T], ignore_index=True)

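Is there a way to do this with json_normalize? I tried playing with its parameters but could not get it to work. Also note that I have to extract 5 different JSON files that share the same format.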

My expected output:

         Genre      ID     ...     Start date 5  End date 5
0  Dummy genre   75575     ...                             
1  Dummy genre  115324     ...       2019-09-01  2019-09-30

1 answer:

Answer 0 (score: 2)

Not sure about json_normalize, but it looks like you can use the regular pd.DataFrame constructor:

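# Outer keys (the IDs) become the rows after transposing.
df = pd.DataFrame(data).T
# Expand each Schedule list into its own columns, then drop the original column.
# Note: drop(columns=...) replaces the positional axis argument drop('Schedule', 1),
# which newer pandas versions no longer accept.
df = df.join(pd.DataFrame(df.Schedule.tolist(), index=df.index)).drop(columns='Schedule')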

Then just rename the columns using the list you already have.
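
Putting the steps above together, here is a minimal sketch of one way to go from the parsed JSON to the expected output. The helper name to_frame and its genre argument are hypothetical (they do not appear in the question or the answer), and the sketch assumes every Schedule list has the same length and alternates start and end dates, as in the sample data.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    "75575": {
        "name": "Dummy name 1", "season": "", "ep": "",
        "channel": "Dummy channel 1",
        "Schedule": ["2017-05-11", "2019-04-30", "", "", "2019-08-01",
                     "2019-08-31", "2017-05-11", "2019-04-30", "", ""],
    },
    "115324": {
        "name": "Dummy name 2", "season": "", "ep": "",
        "channel": "Dummy channel 2",
        "Schedule": ["2017-05-09", "2019-05-31", "2017-05-09", "2019-05-31",
                     "", "", "", "", "2019-09-01", "2019-09-30"],
    },
}


def to_frame(data, genre="Dummy genre"):
    """Hypothetical helper: flatten one parsed JSON dict into the wide layout."""
    # Outer keys (the IDs) become the row index after transposing.
    df = pd.DataFrame(data).T

    # Spread each Schedule list across its own columns, aligned on the same index.
    schedule = pd.DataFrame(df["Schedule"].tolist(), index=df.index)

    # Assumption: entries alternate start, end, start, end, ...
    schedule.columns = [
        f"{'Start' if j % 2 == 0 else 'End'} date {j // 2 + 1}"
        for j in range(schedule.shape[1])
    ]

    df = df.drop(columns="Schedule").join(schedule)
    df = df.rename(columns={"name": "Name", "season": "Season",
                            "ep": "Episode", "channel": "Channel"})

    # Turn the index into an ordinary "ID" column and prepend the genre.
    df = df.rename_axis("ID").reset_index()
    df.insert(0, "Genre", genre)

    # Fix the column order explicitly instead of relying on construction order.
    fixed = ["Genre", "ID", "Name", "Season", "Episode", "Channel"]
    return df[fixed + list(schedule.columns)]


result = to_frame(data)
print(result)
```

For the five files mentioned in the question, one option would be to parse each file with json.load, call the helper on each result, and combine the frames with pd.concat(..., ignore_index=True).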