I have several JSON files that look like this:
data = {"75575":
{"name": "Dummy name 1",
"season": "",
"ep": "",
"channel": "Dummy channel 1",
"Schedule": ["2017-05-11", "2019-04-30", "", "", "2019-08-01", "2019-08-31", "2017-05-11", "2019-04-30", "", ""]},
"115324":
{"name": "Dummy name 2",
"season": "",
"ep": "",
"channel": "Dummy channel 2",
"Schedule": ["2017-05-09", "2019-05-31", "2017-05-09", "2019-05-31", "", "", "", "", "2019-09-01", "2019-09-30"]},}
I tried json_normalize(data), but that produced [1 rows x 10 columns], so I am using the following workaround:
import pandas as pd
df = pd.DataFrame()
for k, v in data.items():
    x = pd.Series(["Dummy genre", k, v.get("name"), v.get("season"), v.get("ep"),
                   v.get("channel"), *v.get("Schedule")],
                  index=("Genre", "ID", "Name", "Season", "Episode", "Channel",
                         "Start date 1", "End date 1", "Start date 2", "End date 2", "Start date 3", "End date 3",
                         "Start date 4", "End date 4", "Start date 5", "End date 5"))
    df = pd.concat([df, x.to_frame().T], ignore_index=True)
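Is there a way to do this with json_normalize? I tried playing with its parameters but couldn't get it to work. Also note that I have to extract 5 different JSON files with the same format.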
My expected output:
Genre ID ... Start date 5 End date 5
0 Dummy genre 75575 ...
1 Dummy genre 115324 ... 2019-09-01 2019-09-30
Answer 0 (score: 2)
I'm not sure about json_normalize, but it seems you can use the regular pd.DataFrame constructor:
df = pd.DataFrame(data).T
df = df.join(pd.DataFrame(df.Schedule.tolist(), index=df.index)).drop(columns='Schedule')
Then just rename the columns using the list you already have.
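Putting the constructor approach and the renaming step together, a minimal sketch could look like the following. The helper name `schedule_frame` and its `genre` parameter are hypothetical, introduced only for illustration. It assumes every "Schedule" list has the same even length, alternating start and end dates, as in the sample data.

```python
import pandas as pd

data = {
    "75575": {"name": "Dummy name 1", "season": "", "ep": "",
              "channel": "Dummy channel 1",
              "Schedule": ["2017-05-11", "2019-04-30", "", "", "2019-08-01",
                           "2019-08-31", "2017-05-11", "2019-04-30", "", ""]},
    "115324": {"name": "Dummy name 2", "season": "", "ep": "",
               "channel": "Dummy channel 2",
               "Schedule": ["2017-05-09", "2019-05-31", "2017-05-09",
                            "2019-05-31", "", "", "", "", "2019-09-01",
                            "2019-09-30"]},
}


def schedule_frame(data, genre="Dummy genre"):
    """Hypothetical helper: one row per ID, one column per schedule date."""
    # Outer keys (the IDs) become the row index after transposing.
    df = pd.DataFrame(data).T

    # Expand each Schedule list into its own columns, aligned on the IDs.
    sched = pd.DataFrame(df["Schedule"].tolist(), index=df.index)
    # Assumption: the lists alternate start/end dates and have even length.
    sched.columns = [f"{kind} date {i}"
                     for i in range(1, sched.shape[1] // 2 + 1)
                     for kind in ("Start", "End")]

    out = df.drop(columns="Schedule").join(sched)
    out = out.rename(columns={"name": "Name", "season": "Season",
                              "ep": "Episode", "channel": "Channel"})
    # Turn the ID index into a regular column and add the constant genre.
    out = out.rename_axis("ID").reset_index()
    out.insert(0, "Genre", genre)
    # Fix the column order explicitly so it matches the expected output.
    return out[["Genre", "ID", "Name", "Season", "Episode", "Channel",
                *sched.columns]]


result = schedule_frame(data)
print(result)
```

For the five files with the same format, the same helper could be called once per loaded dictionary (for example after json.load) and the results combined with pd.concat(..., ignore_index=True).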