我有来自json的嵌套字典形式的数据:-
{
"simple25b" : {
"hands" : {
"0" : {
"handId" : "xyz",
"time" : "2019-09-23 11:00:01",
"currency" : "rm"
},
"1" : {
"handId" : "abc",
"time" : "2019-09-23 11:01:18",
"currency" : "rm"
}
}
},
"simple5af" : {
"hands" : {
"0" : {
"handId" : "akg",
"time" : "2019-09-23 10:53:22",
"currency" : "rm"
},
"1" : {
"handId" : "mzc",
"time" : "2019-09-23 10:54:15",
"currency" : "rm"
},
"2" : {
"handId" : "swk",
"time" : "2019-09-23 10:56:03",
"currency" : "rm"
},
"3" : {
"handId" : "pQc",
"time" : "2019-09-23 10:57:15",
"currency" : "rm"
},
"4" : {
"handId" : "ywh",
"time" : "2019-09-23 10:58:53",
"currency" : "rm"
}
}
}
我需要将其更改为单个dataframe对象,以便产生如下结果:-
我尝试过循环,将其读取为json后将列更改为列表:-
#reading data
with open("data.json", 'r', encoding = 'utf-8-sig') as datafile:
data = json.load(datafile)
df = pd.DataFrame(data)
df1 = df.transpose()
我也尝试过:-
pd.concat([df1.drop(['hands'], axis=1), df1['hands'].apply(pd.Series)], axis=1)
但没有任何结果。
答案 0 :(得分:3)
想法是将PIDS
和Hands
键添加到最后的字典并追加到list of dict
-这样最后一个DataFrame
构造函数就可以很好地工作了:
L = []
for k, v in data.items():
for k1, v1 in v.items():
for k2, v2 in v1.items():
v2['PIDS'] = k
v2['Hands'] = k2
L.append(v2)
df = pd.DataFrame(L)
print (df)
handId time currency PIDS Hands
0 xyz 2019-09-23 11:00:01 rm simple25b 0
1 abc 2019-09-23 11:01:18 rm simple25b 1
2 akg 2019-09-23 10:53:22 rm simple5af 0
3 mzc 2019-09-23 10:54:15 rm simple5af 1
4 swk 2019-09-23 10:56:03 rm simple5af 2
5 pQc 2019-09-23 10:57:15 rm simple5af 3
6 ywh 2019-09-23 10:58:53 rm simple5af 4
更改了排序循环解决方案:
L = []
for k, v in data.items():
for k1, v1 in v.items():
for k2, v2 in v1.items():
a = {'PIDS':k, 'Hands': k2}
L.append({**a, **v2})
和列表理解选项:
L = [{**{'PIDS':k, 'Hands': k2}, **v2}
for k, v in data.items()
for k1, v1 in v.items()
for k2, v2 in v1.items()]
df = pd.DataFrame(L)
print (df)
PIDS Hands handId time currency
0 simple25b 0 xyz 2019-09-23 11:00:01 rm
1 simple25b 1 abc 2019-09-23 11:01:18 rm
2 simple5af 0 akg 2019-09-23 10:53:22 rm
3 simple5af 1 mzc 2019-09-23 10:54:15 rm
4 simple5af 2 swk 2019-09-23 10:56:03 rm
5 simple5af 3 pQc 2019-09-23 10:57:15 rm
6 simple5af 4 ywh 2019-09-23 10:58:53 rm
答案 1 :(得分:0)
您可以使用递归进行以下操作,以便在任何深度进行操作
def convert_to_df(d, col_names, depth=0):
df_list = []
for key, value in d.items():
if type(value) is dict:
df = convert_to_df(value, col_names, depth+1)
df.loc[:,col_names[depth]] = key
df_list.append(df)
else:
return pd.DataFrame([d.values()], columns=d.keys())
return pd.concat(df_list)
col_names_for_depth = ["PID","", "Hands"]
df = convert_to_df(d, col_names_for_depth)
## rearrange columns and remove colums for depth of "hands"
new_cols = list(df.columns[[-1, -3]]) + list(df.columns[:-3])
df.reindex(new_cols, axis=1)