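I have a very large JSON file that I want to convert into a DataFrame with a specific structure, which I explain further down in the question.
Here are a few records from a sample of the JSON: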
JsonRecords = {
    'rec1':
    {
        'words': [['A', 'B', 'C', '.'],
                  ['D', 'E', 'F', '.']],
        'Ids': [[0, 1],
                [2, 3]],
        'unique': [1, 1, 1, 0, 0, 1],
        'ments': {
            "(0, 1)": {
                "A1": [0],
                "A2": [0, 1],
                "A3": [1],
                "A4": [1, 0],
                "A5": [0]
            },
            "(2, 3)": {
                "A1": [0],
                "A2": [0],
                "A3": [1],
                "A5": [0]
            }
        }
    },
    'rec2':
    {
        'words': [['We', 'us', 'them', '.'],
                  ['is', 'it', 'us', '.']],
        'Ids': [[4, 5],
                [6, 7]],
        'unique': [0, 0, 0, 1, 1, 0],
        "ments": {
            "(4, 5)": {
                "A1": [0],
                "A2": [0],
                "A3": [0],
                "A4": [0]
            },
            "(6, 7)": {
                "A1": [0],
                "A2": [0],
                "A4": [0, 0],
                "A6": [0, 1]
            }
        }
    },
    'rec3':
    ..... more records
}
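The sample above is written as a Python dict literal (single quotes), which is not valid JSON. If the real file is standard JSON with the same shape, the standard `json` module can load it into the same `JsonRecords` dict. A minimal sketch; the helper `load_records` and the file name `records.json` are hypothetical:

```python
import json

def load_records(path):
    """Parse a JSON object like {"rec1": {...}, "rec2": {...}} into a dict."""
    # json.load parses the whole file at once, so the entire
    # structure must fit in memory.
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# "records.json" is a hypothetical file name; replace it with the real path.
# JsonRecords = load_records("records.json")
```

In Python 3.7+ the loaded dicts keep the key order of the file, which matters because the parsing code below pairs `ments` entries with `Ids` by position.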
I parsed the sample with the following code:
import pandas as pd
#import json

all_data = []
for k, v in JsonRecords.items():
    words, Ids, unique, ments = v['words'], v['Ids'], v['unique'], v['ments']
    for t, val, m in zip(words, Ids, ments.items()):
        all_data.append({
            'records': k,
            'words': ' '.join(t),
            'Ids': val,
            'unique': unique,
            'ments': m
        })
#print(all_data)
df = pd.DataFrame(all_data)
df.to_csv('myData.csv', encoding='utf-8')
print(df.head())
Running the code gives me this DataFrame:
records  words          Ids     unique              ments
rec1     A, B, C.       [0, 1]  [1, 1, 1, 0, 0, 1]  ('(0, 1)', {'A1': [0], 'A2': [0, 1], 'A3': [1], 'A4': [1, 0], 'A5': [0]})
rec1     D, E, F.       [2, 3]  [1, 1, 1, 0, 0, 1]  ('(2, 3)', {'A1': [0], 'A2': [0], 'A3': [1], 'A5': [0]})
rec2     We, us, them.  [4, 5]  [0, 0, 0, 1, 1, 0]  ('(4, 5)', {'A1': [0], 'A2': [0], 'A3': [0], 'A4': [0]})
rec2     is, it, us.    [6, 7]  [0, 0, 0, 1, 1, 0]  ('(6, 7)', {'A1': [0], 'A2': [0], 'A4': [0, 0], 'A6': [0, 1]})
rec3
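As shown above, the 'ments' column still holds the raw (key, dict) tuple, and I can't parse the 'ments' dictionary any further for each 'Ids'/'words' row. Those rows should also be repeated as the 'ments' dictionary and its nested values are expanded.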
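Each `m` in the question's loop is a `(key, dict)` tuple such as `('(0, 1)', {'A1': [0], ...})`, and the key is a string, not a list. A small sketch of splitting one tuple into a parsed key and its nested dict; `split_ment` is a hypothetical helper, and `ast.literal_eval` is used on the assumption that every key looks like a Python tuple literal such as `"(0, 1)"`:

```python
import ast

def split_ment(m):
    """Split one item of ments.items() into (parsed key, nested dict).

    The key "(0, 1)" is evaluated as a tuple literal and converted
    to the list [0, 1]; the nested dict is returned unchanged.
    """
    key, attrs = m
    return list(ast.literal_eval(key)), attrs

key, attrs = split_ment(('(0, 1)', {'A1': [0], 'A2': [0, 1]}))
print(key)    # [0, 1]
print(attrs)  # {'A1': [0], 'A2': [0, 1]}
```

Inside the original loop, the parsed key could go in the `'ments'` field and `**attrs` could be merged into the row dict, giving one column per A-key that still holds lists.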
This is the DataFrame structure I want for this nested JSON:
Records  words          Ids     unique              ments   A1   A2   A3   A4   A5   A6
rec1     A, B, C.       [0, 1]  [1, 1, 1, 0, 0, 1]  [0, 1]  0    0    1    1    0    NaN
rec1     A, B, C.       [0, 1]  [1, 1, 1, 0, 0, 1]  [0, 1]  NaN  1    NaN  0    NaN  NaN
rec1     D, E, F.       [2, 3]  [1, 1, 1, 0, 0, 1]  [2, 3]  0    0    1    NaN  0    NaN
rec1     D, E, F.       [2, 3]  [1, 1, 1, 0, 0, 1]  [2, 3]  NaN  NaN  NaN  NaN  NaN  NaN
rec2     We, us, them.  [4, 5]  [0, 0, 0, 1, 1, 0]  [4, 5]  0    0    0    0    NaN  NaN
rec2     We, us, them.  [4, 5]  [0, 0, 0, 1, 1, 0]  [4, 5]  NaN  NaN  NaN  NaN  NaN  NaN
rec2     is, it, us.    [6, 7]  [0, 0, 0, 1, 1, 0]  [6, 7]  0    0    NaN  0    NaN  0
rec2     is, it, us.    [6, 7]  [0, 0, 0, 1, 1, 0]  [6, 7]  NaN  NaN  NaN  0    NaN  1
rec3
....... more records
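A minimal sketch that builds the table above directly from `JsonRecords`, collecting plain dicts and constructing the DataFrame once. The number of rows per `ments` entry is an assumption read off the desired table: every entry gets as many rows as its `Ids` pair has elements (two in the sample), even when all of its lists have a single element. `build_frame` is a hypothetical helper name, and `words` is joined with spaces as in the question's code, so it prints as `A B C .` rather than `A, B, C.`:

```python
import ast

import pandas as pd

def build_frame(records):
    """Flatten records into one row per (record, ments entry, list position).

    Assumptions inferred from the desired table, not stated in the question:
    - each ments entry produces len(Ids pair) rows;
    - row i holds the i-th element of every A-list, missing if the list
      is shorter or the key is absent;
    - ments entries appear in the same order as the Ids pairs.
    """
    base = ['Records', 'words', 'Ids', 'unique', 'ments']
    rows = []
    for rec, v in records.items():
        for words, ids, (key, attrs) in zip(v['words'], v['Ids'], v['ments'].items()):
            for i in range(len(ids)):
                row = {
                    'Records': rec,
                    'words': ' '.join(words),
                    'Ids': ids,
                    'unique': v['unique'],
                    'ments': list(ast.literal_eval(key)),  # "(0, 1)" -> [0, 1]
                }
                for name, values in attrs.items():
                    if i < len(values):
                        row[name] = values[i]
                rows.append(row)
    # build the frame once; keys absent from a row become missing values
    df = pd.DataFrame(rows)
    attr_cols = [c for c in df.columns if c not in base]
    # nullable integers print 0/1 instead of 0.0/1.0 and <NA> for gaps
    return df.astype({c: 'Int64' for c in attr_cols})
```

Usage: `df = build_frame(JsonRecords)` once the `rec3` placeholder is replaced by real records. Appending dicts to a list and calling `pd.DataFrame` once avoids growing a DataFrame row by row, which matters for a very large file.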
Any help would be appreciated.
Answer 0 (score: 0)
Use apply and json_normalize:
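import pandas as pd

def getMents(value):
    # value is a (key, dict) tuple from ments.items(); keep the key, e.g. '(0, 1)'
    return value[0]

def getJson(value):
    # keep the nested dict, e.g. {'A1': [0], 'A2': [0, 1], ...}
    return value[1]

df = pd.DataFrame(all_data)
df['json'] = df['ments'].apply(getJson)
# one column per key (A1, A2, ...); list values are kept, missing keys become NaN
jsonData = pd.json_normalize(df['json'])
df['ments'] = df['ments'].apply(getMents)
for col in jsonData.columns.values:
    df[col] = jsonData[col]

new_df = df.iloc[0:0].copy()   # empty frame with the same columns
results = df.iloc[0:0].copy()
for index, row in df.iterrows():
    # number of output rows = length of the longest list in this row
    maxCount = 0
    for col in jsonData.columns.values:
        if isinstance(row[col], list):
            maxCount = max(maxCount, len(row[col]))
    for i in range(0, maxCount):
        count = len(new_df)
        new_df.loc[count] = row
        for col in jsonData.columns.values:
            value = row[col]
            if isinstance(value, list):
                # i-th element of the list, or None if the list is shorter
                new_df.loc[count, col] = value[i] if i < len(value) else None
    results = pd.concat([results, new_df])
    new_df = df.iloc[0:0].copy()
results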
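The loop above calls `pd.concat` once per input row, and each call copies everything accumulated so far, which gets slow on a very large file. A vectorized alternative, swapped in here and not part of the original answer, is to pad every list in a row to the same length and call `DataFrame.explode` on all A-columns at once (exploding several columns together needs pandas 1.3 or later). A sketch, assuming `df` already has one list-or-NaN column per key, as produced by the `json_normalize` step; `explode_positions` is a hypothetical helper name:

```python
import pandas as pd

def explode_positions(df, value_cols):
    """Return one row per list position across value_cols.

    Lists shorter than the longest list in their row are padded with None,
    and non-list cells (e.g. NaN for a missing key) become all-None lists,
    so every exploded column has matching element counts.
    """
    def pad(v, k):
        items = list(v) if isinstance(v, list) else []
        return items + [None] * (k - len(items))

    def longest(row):
        return max((len(v) for v in row if isinstance(v, list)), default=0)

    out = df.copy()
    n = out[value_cols].apply(longest, axis=1)
    for col in value_cols:
        out[col] = pd.Series([pad(v, k) for v, k in zip(out[col], n)],
                             index=out.index, dtype=object)
    # explode all padded columns together; other columns are repeated
    return out.explode(value_cols, ignore_index=True)
```

For example, `explode_positions(df, list(jsonData.columns))` after the `json_normalize` step. Like the answer's `maxCount` rule, a row whose lists all have length 1 yields a single output row.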