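I have a JSON that I convert to a dictionary, and I am trying to build a DataFrame from that dictionary. The problem is that it is nested several levels deep and the data is inconsistent.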
For example, I want to convert it into a DataFrame like this:
transaction_no | is_delivered | flow | transaction_date | receiver_client | warehouse | kits | quantity | product1 | quantity1 | product2 | quantity2 | product3 | quantity3
1180 | False | 36 | 2020-08-13T04:34:11.678000Z | Lumax Cornaglia Auto Tech Private Limited | Yantraksh Logistics Private limited_GGNPC1 | KIT1182A | 5 | PP001 | 5 | FSS001 | 18 | NaN | NaN
1180 | False | 36 | 2020-08-13T04:34:11.678000Z | Lumax Cornaglia Auto Tech Private Limited | Yantraksh Logistics Private limited_GGNPC1 | KIT1182B | 7 | PP001 | 5 | PS001 | 5 | PL001 | 7.0
What I did:
import json
import pandas as pd

# d holds the raw JSON string
data = json.loads(d)
result_dataframe = pd.DataFrame(data)
# Fields that I need
columns = ['transaction_no', 'is_delivered', 'flow', 'transaction_date', 'receiver_client', 'warehouse', 'kits']
result_dataframe = result_dataframe[columns]
result_dataframe.to_csv("out.csv")
I also tried flattening the dictionary with a recursive function:
def flatten(input_dict, separator='_', prefix=''):
    # Recursively flatten nested dicts and lists into a single flat dict.
    output_dict = {}
    for key, value in input_dict.items():
        if isinstance(value, dict) and value:
            deeper = flatten(value, separator, prefix + key + separator)
            output_dict.update(deeper)
        elif isinstance(value, list) and value:
            # List items get a 1-based index in their key.
            for index, sublist in enumerate(value, start=1):
                if isinstance(sublist, dict) and sublist:
                    deeper = flatten(sublist, separator, prefix + key + separator + str(index) + separator)
                    output_dict.update(deeper)
                else:
                    # Store the element itself, not the whole list.
                    output_dict[prefix + key + separator + str(index)] = sublist
        else:
            output_dict[prefix + key] = value
    return output_dict
But this puts all the values into a single row. How can I split them by kit to get the result shown above?
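To make the problem concrete, here is a minimal sketch that runs a flatten function like the one above on a made-up record. The real JSON is not shown in this post, so the nested field names (`kit`, `items`, `product`) and the values are assumptions chosen only to mirror the nesting described.

```python
import pandas as pd


def flatten(input_dict, separator='_', prefix=''):
    """Recursively flatten nested dicts and lists into one flat dict."""
    output_dict = {}
    for key, value in input_dict.items():
        if isinstance(value, dict) and value:
            output_dict.update(flatten(value, separator, prefix + key + separator))
        elif isinstance(value, list) and value:
            for index, item in enumerate(value, start=1):
                item_prefix = prefix + key + separator + str(index)
                if isinstance(item, dict) and item:
                    output_dict.update(flatten(item, separator, item_prefix + separator))
                else:
                    output_dict[item_prefix] = item
        else:
            output_dict[prefix + key] = value
    return output_dict


# Hypothetical record: the real JSON layout may differ.
record = {
    "transaction_no": 1180,
    "kits": [
        {"kit": "KIT1182A", "quantity": 5,
         "items": [{"product": "PP001", "quantity": 5}]},
        {"kit": "KIT1182B", "quantity": 7,
         "items": [{"product": "PS001", "quantity": 5}]},
    ],
}

flat = flatten(record)
df = pd.DataFrame([flat])

print(df.shape)          # (1, 9): a single row, one column per flattened key
print(list(df.columns))  # keys such as kits_1_kit, kits_2_kit, kits_1_items_1_product
```

Because every kit and item ends up as its own indexed key, both kits land in the same row, which is the behaviour described here.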
Answer 0 (score: 0)
The pandas json_normalize function should be what you want.
Here is how to normalize your JSON input:
pd.json_normalize(data, ['kits', 'items'],
                  [['kits', 'kit'], 'transaction_no', 'is_delivered', 'flow', 'transaction_date', 'receiver_client', 'warehouse'],
                  errors='ignore', record_prefix='kits.')
To spread the products across columns, you should then try building a pivot table.
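A minimal sketch of both steps together might look like the following. The input is a made-up sample, since the original JSON is not shown: the nested field names (`kit`, `items`, `product`, and the per-kit `quantity`) are assumptions, as are the `item.` prefix and the helper column `n`. It uses `DataFrame.pivot` with a list of index columns, which needs a reasonably recent pandas (1.1 or later).

```python
import pandas as pd

# Hypothetical input: one transaction with two kits, each holding a list of items.
data = [{
    "transaction_no": 1180,
    "is_delivered": False,
    "flow": 36,
    "kits": [
        {"kit": "KIT1182A", "quantity": 5,
         "items": [{"product": "PP001", "quantity": 5},
                   {"product": "FSS001", "quantity": 18}]},
        {"kit": "KIT1182B", "quantity": 7,
         "items": [{"product": "PP001", "quantity": 5},
                   {"product": "PS001", "quantity": 5},
                   {"product": "PL001", "quantity": 7}]},
    ],
}]

# Step 1: one row per item, with kit and transaction fields repeated as metadata.
long_df = pd.json_normalize(
    data,
    record_path=["kits", "items"],
    meta=[["kits", "kit"], ["kits", "quantity"], "transaction_no", "is_delivered", "flow"],
    record_prefix="item.",
)

# Step 2: number the items inside each kit (1, 2, 3, ...).
long_df["n"] = long_df.groupby(["transaction_no", "kits.kit"]).cumcount() + 1

# Step 3: pivot so each item position becomes a product/quantity column pair.
id_cols = ["transaction_no", "is_delivered", "flow", "kits.kit", "kits.quantity"]
wide = long_df.pivot(index=id_cols, columns="n", values=["item.product", "item.quantity"])

# Flatten the two-level column names, e.g. ("item.product", 1) -> "product1".
wide.columns = [f"{name.split('.')[-1]}{n}" for name, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

Kits with fewer items than the longest kit get NaN in the trailing product and quantity columns, which matches the shape of the table in the question. The columns come out grouped as product1..productN followed by quantity1..quantityN, so they would need reordering if the interleaved layout matters.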
Good luck.