I have a nested JSON file that I flattened, which gave me a list that looks like this:
[{patient_0_order: 1234,
patient_0_id: a1,
patient_0_time: 01/01/2016,
patient_0_desc: xyz,
patient_1_order: 2313,
patient_1_id: b1,
patient_1_time: 02/01/2016,
patient_1_desc: def,
patient_2_order: 9876,
patient_2_id: c1,
patient_2_time: 03/01/2016,
patient_2_desc: ghi,
patient_3_order: 0075,
patient_3_id: d1,
patient_3_time: 04/01/2016,
patient_3_desc: klm,
patient_4_order: 6268,
patient_4_id: e1,
patient_4_time: 05/01/2016,
patient_4_desc: pqr}]
Now I want to convert the list into a DataFrame so that each row holds one patient, like this:
   patient_order  patient_id  patient_time  patient_desc
0           1234          a1    01/01/2016           xyz
1           2313          b1    02/01/2016           def
2           9876          c1    03/01/2016           ghi
3           0075          d1    04/01/2016           klm
4           6268          e1    05/01/2016           pqr
I tried pandas.DataFrame(list), but it gave me a DataFrame with 1 row and 20 columns, which is not what I want.
Any help or suggestions would be greatly appreciated.
Answer 0 (score: 1)
Here is how you can convert the JSON object (a dictionary):
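import json
import pandas as pd

old_dict = json.loads('YOUR JSON STRING')[0]
col_names = ['order', 'id', 'time', 'desc']
# Reorganize the dictionary: one inner dict per column, keyed by patient
# number so that the columns line up on a shared index.
new_dict = {col: {int(k.split('_')[1]): v for k, v in old_dict.items() if k.endswith('_' + col)} for col in col_names}
df = pd.DataFrame(new_dict)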
This should return what you want.
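To try the idea without a JSON file, here is a minimal sketch on two of the sample patients from the question. The name `flat` and the choice to keep every value as a string are assumptions made for illustration; the inner dictionaries are keyed by patient number so that all four columns share one index.

```python
import pandas as pd

# Two patients copied from the question; values kept as strings (an assumption).
flat = {
    "patient_0_order": "1234", "patient_0_id": "a1",
    "patient_0_time": "01/01/2016", "patient_0_desc": "xyz",
    "patient_1_order": "2313", "patient_1_id": "b1",
    "patient_1_time": "02/01/2016", "patient_1_desc": "def",
}
col_names = ["order", "id", "time", "desc"]

# One inner dict per column, mapping patient number -> value.
table = {
    col: {int(k.split("_")[1]): v for k, v in flat.items() if k.endswith("_" + col)}
    for col in col_names
}
df = pd.DataFrame(table)
print(df)
```

Each patient number becomes a row label, so the result has one row per patient and one column per field.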
Answer 1 (score: 1)
Here you go, this works. It may not be the prettiest approach, but it does the job, and I may clean it up later.
import pandas as pd

original = [{"patient_0_order": 1234, "patient_0_id": 123, "patient_1_id": 12, "patient_1_order": 1255}]
original = original[0]
elems = []
current_patient = 0
current_d = {}
total_elems = len(original)
# Sort the keys by patient number so each patient's keys are adjacent
for index, i in enumerate(sorted(original, key=lambda x: int(x.split("_")[1]))):
    key_details = i.split("_")
    # This will be used in the dataframe as a column name
    key_name = key_details[2]
    # The number specific to this patient
    patient_num = int(key_details[1])
    # Checking if we're still on the same patient
    if patient_num == current_patient:
        current_d[key_name] = original[i]
    # Checks if we've moved on to the next patient and moves on accordingly
    else:
        elems.append(current_d)
        # Starting off the new dictionary for this patient with the current key
        current_d = {key_name: original[i]}
        current_patient = patient_num
    # Checks if this is the last element (done last so the final patient is always saved)
    if index == total_elems - 1:
        elems.append(current_d)
df = pd.DataFrame(elems)
Feel free to change how key_name is built to adjust the way your columns are named! Prepending 'patient_' to it will do the trick.
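As a rough sketch of that renaming, the version below groups the keys by patient number with a plain dictionary instead of tracking the current patient, and prepends 'patient_' to each column name. The names `flat` and `records` are hypothetical, and the sample values are the ones used in the code above.

```python
import pandas as pd

flat = {"patient_0_order": 1234, "patient_0_id": 123,
        "patient_1_id": 12, "patient_1_order": 1255}

# Patient number -> {column name: value}
records = {}
for key, value in flat.items():
    # "patient_0_order" -> ("patient", "0", "order")
    _, num, field = key.split("_", 2)
    records.setdefault(int(num), {})["patient_" + field] = value

# One row per patient, in patient-number order
df = pd.DataFrame([records[n] for n in sorted(records)])
print(df)
```

Because the grouping is keyed on the patient number, this variant does not depend on the keys being sorted or on the numbering starting at 0.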