我有一个字典列表如下:
[
{
"medication_name": "Victoza",
"medication_id": 68,
"manufacturer_name": "Novo Nordisk",
"practice_id": 1,
"disease_id": 16,
"practice_state": "MA",
"disease_name": "Type II Diabetes",
"practice_name": "Cambridge Hospital Inc"
},
{
"medication_name": "Opsumit",
"medication_id": 39,
"manufacturer_name": "Actelion",
"practice_id": 1,
"disease_id": 12,
"practice_state": "MA",
"disease_name": "Pulmonary Arterial Hypertension",
"practice_name": "Cambridge Hospital Inc"
},
{
"medication_name": "ITCA-650",
"medication_id": 29,
"manufacturer_name": "Intarcia",
"practice_id": 1,
"disease_id": 16,
"practice_state": "MA",
"disease_name": "Type II Diabetes",
"practice_name": "Cambridge Hospital Inc"
},
{
"medication_name": "Flolan",
"medication_id": 22,
"manufacturer_name": "GlaxoSmithKline",
"practice_id": 1,
"disease_id": 12,
"practice_state": "CA",
"disease_name": "Pulmonary Arterial Hypertension",
"practice_name": "Cambridge Hospital Inc"
},
{
"medication_name": "Adcirca",
"medication_id": 4,
"manufacturer_name": "United Therapeutics",
"practice_id": 1,
"disease_id": 12,
"practice_state": "CA",
"disease_name": "Pulmonary Arterial Hypertension",
"practice_name": "Cambridge Hospital Inc"
},
.....
.....
.....
]
这是一个很长的列表,并且为了便于阅读而被截断。列表中有很多重复的条目。我需要的是找到每个键的唯一值,并用以下数据格式表示:
{
medication : [ {medication_id : 1, medication_name: "Victoza"}, {medication_id :2, medication_name:"ITCA-650"},....]
practice : [ {practice_id : 1, practice_name: "Cambridge"}, {practice_id : 2, practice_name: "Oxford"},...]
disease : [ {disease_id: 1, disease_name: "Diabetes"}, {disease_id: 2, disease_name: "Obseity"},...]
manufacturer : [{name: "Cipla"}, {name: "Phizer"},...]
state : [{name:"MA"},{name:"CA"},...]
}
最好的方法是什么?
答案 0 :(得分:2)
使用pandas,假设data
是您显示的词典列表
import pandas as pd
df = pd.DataFrame.from_records(data)
# In [38]: df
# Out[38]:
# disease_id disease_name manufacturer_name medication_id medication_name practice_id practice_name practice_state
# 0 16 Type II Diabetes Novo Nordisk 68 Victoza 1 Cambridge Hospital Inc MA
# 1 12 Pulmonary Arterial Hypertension Actelion 39 Opsumit 1 Cambridge Hospital Inc MA
# 2 16 Type II Diabetes Intarcia 29 ITCA-650 1 Cambridge Hospital Inc MA
# 3 12 Pulmonary Arterial Hypertension GlaxoSmithKline 22 Flolan 1 Cambridge Hospital Inc CA
# 4 12 Pulmonary Arterial Hypertension United Therapeutics 4 Adcirca 1 Cambridge Hospital Inc CA
res = {}
res['medication'] = df[['medication_id', 'medication_name']].to_dict(orient='records')
# In [49]: res
# Out[49]:
# {
# 'medication': [
# {'medication_id': 68, 'medication_name': 'Victoza'},
# {'medication_id': 39, 'medication_name': 'Opsumit'},
# {'medication_id': 29, 'medication_name': 'ITCA-650'},
# {'medication_id': 22, 'medication_name': 'Flolan'},
# {'medication_id': 4, 'medication_name': 'Adcirca'}]
# }
你明白这一点,并以“练习”,“疾病”等方式完成其余的工作。
答案 1 :(得分:1)
final = {
'medication': [],
'practice': [],
'disease': [],
'manufacturer': [],
'state': [],
}
for d in orig_list:
medication = dict((k, d[k]) for k in ('medication_id', 'medication_name'))
practice = dict((k, d[k]) for k in ('practice_id', 'practice_name'))
disease = dict((k, d[k]) for k in ('disease_id', 'disease_name'))
manufacturer = dict(name=d['manufacturer_name'])
state = dict(name=d['practice_state'])
if medication not in final['medication']: final['medication'].append(medication)
if practice not in final['practice']: final['practice'].append(practice)
if disease not in final['disease']: final['disease'].append(disease)
if manufacturer not in final['manufacturer']: final['manufacturer'].append(manufacturer)
if state not in final['state']: final['state'].append(state)
如果您不经常这样做,我建议您这样做。