I have a dictionary in Python that contains several individual dictionaries:
raw1 = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[-7.8,56,56,-3.4],'3M':[24,-31,53,5]}
raw2 = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[-7.8,56,56,5.4],'3M':[24,-31,53,5]}
raw3 = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[9.8,56,56,7.4],'3M':[24,-31,53,5]}
top_dict = {
'raw1': raw1,
'raw2': raw2,
'raw3': raw3
}
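print(top_dict)
I want to swap the columns and rows of each individual dictionary in top_dict, so that all the value fields are collapsed into a single value column and the dates are repeated as row entries.
As an example, after the transformation raw_1 in my top_dict would look like this: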
raw_1 = {'Series_Date':['2017-03-10','2017-03-10','2017-03-10','2017-03-13','2017-03-13','2017-03-13','2017-03-14','2017-03-14','2017-03-14','2017-03-15','2017-03-15','2017-03-15'],'Value':[35.6,-7.8,24,56.7,56,-31,41,56,53,41,-3.4,5],'Desc':['SP','1M','3M','SP','1M','3M','SP','1M','3M','SP','1M','3M']}
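I know I can use pandas melt on each individual dict, but how do I iterate it over the whole top_dict dictionary?
Answer 0: (score: 1)
You can use a dictionary comprehension: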
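import pandas as pd

def melt_pandas(sub_dict):
    # Build a DataFrame from one sub-dictionary and melt it to long form
    df = pd.DataFrame(sub_dict)
    melted = pd.melt(df, id_vars='Series_Date')
    # Sort by date so the rows for each date end up next to each other
    return melted.sort_values('Series_Date').to_dict('list')

result = {key: melt_pandas(sub_dict)
          for key, sub_dict in top_dict.items()}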
However, you can actually rely on numpy alone, which should be faster than creating a pandas DataFrame and melting it:
import numpy as np

value_cols = ['SP', '1M', '3M']
index_col = 'Series_Date'

def melt(sub_dict, val_cols, idx_col):
    # Stack the value columns, then transpose so flattening runs date by date
    vals = np.array([sub_dict[val_col] for val_col in val_cols]).T.flatten()
    # Repeat the list of column names once per date
    desc = val_cols * len(sub_dict[val_cols[0]])
    # Repeat each date once per value column
    date = np.repeat(sub_dict[idx_col], len(val_cols))
    return {"Series_Date": date.tolist(),
            "Desc": desc,
            "Value": vals.tolist()}

result_dict = {key: melt(sub_dict, value_cols, index_col)
               for key, sub_dict in top_dict.items()}
print(result_dict)
{'raw2': {'Series_Date': ['2017-03-10', '2017-03-10', '2017-03-10', '2017-03-13', '2017-03-13', '2017-03-13', '2017-03-14', '2017-03-14', '2017-03-14', '2017-03-15', '2017-03-15', '2017-03-15'],
'Value': [35.6, -7.8, 24.0, 56.7, 56.0, -31.0, 41.0, 56.0, 53.0, 41.0, 5.4, 5.0],
'Desc': ['SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M']},
'raw3': {'Series_Date': ['2017-03-10', '2017-03-10', '2017-03-10', '2017-03-13', '2017-03-13', '2017-03-13', '2017-03-14', '2017-03-14', '2017-03-14', '2017-03-15', '2017-03-15', '2017-03-15'],
'Value': [35.6, 9.8, 24.0, 56.7, 56.0, -31.0, 41.0, 56.0, 53.0, 41.0, 7.4, 5.0],
'Desc': ['SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M']},
'raw1': {'Series_Date': ['2017-03-10', '2017-03-10', '2017-03-10', '2017-03-13', '2017-03-13', '2017-03-13', '2017-03-14', '2017-03-14', '2017-03-14', '2017-03-15', '2017-03-15', '2017-03-15'],
'Value': [35.6, -7.8, 24.0, 56.7, 56.0, -31.0, 41.0, 56.0, 53.0, 41.0, -3.4, 5.0],
'Desc': ['SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M']}}
The numpy solution times at 10000 loops, best of 3: 57.3 µs per loop, compared with 100 loops, best of 3: 6.79 ms per loop for the pandas solution.
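For a dependency-free cross-check of the output above, the same reshaping can be sketched with built-ins only. This is not part of either answer: melt_plain and result_plain are hypothetical names introduced here, and the timeit call merely shows one way such a measurement could be taken. The figure it prints depends on the machine, so no expected timing is given.

```python
import timeit

def melt_plain(sub_dict, val_cols, idx_col):
    """Reshape one wide dict into long form using only built-ins (illustrative helper)."""
    dates, descs, values = [], [], []
    # Walk row by row so that the values belonging to one date stay together.
    for row, date in enumerate(sub_dict[idx_col]):
        for col in val_cols:
            dates.append(date)
            descs.append(col)
            values.append(sub_dict[col][row])
    return {idx_col: dates, "Desc": descs, "Value": values}

raw1 = {'Series_Date': ['2017-03-10', '2017-03-13', '2017-03-14', '2017-03-15'],
        'SP': [35.6, 56.7, 41, 41],
        '1M': [-7.8, 56, 56, -3.4],
        '3M': [24, -31, 53, 5]}
top_dict = {'raw1': raw1}
value_cols = ['SP', '1M', '3M']

# Same dictionary comprehension pattern as in the answer above.
result_plain = {key: melt_plain(sub_dict, value_cols, 'Series_Date')
                for key, sub_dict in top_dict.items()}

# Rough timing of one sub-dictionary; results vary between machines.
elapsed = timeit.timeit(lambda: melt_plain(raw1, value_cols, 'Series_Date'),
                        number=1000)
print("%.2f us per call" % (elapsed / 1000 * 1e6))
```

Unlike the numpy version, this sketch leaves integer inputs as integers (24 rather than 24.0), since nothing is coerced to a common float type.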
Answer 1: (score: 0)
If you need the Series_Date values sorted:
import pandas as pd

top_dict = {
    raw: pd.melt(
        pd.DataFrame(top_dict[raw]),
        id_vars='Series_Date'  # removed the stray ']' that made this a syntax error
    ).sort_values('Series_Date').to_dict('list')
    for raw in top_dict
}
Otherwise, simply omit the sort_values() call above.
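To see what dropping the sort changes, here is a plain-Python illustration of the row order an unsorted melt produces: all rows of the first value column, then all rows of the second, and so on. It imitates that ordering without calling pandas; melt_unsorted is a hypothetical helper, and the 'variable' and 'value' keys simply follow the default column names that pd.melt uses.

```python
def melt_unsorted(sub_dict, val_cols, idx_col):
    """Long form in column-by-column order, imitating a melt without sorting."""
    dates, variables, values = [], [], []
    # Outer loop over value columns: each column's rows are emitted as one run.
    for col in val_cols:
        for date, value in zip(sub_dict[idx_col], sub_dict[col]):
            dates.append(date)
            variables.append(col)
            values.append(value)
    return {idx_col: dates, 'variable': variables, 'value': values}

raw1 = {'Series_Date': ['2017-03-10', '2017-03-13', '2017-03-14', '2017-03-15'],
        'SP': [35.6, 56.7, 41, 41],
        '1M': [-7.8, 56, 56, -3.4],
        '3M': [24, -31, 53, 5]}

unsorted_rows = melt_unsorted(raw1, ['SP', '1M', '3M'], 'Series_Date')
print(unsorted_rows['variable'])
print(unsorted_rows['Series_Date'])
```

The dates cycle through the full range once per value column, so this order suits cases where grouping by column matters more than grouping by date.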