Question

I have a dictionary in Python that contains several separate dictionaries:

raw1 = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[-7.8,56,56,-3.4],'3M':[24,-31,53,5]}
raw2 = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[-7.8,56,56,5.4],'3M':[24,-31,53,5]}
raw3 = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[9.8,56,56,7.4],'3M':[24,-31,53,5]}
top_dict = {
  'raw1': raw1,
  'raw2': raw2,
  'raw3': raw3
  }
print(top_dict)

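I want to flip the columns and rows within each individual dictionary in top_dict, so that all the value fields are collapsed into a single value column and the dates are repeated as row entries.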

As an example, after the flip, raw1 in my top_dict would look like this:

raw1 = {'Series_Date':['2017-03-10','2017-03-10','2017-03-10','2017-03-13','2017-03-13','2017-03-13','2017-03-14','2017-03-14','2017-03-14','2017-03-15','2017-03-15','2017-03-15'],'Value':[35.6,-7.8,24,56.7,56,-31,41,56,53,41,-3.4,5],'Desc':['SP','1M','3M','SP','1M','3M','SP','1M','3M','SP','1M','3M']}

I know I can use pandas melt on each individual dict, but how do I apply it across the whole top_dict dictionary?

Answer 1

You can use a dictionary comprehension:

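import pandas as pd

def melt_pandas(sub_dict):
    df = pd.DataFrame(sub_dict)
    # Name the melted columns 'Desc' and 'Value' to match the layout in the question
    melted = pd.melt(df, id_vars='Series_Date', var_name='Desc', value_name='Value')
    # A stable sort keeps the SP, 1M, 3M order within each date
    return melted.sort_values('Series_Date', kind='mergesort').to_dict('list')

result = {key: melt_pandas(sub_dict)
          for key, sub_dict in top_dict.items()}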

However, you can rely on numpy alone, which should be faster than creating a pandas DataFrame and melting it:

import numpy as np

value_cols = ['SP','1M','3M']
index_col = 'Series_Date'

def melt(sub_dict, val_cols, idx_col):
    # Stack the value columns, then transpose so that flattening walks date by date
    vals = np.array([sub_dict[val_col] for val_col in val_cols]).T.flatten()
    # Repeat the column labels once per date
    desc = val_cols * len(sub_dict[val_cols[0]])
    # Repeat each date once per value column
    date = np.repeat(sub_dict[idx_col], len(val_cols))

    return {idx_col: date.tolist(),
            "Desc": desc,
            "Value": vals.tolist()}

result_dict = {key: melt(sub_dict, value_cols, index_col)
               for key, sub_dict in top_dict.items()}
print(result_dict)

{'raw2': {'Series_Date': ['2017-03-10', '2017-03-10', '2017-03-10', '2017-03-13', '2017-03-13', '2017-03-13', '2017-03-14', '2017-03-14', '2017-03-14', '2017-03-15', '2017-03-15', '2017-03-15'], 
          'Value': [35.6, -7.8, 24.0, 56.7, 56.0, -31.0, 41.0, 56.0, 53.0, 41.0, 5.4, 5.0], 
          'Desc': ['SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M']}, 
 'raw3': {'Series_Date': ['2017-03-10', '2017-03-10', '2017-03-10', '2017-03-13', '2017-03-13', '2017-03-13', '2017-03-14', '2017-03-14', '2017-03-14', '2017-03-15', '2017-03-15', '2017-03-15'], 
          'Value': [35.6, 9.8, 24.0, 56.7, 56.0, -31.0, 41.0, 56.0, 53.0, 41.0, 7.4, 5.0], 
          'Desc': ['SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M']}, 
 'raw1': {'Series_Date': ['2017-03-10', '2017-03-10', '2017-03-10', '2017-03-13', '2017-03-13', '2017-03-13', '2017-03-14', '2017-03-14', '2017-03-14', '2017-03-15', '2017-03-15', '2017-03-15'], 
          'Value': [35.6, -7.8, 24.0, 56.7, 56.0, -31.0, 41.0, 56.0, 53.0, 41.0, -3.4, 5.0], 
          'Desc': ['SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M', 'SP', '1M', '3M']}}
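
As a quick sanity check, the numpy approach can be compared against the layout requested in the question. This is a minimal sketch: the `expected` dictionary is copied from the question's example, and the `melt` helper is restated here only so that the snippet runs on its own. Note that numpy converts the mixed int/float inputs to floats, so `24` comes back as `24.0`, which still compares equal.

```python
import numpy as np

def melt(sub_dict, val_cols, idx_col):
    # Transpose so that flattening walks the data date by date
    vals = np.array([sub_dict[c] for c in val_cols]).T.flatten()
    desc = val_cols * len(sub_dict[idx_col])
    date = np.repeat(sub_dict[idx_col], len(val_cols))
    return {"Series_Date": date.tolist(), "Desc": desc, "Value": vals.tolist()}

raw1 = {'Series_Date': ['2017-03-10', '2017-03-13', '2017-03-14', '2017-03-15'],
        'SP': [35.6, 56.7, 41, 41],
        '1M': [-7.8, 56, 56, -3.4],
        '3M': [24, -31, 53, 5]}

# Target layout, taken from the question
expected = {
    'Series_Date': ['2017-03-10'] * 3 + ['2017-03-13'] * 3
                   + ['2017-03-14'] * 3 + ['2017-03-15'] * 3,
    'Value': [35.6, -7.8, 24, 56.7, 56, -31, 41, 56, 53, 41, -3.4, 5],
    'Desc': ['SP', '1M', '3M'] * 4,
}

melted = melt(raw1, ['SP', '1M', '3M'], 'Series_Date')
print(melted == expected)  # True
```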

Timing both approaches, the numpy solution ran in 10000 loops, best of 3: 57.3 µs per loop, compared with 100 loops, best of 3: 6.79 ms per loop for the pandas solution.
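
A comparison like this can be reproduced with the standard library's `timeit` module. The sketch below is only an approximation of how such a measurement might be set up: the function names `melt_pandas` and `melt_numpy`, the single `raw1` input, and the loop count `n` are choices made for illustration, and the absolute numbers will depend on the machine and on the installed pandas and numpy versions.

```python
import timeit

import numpy as np
import pandas as pd

raw1 = {'Series_Date': ['2017-03-10', '2017-03-13', '2017-03-14', '2017-03-15'],
        'SP': [35.6, 56.7, 41, 41],
        '1M': [-7.8, 56, 56, -3.4],
        '3M': [24, -31, 53, 5]}
value_cols = ['SP', '1M', '3M']

def melt_pandas(sub_dict):
    # Build a DataFrame, melt it, and convert back to a dict of lists
    df = pd.DataFrame(sub_dict)
    melted = pd.melt(df, id_vars='Series_Date')
    return melted.sort_values('Series_Date').to_dict('list')

def melt_numpy(sub_dict, val_cols, idx_col):
    # Same reshaping done directly on arrays
    vals = np.array([sub_dict[c] for c in val_cols]).T.flatten()
    desc = val_cols * len(sub_dict[idx_col])
    date = np.repeat(sub_dict[idx_col], len(val_cols))
    return {"Series_Date": date.tolist(), "Desc": desc, "Value": vals.tolist()}

n = 200  # number of calls per measurement (arbitrary)
t_pandas = timeit.timeit(lambda: melt_pandas(raw1), number=n) / n
t_numpy = timeit.timeit(lambda: melt_numpy(raw1, value_cols, 'Series_Date'), number=n) / n

print("pandas: %.1f µs per call" % (t_pandas * 1e6))
print("numpy:  %.1f µs per call" % (t_numpy * 1e6))
```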

Answer 2

If you need the rows ordered by Series_Date:

import pandas as pd

top_dict = {
    raw: pd.melt(
        pd.DataFrame(top_dict[raw]),
        id_vars=['Series_Date']
    ).sort_values('Series_Date').to_dict('list')
    for raw in top_dict
}

Otherwise, just leave out the sort_values() call above.
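
To see what the sort changes, here is a small sketch on a cut-down input (two dates and two value columns, chosen only to keep the output short). Without sorting, `pd.melt` returns the rows grouped by the original column; `variable` is the default name pandas gives to the melted label column. The stable `kind='mergesort'` is an addition here so that the within-date order is guaranteed.

```python
import pandas as pd

raw = {'Series_Date': ['2017-03-10', '2017-03-13'],
       'SP': [35.6, 56.7],
       '1M': [-7.8, 56.0]}

melted = pd.melt(pd.DataFrame(raw), id_vars=['Series_Date'])

# No sort: all SP rows first, then all 1M rows
unsorted = melted.to_dict('list')
# Stable sort by date: SP and 1M alternate within each date
by_date = melted.sort_values('Series_Date', kind='mergesort').to_dict('list')

print(unsorted['variable'])  # ['SP', 'SP', '1M', '1M']
print(by_date['variable'])   # ['SP', '1M', 'SP', '1M']
```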
