Question

My time series dataframe looks something like this:

import pandas as pd

ts = pd.DataFrame([['Jan 2000','WidgetCo',0.5, 2], ['Jan 2000','GadgetCo',0.3, 3], ['Jan 2000','SnazzyCo',0.2, 4],
          ['Feb 2000','WidgetCo',0.4, 2], ['Feb 2000','GadgetCo',0.5, 2.5], ['Feb 2000','SnazzyCo',0.1, 4],
          ], columns=['month','company','share','price'])

Which looks like:

      month   company  share  price
0  Jan 2000  WidgetCo    0.5    2.0
1  Jan 2000  GadgetCo    0.3    3.0
2  Jan 2000  SnazzyCo    0.2    4.0
3  Feb 2000  WidgetCo    0.4    2.0
4  Feb 2000  GadgetCo    0.5    2.5
5  Feb 2000  SnazzyCo    0.1    4.0

I can pivot this table like so:

pd.pivot_table(ts,index='month', columns='company')

Which gives me:

            share                      price                  
company  GadgetCo SnazzyCo WidgetCo GadgetCo SnazzyCo WidgetCo
month                                                         
Feb 2000      0.5      0.1      0.4      2.5        4        2
Jan 2000      0.3      0.2      0.5      3.0        4        2

This is what I want, except that I need to collapse the MultiIndex so that the company is used as a prefix for share and price, like so:

          WidgetCo_share  WidgetCo_price  GadgetCo_share  GadgetCo_price   ...
month                                                                      
Jan 2000             0.5               2             0.3             3.0   
Feb 2000             0.4               2             0.5             2.5

I came up with this function, but it seems like a poor solution:

def pivot_table_to_flat(df, column, index):
    res = df.set_index(index)
    cols = res.drop(column, axis=1).columns.values
    resulting_cols = []
    for prefix in res[column].unique():
        for col in cols:
            new_col_name = prefix + '_' + col
            res[new_col_name] = res[res[column] == prefix][col]
            resulting_cols.append(new_col_name)

    return res[resulting_cols]

pivot_table_to_flat(ts, index='month', column='company')

Is there a better way to do a pivot that produces columns with a prefix rather than a MultiIndex?

Answer 1

This seems much simpler:

df.columns = [' '.join(col).strip() for col in df.columns.values]

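It takes a df with MultiIndex columns and flattens the column labels, modifying the df in place. Note that it joins the levels with a space and keeps the original level order, so the labels come out as 'share WidgetCo' rather than 'WidgetCo_share'.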

(Reference: @andy-haden, Python Pandas - How to flatten a hierarchical index in columns)
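
For the layout asked for in the question (company first, underscore separator), the same comprehension can unpack each column tuple and swap the order. This is only a minimal sketch, assuming the pivoted frame from the question; the name `pivoted` is illustrative and not part of the answer above.

```python
import pandas as pd

ts = pd.DataFrame(
    [['Jan 2000', 'WidgetCo', 0.5, 2], ['Jan 2000', 'GadgetCo', 0.3, 3],
     ['Jan 2000', 'SnazzyCo', 0.2, 4], ['Feb 2000', 'WidgetCo', 0.4, 2],
     ['Feb 2000', 'GadgetCo', 0.5, 2.5], ['Feb 2000', 'SnazzyCo', 0.1, 4]],
    columns=['month', 'company', 'share', 'price'])

pivoted = pd.pivot_table(ts, index='month', columns='company')

# Each column label is a (field, company) tuple; put the company first.
pivoted.columns = [f'{company}_{field}' for field, company in pivoted.columns]
print(pivoted.columns.tolist())
```

The order of the resulting columns follows whatever order pivot_table produced, which may differ between pandas versions, so select columns by name rather than by position.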

Answer 2

I figured it out. Using the data on the MultiIndex makes for a pretty clean solution:

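def flatten_multi_index(df):
    mi = df.columns
    # Level 0 holds the value names (share, price); level 1 holds the companies.
    suffixes, prefixes = mi.levels
    # mi.codes (named mi.labels before pandas 0.24) holds, per level, the position of each column's label.
    col_names = [prefixes[i_p] + '_' + suffixes[i_s] for (i_s, i_p) in zip(*mi.codes)]
    df.columns = col_names
    return df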

flatten_multi_index(pd.pivot_table(ts,index='month', columns='company'))

The version above only handles a two-level MultiIndex, but it could be generalized if needed.
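
One way such a generalization might look, as a rough sketch rather than the answer's own code: iterate over the column tuples and join all levels, innermost first. The function name `flatten_columns`, the `sep` parameter, and the three-level demo frame (including the made-up 'east' level) are assumptions for illustration only. It assumes the columns really are a MultiIndex.

```python
import pandas as pd

def flatten_columns(df, sep='_'):
    """Return a copy of df whose MultiIndex columns are joined into strings.

    Levels are joined innermost first, so ('share', 'WidgetCo') becomes
    'WidgetCo_share'. Assumes df.columns is a MultiIndex.
    """
    flat = df.copy()
    # Iterating a MultiIndex yields one tuple per column, one entry per level.
    flat.columns = [sep.join(str(part) for part in reversed(label))
                    for label in df.columns]
    return flat

# Made-up three-level example to show that the depth does not matter.
cols = pd.MultiIndex.from_tuples(
    [('share', 'WidgetCo', 'east'), ('price', 'WidgetCo', 'east')])
demo = pd.DataFrame([[0.5, 2.0]], columns=cols)
print(flatten_columns(demo).columns.tolist())
```

Unlike the function above, this sketch returns a copy instead of modifying its argument, which is a design choice and not something the answer requires.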

Answer 3

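Update (as of early 2017 and pandas 0.19.2): you can use .values on a MultiIndex, so this snippet should flatten a MultiIndex for anyone who needs it. The snippet is both too clever and not clever enough: it can handle either the row index or the column names of a DataFrame, but it will blow up if the result of getattr(df,way) is not nested (i.e. is not a MultiIndex).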
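
A minimal sketch of a helper along the lines described, not the answer's original code: it looks up the requested axis with getattr, reads the label tuples through .values, and simply returns a copy when the axis is not a MultiIndex. The name `flatten_axis` and the parameters `axis` and `sep` are hypothetical.

```python
import pandas as pd

def flatten_axis(df, axis='columns', sep='_'):
    """Return a copy of df with the MultiIndex on `axis` joined into strings.

    Hypothetical helper; `axis` must be 'columns' or 'index'.
    """
    labels = getattr(df, axis)
    if not isinstance(labels, pd.MultiIndex):
        return df.copy()  # nothing nested to flatten
    flat = df.copy()
    # .values on a MultiIndex is an array of tuples, one tuple per label.
    setattr(flat, axis, [sep.join(map(str, tup)) for tup in labels.values])
    return flat

# Small made-up frame with a two-level row index and plain columns.
demo = pd.DataFrame(
    [[0.3, 3.0]],
    index=pd.MultiIndex.from_tuples([('Jan 2000', 'GadgetCo')]),
    columns=['share', 'price'])
print(flatten_axis(demo, axis='index').index.tolist())
print(flatten_axis(demo, axis='columns').columns.tolist())
```

The isinstance guard is one way to avoid the failure on a non-nested axis mentioned above; raising an error instead would be an equally reasonable choice.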

Pivot a pandas dataframe into prefixed cols, not a MultiIndex
