Question

I have a MultiIndex DataFrame whose index consists of two dates. Each date combination has values for columns A, B, C and D.

tradedate            deliverydate          A       B       C       D
2017-09-15 00:00:00  2017-09-11 00:00:00   31.84   27.61   32.3    46.57
                     2017-09-18 00:00:00   39      41.33   42.13   51.655
                     2017-09-25 00:00:00   39.75   40.5    42.89   56.135
                     2017-10-02 00:00:00   41.25   37.85   43.375  54.725
                     2017-10-09 00:00:00   46      40.72   47.875  54.475
2017-09-18 00:00:00  2017-09-11 00:00:00   32.04   28.94   34.18   49.295
                     2017-09-18 00:00:00   40.2    41.615  42.945  50.71
                     2017-09-25 00:00:00   40      39.55   41.815  54.125
                     2017-10-02 00:00:00   41.75   37.265  43.99   52.975
                     2017-10-09 00:00:00   44.75   40.615  48.5    54.285
                     2017-10-16 00:00:00   51.12   42.875  52.625  54.475

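I want to get rid of the MultiIndex by replacing the delivery dates in the second index level with their positions (0, 1, 2, ... within each trade date), and then creating columns from the column name and the position.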

With positions instead of delivery dates, it would look like this:

tradedate            position  A       B       C       D
2017-09-15 00:00:00  0         31.84   27.61   32.3    46.57
                     1         39      41.33   42.13   51.655
                     2         39.75   40.5    42.89   56.135
                     3         41.25   37.85   43.375  54.725
                     4         46      40.72   47.875  54.475
2017-09-18 00:00:00  0         32.04   28.94   34.18   49.295
                     1         40.2    41.615  42.945  50.71
                     2         40      39.55   41.815  54.125
                     3         41.75   37.265  43.99   52.975
                     4         44.75   40.615  48.5    54.285
                     5         51.12   42.875  52.625  54.475
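As a rough sketch of this first step on its own (not the code from the answer), the snippet below builds a small subset of the question's frame, assuming it is called `df` and that its index levels are named `tradedate` and `deliverydate`, and then swaps the delivery dates for positions produced by `groupby(...).cumcount()`.

```python
import pandas as pd

# A subset of the question's data (columns A and B only). The name `df`
# and this construction are assumptions made for illustration.
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-11")),
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-18")),
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-25")),
        (pd.Timestamp("2017-09-18"), pd.Timestamp("2017-09-11")),
        (pd.Timestamp("2017-09-18"), pd.Timestamp("2017-09-18")),
    ],
    names=["tradedate", "deliverydate"],
)
df = pd.DataFrame(
    {"A": [31.84, 39.0, 39.75, 32.04, 40.2],
     "B": [27.61, 41.33, 40.5, 28.94, 41.615]},
    index=idx,
)

# Sort so that position 0 is the earliest delivery date of each trade date.
df = df.sort_index()

# cumcount() numbers the rows 0, 1, 2, ... within each tradedate group.
position = df.groupby(level="tradedate").cumcount()

# Rebuild the index with the positions in place of the delivery dates.
positioned = df.copy()
positioned.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values("tradedate"), position.to_numpy()],
    names=["tradedate", "position"],
)
print(positioned)
```

The counter restarts at 0 for every trade date, which is why 2017-09-18 can have one more position than 2017-09-15.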

The final DataFrame should have no MultiIndex and look like this:

tradedate            A_0     A_1     A_2     A_3     A_4     A_5     B_0     …       D_4     D_5
2017-09-15 00:00:00  31.84   39      39.75   41.25   46      -       27.61   …       54.475
2017-09-18 00:00:00  32.04   40.2    40      41.75   44.75   51.12   28.94   …       54.285  54.475
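One possible route from the original frame to this wide layout, sketched under the same assumptions (a frame `df` indexed by `tradedate` and `deliverydate`, using only a subset of the data): move the index into ordinary columns, add a `position` column with `cumcount`, then spread it out with `DataFrame.pivot`. This pivot route is a swapped-in alternative, not the approach used in Answer 1.

```python
import pandas as pd

# Subset of the question's data; the name `df` is an assumption.
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-11")),
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-18")),
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-25")),
        (pd.Timestamp("2017-09-18"), pd.Timestamp("2017-09-11")),
        (pd.Timestamp("2017-09-18"), pd.Timestamp("2017-09-18")),
    ],
    names=["tradedate", "deliverydate"],
)
df = pd.DataFrame(
    {"A": [31.84, 39.0, 39.75, 32.04, 40.2],
     "B": [27.61, 41.33, 40.5, 28.94, 41.615]},
    index=idx,
)

# 1. Flatten the index and number the delivery dates within each trade date.
long = df.sort_index().reset_index()
long["position"] = long.groupby("tradedate").cumcount()

# 2. Spread positions into columns; the columns become (name, position) pairs.
wide = long.drop(columns="deliverydate").pivot(index="tradedate", columns="position")

# 3. Flatten ("A", 0) -> "A_0" and turn tradedate back into a regular column.
wide.columns = [f"{col}_{pos}" for col, pos in wide.columns]
wide = wide.reset_index()
print(wide)
```

Trade dates with fewer delivery dates end up with NaN in the missing positions, which corresponds to the `-` cell in the table above.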

Can someone help me with these transformations?

Answer 1

This can be done as follows:

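new_df = (df.reset_index(level=1, drop=True)                         # drop the deliverydate level
            .set_index(df.groupby(level=0).cumcount(), append=True)  # this is your step 1: positions 0, 1, 2, ... per tradedate
            .unstack(level=1)                                        # move the position level into the columns
         )

# rename columns: ('A', 0) -> 'A_0'
new_df.columns = [f'{x}_{y}' for x, y in new_df.columns]

# reset_index: turn tradedate back into a regular column
new_df = new_df.reset_index()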
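Note that `cumcount()` numbers rows in the order they currently appear, so the positions only follow delivery-date order if each trade date's rows are already sorted. A small sketch with a hypothetical, deliberately unsorted input, running the same pipeline after `sort_index()`:

```python
import pandas as pd

# Hypothetical input: the delivery dates of 2017-09-15 are out of order.
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-18")),
        (pd.Timestamp("2017-09-15"), pd.Timestamp("2017-09-11")),
        (pd.Timestamp("2017-09-18"), pd.Timestamp("2017-09-11")),
    ],
    names=["tradedate", "deliverydate"],
)
df = pd.DataFrame({"A": [39.0, 31.84, 32.04]}, index=idx)

# Sort first so that position 0 is always the earliest delivery date.
df = df.sort_index()

# Same pipeline as in the answer.
new_df = (df.reset_index(level=1, drop=True)
            .set_index(df.groupby(level=0).cumcount(), append=True)
            .unstack(level=1))
new_df.columns = [f"{x}_{y}" for x, y in new_df.columns]
new_df = new_df.reset_index()
print(new_df)
```

Without the `sort_index()` call, `A_0` for 2017-09-15 would hold the 2017-09-18 delivery value instead of the 2017-09-11 one.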

Sample data:

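import pandas as pd

# toy frame: level 'a' plays the role of tradedate, level 'b' of deliverydate
df = (pd.DataFrame({'a': ['x']*4 + ['y']*3,
                    'b': [8, 8, 8, 9, 7, 7, 7],
                    'A': [1, 2, 3, 4, 5, 6, 7],
                    'B': [7, 6, 5, 4, 3, 2, 1]})
        .set_index(['a', 'b'])
     )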

Output:

   a  A_0  A_1  A_2  A_3  B_0  B_1  B_2  B_3
0  x  1.0  2.0  3.0  4.0  7.0  6.0  5.0  4.0
1  y  5.0  6.0  7.0  NaN  3.0  2.0  1.0  NaN

Transform a MultiIndex DataFrame and replace the second index level with positions