I have a MultiIndex DataFrame whose index contains two dates. Each combination of dates has values for the columns A, B, C, and D.
tradedate deliverydate A B C D
2017-09-15 00:00:00 2017-09-11 00:00:00 31.84 27.61 32.3 46.57
2017-09-18 00:00:00 39 41.33 42.13 51.655
2017-09-25 00:00:00 39.75 40.5 42.89 56.135
2017-10-02 00:00:00 41.25 37.85 43.375 54.725
2017-10-09 00:00:00 46 40.72 47.875 54.475
2017-09-18 00:00:00 2017-09-11 00:00:00 32.04 28.94 34.18 49.295
2017-09-18 00:00:00 40.2 41.615 42.945 50.71
2017-09-25 00:00:00 40 39.55 41.815 54.125
2017-10-02 00:00:00 41.75 37.265 43.99 52.975
2017-10-09 00:00:00 44.75 40.615 48.5 54.285
2017-10-16 00:00:00 51.12 42.875 52.625 54.475
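I want to flatten the MultiIndex by replacing the delivery date in the second index level with a position, and then building new columns from the column name and that position.
The positions would look like this: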
tradedate position A B C D
2017-09-15 00:00:00 0 31.84 27.61 32.3 46.57
1 39 41.33 42.13 51.655
2 39.75 40.5 42.89 56.135
3 41.25 37.85 43.375 54.725
4 46 40.72 47.875 54.475
2017-09-18 00:00:00 0 32.04 28.94 34.18 49.295
1 40.2 41.615 42.945 50.71
2 40 39.55 41.815 54.125
3 41.75 37.265 43.99 52.975
4 44.75 40.615 48.5 54.285
5 51.12 42.875 52.625 54.475
The final DataFrame should have no MultiIndex and look like this:
tradedate A_0 A_1 A_2 A_3 A_4 A_5 B_0 … D_4 D_5
2017-09-15 00:00:00 31.84 39 39.75 41.25 46 - 27.61 … 54.475
2017-09-18 00:00:00 32.04 40.2 40 41.75 44.75 51.12 28.94 … 54.285 54.475
Can anyone help me with these transformations?
Answer 0 (score: 1)
This would do it:
new_df = (df.reset_index(level=1, drop=True)
            .set_index(df.groupby(level=0).cumcount(), append=True)  # this is your step 1
            .unstack(level=1)
         )
# rename columns
new_df.columns = [f'{x}_{y}' for x, y in new_df.columns]
# reset_index
new_df = new_df.reset_index()
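To make the first step concrete, here is a minimal sketch of what `groupby(level=0).cumcount()` produces on its own. The frame `demo` and its labels are made up purely for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame with a two-level index and one value column.
demo = pd.DataFrame(
    {"trade": ["t1", "t1", "t1", "t2", "t2"],
     "delivery": ["d1", "d2", "d3", "d2", "d3"],
     "A": [10, 11, 12, 20, 21]}
).set_index(["trade", "delivery"])

# cumcount numbers the rows inside each first-level group, starting at 0.
position = demo.groupby(level=0).cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1]
```

Because `cumcount` follows row order, the positions only track delivery order if the frame is already sorted on the second level; calling `sort_index()` first is one way to make sure of that.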
Sample data:
import pandas as pd

df = (pd.DataFrame({'a': ['x']*4 + ['y']*3,
                    'b': [8, 8, 8, 9, 7, 7, 7],
                    'A': [1, 2, 3, 4, 5, 6, 7],
                    'B': [7, 6, 5, 4, 3, 2, 1]})
        .set_index(['a', 'b'])
     )
Output:
a A_0 A_1 A_2 A_3 B_0 B_1 B_2 B_3
0 x 1.0 2.0 3.0 4.0 7.0 6.0 5.0 4.0
1 y 5.0 6.0 7.0 NaN 3.0 2.0 1.0 NaN
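As a rough sketch of how the same steps might look on date-indexed data shaped like the question's, the snippet below wraps them in a helper. The function name `flatten_by_position` is hypothetical, and the frame is only a truncated subset of the question's values (columns A and B, the first few delivery dates), so treat it as an illustration under those assumptions rather than a definitive implementation.

```python
import pandas as pd

def flatten_by_position(df):
    """Hypothetical helper: swap index level 1 for a per-group position, then pivot wide."""
    # Position of each row within its first-level group (0, 1, 2, ...).
    pos = df.groupby(level=0).cumcount().to_numpy()
    wide = (df.reset_index(level=1, drop=True)   # drop the delivery date
              .set_index(pos, append=True)       # append the position as a new level
              .unstack(level=1))                 # move positions into the columns
    # Collapse the (column, position) pairs into names such as "A_0".
    wide.columns = [f"{col}_{i}" for col, i in wide.columns]
    return wide.reset_index()

# Truncated subset of the question's data, for illustration only.
df = pd.DataFrame(
    {"tradedate": pd.to_datetime(["2017-09-15"] * 2 + ["2017-09-18"] * 3),
     "deliverydate": pd.to_datetime(["2017-09-11", "2017-09-18",
                                     "2017-09-11", "2017-09-18", "2017-09-25"]),
     "A": [31.84, 39, 32.04, 40.2, 40],
     "B": [27.61, 41.33, 28.94, 41.615, 39.55]}
).set_index(["tradedate", "deliverydate"]).sort_index()

wide = flatten_by_position(df)
print(wide)
```

Trade dates with fewer delivery dates than the longest group end up with `NaN` in the trailing position columns, which matches the `-` shown in the question's desired output.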