I have a DataFrame with the following columns:
Index(['Stock', 'EarnDate', 'Last', 'Settle', 'Change'], dtype='object')
EarnDate is a date reflecting the next earnings release date.
I created a pivot table:
pivot = pd.pivot_table(df, index='EarnDate', columns='Stock')
This gives me the following output:
            Last            Settle          Chg
Stock       Stock1  Stock2  Stock1  Stock2  Stock1  Stock2
EarnDate
2019-10-01  NaN     5.55    NaN     5.55    NaN     +1
2019-11-01  65.91   3.43    62.91   6.55    -.5     +2
2019-12-01  62.97   6.87    61.97   7.00    +.4     +3
2020-01-01  63.33   6.66    61.38   9.50    -.3     +4
2020-02-01  60.91   5.98    60.99   8.50    +.2     +5
2020-03-01  60.71   6.23    60.70   7.50    -.15    +6
What I would like is to have the Last, Settle, and Chg fields grouped by Stock, so that it looks like this:
Stock       Stock1                  Stock2
            Last    Settle  Chg     Last    Settle  Chg
EarnDate
2019-10-01  NaN     NaN     NaN     5.55    5.55    +1
2019-11-01  65.91   62.91   -.5     3.43    6.55    +2
2019-12-01  62.97   61.97   +.4     6.87    7.00    +3
2020-01-01  63.33   61.38   -.3     6.66    9.50    +4
2020-02-01  60.91   60.99   +.2     5.98    8.50    +5
2020-03-01  60.71   60.70   -.15    6.23    7.50    +6
I have tried various stack()/unstack() combinations, but none of them worked. Can someone point me in the right direction? Thanks!
Answer 0 (score: 1)
Use DataFrame.swaplevel together with DataFrame.reindex:
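# Target column order: stock on the outer level, fields on the inner level
mux = pd.MultiIndex.from_product([['Stock1', 'Stock2'], ['Last', 'Settle', 'Chg']])
# Swap the two column levels, then reorder the columns to match mux
df = df.swaplevel(0, 1, axis=1).reindex(mux, axis=1)
print (df)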
            Stock1                  Stock2
            Last    Settle  Chg     Last    Settle  Chg
2019-10-01  NaN     NaN     NaN     5.55    5.55    1
2019-11-01  65.91   62.91   -0.50   3.43    6.55    2
2019-12-01  62.97   61.97   0.40    6.87    7.00    3
2020-01-01  63.33   61.38   -0.30   6.66    9.50    4
2020-02-01  60.91   60.99   0.20    5.98    8.50    5
2020-03-01  60.71   60.70   -0.15   6.23    7.50    6
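For anyone who wants to try this end to end, here is a minimal sketch that starts from a long-format frame. The sample rows are a small subset invented to mirror the question's table, and the change column is assumed to be named Chg (as in the pivot output) rather than Change.

```python
import pandas as pd

# Hypothetical long-format data shaped like the question's frame
# (the column name "Chg" is an assumption based on the pivot output).
df = pd.DataFrame({
    "Stock": ["Stock2", "Stock1", "Stock2", "Stock1", "Stock2"],
    "EarnDate": pd.to_datetime(
        ["2019-10-01", "2019-11-01", "2019-11-01", "2019-12-01", "2019-12-01"]
    ),
    "Last": [5.55, 65.91, 3.43, 62.97, 6.87],
    "Settle": [5.55, 62.91, 6.55, 61.97, 7.00],
    "Chg": [1.0, -0.5, 2.0, 0.4, 3.0],
})

# Pivot: fields end up on the outer column level, stocks on the inner one.
pivot = pd.pivot_table(df, index="EarnDate", columns="Stock")

# Desired layout: stocks outside, fields inside, in an explicit order.
mux = pd.MultiIndex.from_product(
    [["Stock1", "Stock2"], ["Last", "Settle", "Chg"]]
)

# Swap the column levels, then reorder the columns to match mux.
result = pivot.swaplevel(0, 1, axis=1).reindex(mux, axis=1)
print(result)
```

Stock1 has no row for 2019-10-01 in this sample, so its three columns are NaN on that date, as in the question.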
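reindex with an explicit MultiIndex is needed because DataFrame.sort_index orders the second level alphabetically, which gives a different column order: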
# Sorting the columns instead puts the inner level in alphabetical order
df = df.swaplevel(0, 1, axis=1).sort_index(axis=1, level=0)
print (df)
            Stock1                  Stock2
            Chg     Last    Settle  Chg     Last    Settle
2019-10-01  NaN     NaN     NaN     1       5.55    5.55
2019-11-01  -0.50   65.91   62.91   2       3.43    6.55
2019-12-01  0.40    62.97   61.97   3       6.87    7.00
2020-01-01  -0.30   63.33   61.38   4       6.66    9.50
2020-02-01  0.20    60.91   60.99   5       5.98    8.50
2020-03-01  -0.15   60.71   60.70   6       6.23    7.50
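If the stock names are not known in advance, one possible variation is to read them from the swapped columns and hardcode only the field order. The sketch below is illustrative: the one-row frame and the names swapped, field_order, and stocks are made up for the example and are not part of the original answer.

```python
import pandas as pd

# One-row stand-in for the pivot table: fields outside, stocks inside.
cols = pd.MultiIndex.from_product(
    [["Last", "Settle", "Chg"], ["Stock1", "Stock2"]], names=[None, "Stock"]
)
pivot = pd.DataFrame(
    [[65.91, 3.43, 62.91, 6.55, -0.5, 2.0]],
    index=pd.to_datetime(["2019-11-01"]),
    columns=cols,
)

swapped = pivot.swaplevel(0, 1, axis=1)

# Only the field order is fixed; stock names come from the data itself.
field_order = ["Last", "Settle", "Chg"]
stocks = swapped.columns.get_level_values(0).unique()
mux = pd.MultiIndex.from_product([stocks, field_order])

ordered = swapped.reindex(columns=mux)

# For comparison: sorting yields alphabetical fields (Chg, Last, Settle).
alphabetical = swapped.sort_index(axis=1)

print(ordered)
print(alphabetical)
```

This keeps the explicit field order of the reindex approach while still picking up any additional stocks that appear in the data.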