I currently have a pandas DataFrame with a MultiIndex of the following structure (indexed by ticker and field):
                                         value
ticker            field
DE0001141174 Govt CASH_FLOW_DATE    2000-11-21
                  CASH_FLOW_AMOUNT       51250
                  PRINCIPAL_AMOUNT       1e+06
DE0001141232 Govt CASH_FLOW_DATE    2000-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2001-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2002-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT       1e+06
DE0001141380 Govt CASH_FLOW_DATE    2002-08-18
                  CASH_FLOW_AMOUNT     67808.2
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2003-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2004-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2005-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2006-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT       1e+06
I would like to convert it into a structure like this, where ticker and CASH_FLOW_DATE form the index:
ticker            CASH_FLOW_DATE  CASH_FLOW_AMOUNT  PRINCIPAL_AMOUNT
DE0001141174 Govt 2000-11-21                 51250             1e+06
DE0001141232 Govt 2000-05-17                 45000                 0
                  2001-05-17                 45000                 0
                  2002-05-17                 45000             1e+06
DE0001141380 Govt 2002-08-18               67808.2                 0
                  2003-08-18                 45000                 0
                  2004-08-18                 45000                 0
                  2005-08-18                 45000                 0
                  2006-08-18                 45000             1e+06
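I think the problem is that Python/pandas has no natural way of knowing that the two rows below each 'CASH_FLOW_DATE' row belong to that date. I could probably do this with a lot of ugly loops, but I would like to know whether there is a more Pythonic way to do it.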
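The difficulty described above can be reproduced on a small frame. The sketch below is only an illustration: it rebuilds a hypothetical two-bond subset of the data (the `rows` list and the variable names are assumptions, not part of the original post) and shows that the (ticker, field) index contains duplicates, which is why a plain `unstack` refuses to reshape it.

```python
import pandas as pd

# Hypothetical subset of the data shown above: one bond with a single
# cash flow and one bond with two cash flows.
rows = [
    ("DE0001141174 Govt", "CASH_FLOW_DATE", "2000-11-21"),
    ("DE0001141174 Govt", "CASH_FLOW_AMOUNT", 51250),
    ("DE0001141174 Govt", "PRINCIPAL_AMOUNT", 1e6),
    ("DE0001141232 Govt", "CASH_FLOW_DATE", "2000-05-17"),
    ("DE0001141232 Govt", "CASH_FLOW_AMOUNT", 45000),
    ("DE0001141232 Govt", "PRINCIPAL_AMOUNT", 0),
    ("DE0001141232 Govt", "CASH_FLOW_DATE", "2001-05-17"),
    ("DE0001141232 Govt", "CASH_FLOW_AMOUNT", 45000),
    ("DE0001141232 Govt", "PRINCIPAL_AMOUNT", 0),
]
df = pd.DataFrame(rows, columns=["ticker", "field", "value"])
df = df.set_index(["ticker", "field"])

# Each (ticker, field) pair appears once per cash flow, so the index
# is not unique for bonds with more than one cash flow.
has_duplicates = bool(df.index.duplicated().any())
print("duplicate index entries:", has_duplicates)

# A plain unstack cannot decide which row each repeated pair goes to.
try:
    df["value"].unstack(level="field")
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print("unstack failed:", exc)
```

Nothing in the index says which amount belongs to which date; only the row order carries that information.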
Answer 0 (score: 1)
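The (ticker, field) pairs repeat, so you need a new index level that tells the repeats apart. Create it with cumcount, append it to the original index with set_index, and then call unstack: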
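# Number the repeats of each (ticker, field) pair 0, 1, 2, ... and append that counter as a third index level
df = df.set_index(df.groupby(level=[0, 1]).cumcount(), append=True)
# Pivot the field level into columns, drop the helper counter, and move ticker back into a column
df = df['value'].unstack(level=1, fill_value=0).reset_index(level=1, drop=True).reset_index()
print(df)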
field             ticker  CASH_FLOW_AMOUNT  CASH_FLOW_DATE  PRINCIPAL_AMOUNT
0      DE0001141174 Govt             51250      2000-11-21             1e+06
1      DE0001141232 Govt             45000      2000-05-17                 0
2      DE0001141232 Govt             45000      2001-05-17                 0
3      DE0001141232 Govt             45000      2002-05-17             1e+06
4      DE0001141380 Govt           67808.2      2002-08-18                 0
5      DE0001141380 Govt             45000      2003-08-18                 0
6      DE0001141380 Govt             45000      2004-08-18                 0
7      DE0001141380 Govt             45000      2005-08-18                 0
8      DE0001141380 Govt             45000      2006-08-18             1e+06
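The result above has a flat integer index, whereas the question asks for ticker and CASH_FLOW_DATE as the index. A minimal end-to-end sketch of one way to get there follows. It assumes a hypothetical two-bond subset of the data; the `rows` list, the variable names, and the dtype conversions are illustrative additions rather than part of the original answer. `fill_value` is left out because this sample has no gaps.

```python
import pandas as pd

# Hypothetical subset of the question's data.
rows = [
    ("DE0001141174 Govt", "CASH_FLOW_DATE", "2000-11-21"),
    ("DE0001141174 Govt", "CASH_FLOW_AMOUNT", 51250),
    ("DE0001141174 Govt", "PRINCIPAL_AMOUNT", 1e6),
    ("DE0001141232 Govt", "CASH_FLOW_DATE", "2000-05-17"),
    ("DE0001141232 Govt", "CASH_FLOW_AMOUNT", 45000),
    ("DE0001141232 Govt", "PRINCIPAL_AMOUNT", 0),
    ("DE0001141232 Govt", "CASH_FLOW_DATE", "2001-05-17"),
    ("DE0001141232 Govt", "CASH_FLOW_AMOUNT", 45000),
    ("DE0001141232 Govt", "PRINCIPAL_AMOUNT", 0),
]
df = pd.DataFrame(rows, columns=["ticker", "field", "value"])
df = df.set_index(["ticker", "field"])

# Counter that restarts for every (ticker, field) pair:
# 0 for the first cash flow, 1 for the second, and so on.
counter = df.groupby(level=["ticker", "field"]).cumcount()

wide = (
    df.set_index(counter, append=True)["value"]  # index: ticker, field, counter
    .unstack(level="field")                      # field values become columns
    .reset_index(level=1, drop=True)             # drop the helper counter
    .reset_index()                               # ticker becomes a column
)

# The 'value' column held mixed types, so the pivoted columns are object dtype.
wide["CASH_FLOW_DATE"] = pd.to_datetime(wide["CASH_FLOW_DATE"])
wide = wide.astype({"CASH_FLOW_AMOUNT": float, "PRINCIPAL_AMOUNT": float})

# Use ticker and CASH_FLOW_DATE as the index, as requested in the question.
result = wide.set_index(["ticker", "CASH_FLOW_DATE"])[
    ["CASH_FLOW_AMOUNT", "PRINCIPAL_AMOUNT"]
]
print(result)
```

This approach relies on the rows of each bond appearing in cash-flow order, since the counter is the only thing linking an amount to its date.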