I have a CSV that loads into a DataFrame with the following format:
--------------------------------------------------------------
|Date | Fund | TradeGroup | LongShort | Alpha | Details|
--------------------------------------------------------------
|2018-05-22 |A | TGG-A | Long | 3.99 | Misc |
|2018-05-22 |A | TGG-B | Long | 4.99 | Misc |
|2018-05-22 |B | TGG-A | Long | 5.99 | Misc |
|2018-05-22 |B | TGG-B | Short | 6.99 | Misc |
|2018-05-22 |C | TGG-A | Long | 1.99 | Misc |
|2018-05-22 |C | TGG-B | Long | 5.29 | Misc |
--------------------------------------------------------------
What I want to do is group the rows by TradeGroup and turn the Fund values into columns, so the final DataFrame should look like this:
--------------------------------------------------------
|TradeGroup| Date | A | B | C |
--------------------------------------------------------
| TGG-A |2018-05-22 | 3.99 | 5.99 | 1.99 |
| TGG-B |2018-05-22 | 4.99 | 6.99 | 5.29 |
--------------------------------------------------------
Also, I don't care about the LongShort and Details columns, so it's fine if they get dropped. Thanks!!
I tried df.pivot(), but it did not give me the required format.
Answer 0: (score: 1)
It looks like you are trying to unstack one level of a MultiIndex.
Try this:
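import io

import pandas as pd

data = '''\
Date Fund TradeGroup LongShort Alpha Details
2018-05-22 A TGG-A Long 3.99 Misc
2018-05-22 A TGG-B Long 4.99 Misc
2018-05-22 B TGG-A Long 5.99 Misc
2018-05-22 B TGG-B Short 6.99 Misc
2018-05-22 C TGG-A Long 1.99 Misc
2018-05-22 C TGG-B Long 5.29 Misc'''

# io.StringIO replaces pd.compat.StringIO, which is not available in current pandas
fileobj = io.StringIO(data)
# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(fileobj, sep=r'\s+')

# Build a (TradeGroup, Date, Fund) MultiIndex, unstack the innermost level (Fund)
# into columns, then keep only the Alpha block; LongShort and Details are dropped
dfout = df.set_index(['TradeGroup','Date','Fund']).unstack()['Alpha']
print(dfout)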
Output:
Fund A B C
TradeGroup Date
TGG-A 2018-05-22 3.99 5.99 1.99
TGG-B 2018-05-22 4.99 6.99 5.29
If you like, you can also call .reset_index() on the result, which gives:
Fund TradeGroup Date A B C
0 TGG-A 2018-05-22 3.99 5.99 1.99
1 TGG-B 2018-05-22 4.99 6.99 5.29
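The "Fund" label in the top-left corner of that output is the name of the columns axis, left over from the unstacked level. If it gets in the way, it can be cleared. The following is a minimal sketch of one way to do that, not part of the original answer: it selects Alpha before unstacking and uses rename_axis to drop the axis name. The variable name `wide` is just for illustration.

```python
import io

import pandas as pd

data = """\
Date Fund TradeGroup LongShort Alpha Details
2018-05-22 A TGG-A Long 3.99 Misc
2018-05-22 A TGG-B Long 4.99 Misc
2018-05-22 B TGG-A Long 5.99 Misc
2018-05-22 B TGG-B Short 6.99 Misc
2018-05-22 C TGG-A Long 1.99 Misc
2018-05-22 C TGG-B Long 5.29 Misc"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+")

wide = (
    df.set_index(["TradeGroup", "Date", "Fund"])["Alpha"]  # keep only Alpha
      .unstack("Fund")              # one column per fund
      .reset_index()                # TradeGroup and Date back to ordinary columns
      .rename_axis(None, axis=1)    # clear the leftover "Fund" columns-axis name
)
print(wide)
```

After this, `wide.columns` holds TradeGroup, Date, A, B, C with no axis name, which matches the layout asked for in the question.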
Answer 1: (score: 0)
res = df.pivot_table(index=['Date', 'TradeGroup'], columns='Fund',
values='Alpha', aggfunc='first').reset_index()
print(res)
Output:
Fund Date TradeGroup A B C
0 2018-05-22 TGG-A 3.99 5.99 1.99
1 2018-05-22 TGG-B 4.99 6.99 5.29
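A note on why aggfunc matters here: pivot_table aggregates whenever several rows share the same index/column combination, whereas df.pivot() refuses to reshape in that case. The sketch below illustrates the difference on a small made-up frame (the `dup` data is hypothetical and not from the question). It assumes pandas 1.1 or later, where pivot() accepts a list for index.

```python
import pandas as pd

# Hypothetical data: fund A appears twice for TGG-A on the same date
dup = pd.DataFrame({
    "Date": ["2018-05-22"] * 3,
    "Fund": ["A", "A", "B"],
    "TradeGroup": ["TGG-A", "TGG-A", "TGG-A"],
    "Alpha": [3.99, 4.50, 5.99],
})

# pivot() cannot decide which of the two A values to keep, so it raises
try:
    dup.pivot(index=["Date", "TradeGroup"], columns="Fund", values="Alpha")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table() resolves duplicates with the aggregation you choose
first = dup.pivot_table(index=["Date", "TradeGroup"], columns="Fund",
                        values="Alpha", aggfunc="first")
total = dup.pivot_table(index=["Date", "TradeGroup"], columns="Fund",
                        values="Alpha", aggfunc="sum")
print(first)
print(total)
```

With the data in the question every (Date, TradeGroup, Fund) combination is unique, so 'first' simply passes the values through. If duplicates are possible, pick the aggregation deliberately, since 'first' silently discards the later rows.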