I have a df:
import pandas as pd
df = pd.DataFrame({'period': {0: pd.Timestamp('2016-05-01 00:00:00'),
1: pd.Timestamp('2017-05-01 00:00:00'),
2: pd.Timestamp('2018-03-01 00:00:00'),
3: pd.Timestamp('2018-04-01 00:00:00'),
4: pd.Timestamp('2016-05-01 00:00:00'),
5: pd.Timestamp('2017-05-01 00:00:00'),
6: pd.Timestamp('2016-03-01 00:00:00'),
7: pd.Timestamp('2016-04-01 00:00:00')},
'cost2': {0: 15,
1: 144,
2: 44,
3: 34,
4: 13,
5: 11,
6: 12,
7: 13},
'rev2': {0: 154,
1: 13,
2: 33,
3: 37,
4: 15,
5: 11,
6: 12,
7: 13},
'cost1': {0: 19,
1: 39,
2: 53,
3: 16,
4: 19,
5: 11,
6: 12,
7: 13},
'rev1': {0: 34,
1: 34,
2: 74,
3: 22,
4: 34,
5: 11,
6: 12,
7: 13},
'destination': {0: 'YYZ',
1: 'YYZ',
2: 'YYZ',
3: 'YYZ',
4: 'DFW',
5: 'DFW',
6: 'DFW',
7: 'DFW'},
'source': {0: 'SFO',
1: 'SFO',
2: 'SFO',
3: 'SFO',
4: 'MIA',
5: 'MIA',
6: 'MIA',
7: 'MIA'}})
df = df[['source','destination','period','rev1','rev2','cost1','cost2']]
I would like the final df to have the following columns:
                     2017-05-01                2016-05-01
source, destination, rev1, rev2, cost1, cost2, rev1, rev2, cost1, cost2...
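Basically, for each source/destination pair, I want the revenue and cost numbers for every date in a single row.
I have been tinkering with stack and unstack, but have not been able to achieve my goal.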
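Answer 0 (score: 2)
You can use set_index + unstack to reshape the data from long to wide, then use swaplevel to put the column index levels in the order you need, and sort_index to group the columns by period: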
df.set_index(['destination','source','period']).unstack().swaplevel(0,1,axis=1).sort_index(level=0,axis=1)
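To make the chain above easier to follow, here is a minimal sketch that runs it one step at a time. The frame `small` and the intermediate names `indexed`, `wide`, and `result` are hypothetical, introduced only for illustration; they are not part of the question's data.

```python
import pandas as pd

# Small hypothetical frame: one route, two periods (not the question's data).
small = pd.DataFrame({
    'source': ['SFO', 'SFO'],
    'destination': ['YYZ', 'YYZ'],
    'period': pd.to_datetime(['2016-05-01', '2017-05-01']),
    'rev1': [34, 34],
    'cost1': [19, 39],
})

# Step 1: move the identifying columns into the index.
indexed = small.set_index(['source', 'destination', 'period'])

# Step 2: unstack the innermost index level (period) into the columns.
# The columns are now (metric, period) pairs.
wide = indexed.unstack()

# Step 3: swap the two column levels so period comes first, then sort
# so that all metrics for one period sit next to each other.
result = wide.swaplevel(0, 1, axis=1).sort_index(level=0, axis=1)

print(result)
```

After the last step there is one row per source/destination pair, and the columns are grouped by period with the metric names underneath.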
Answer 1 (score: 1)
An alternative to .set_index + .unstack is .pivot_table:
df.pivot_table(
    index=['source', 'destination'],
    columns=['period'],
    values=['rev1', 'rev2', 'cost1', 'cost2']
).swaplevel(axis=1).sort_index(axis=1, level=0)
# period             2016-03-01                  2016-04-01                  ...
#                         cost1 cost2  rev1  rev2      cost1 cost2  rev1  rev2
# source destination
# MIA    DFW               12.0  12.0  12.0  12.0       13.0  13.0  13.0  13.0
# SFO    YYZ                NaN   NaN   NaN   NaN        NaN   NaN   NaN   NaN
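One practical difference between the two answers is worth a sketch: pivot_table aggregates rows that share the same key (using the mean by default, which is also why the output above shows floats), while set_index + unstack refuses to reshape duplicated keys. The frame `dup` below is hypothetical data made up to show this; the question's df has no duplicated source/destination/period combinations.

```python
import pandas as pd

# Hypothetical data with a duplicated (source, destination, period) key.
dup = pd.DataFrame({
    'source': ['MIA', 'MIA', 'MIA'],
    'destination': ['DFW', 'DFW', 'DFW'],
    'period': pd.to_datetime(['2016-03-01', '2016-03-01', '2016-04-01']),
    'rev1': [10, 20, 13],
})

# pivot_table aggregates duplicates (mean by default), so this works
# and the duplicated cell becomes (10 + 20) / 2 = 15.0.
table = dup.pivot_table(index=['source', 'destination'],
                        columns='period', values='rev1')

# set_index + unstack cannot reshape duplicated keys and raises ValueError.
try:
    dup.set_index(['source', 'destination', 'period']).unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

print(table)
print('unstack raised ValueError:', unstack_failed)
```

If silently averaging duplicates is not what you want, the set_index + unstack version has the advantage of failing loudly.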