Question

I have a df:

import pandas as pd

df = pd.DataFrame({'period': {0: pd.Timestamp('2016-05-01 00:00:00'),
  1: pd.Timestamp('2017-05-01 00:00:00'),
  2: pd.Timestamp('2018-03-01 00:00:00'),
  3: pd.Timestamp('2018-04-01 00:00:00'),
  4: pd.Timestamp('2016-05-01 00:00:00'),
  5: pd.Timestamp('2017-05-01 00:00:00'),
  6: pd.Timestamp('2016-03-01 00:00:00'),
  7: pd.Timestamp('2016-04-01 00:00:00')},
 'cost2': {0: 15,
  1: 144,
  2: 44,
  3: 34,
  4: 13,
  5: 11,
  6: 12,
  7: 13},
 'rev2': {0: 154,
  1: 13,
  2: 33,
  3: 37,
  4: 15,
  5: 11,
  6: 12,
  7: 13},
 'cost1': {0: 19,
  1: 39,
  2: 53,
  3: 16,
  4: 19,
  5: 11,
  6: 12,
  7: 13},
 'rev1': {0: 34,
  1: 34,
  2: 74,
  3: 22,
  4: 34,
  5: 11,
  6: 12,
  7: 13},
 'destination': {0: 'YYZ',
  1: 'YYZ',
  2: 'YYZ',
  3: 'YYZ',
  4: 'DFW',
  5: 'DFW',
  6: 'DFW',
  7: 'DFW'},
 'source': {0: 'SFO',
  1: 'SFO',
  2: 'SFO',
  3: 'SFO',
  4: 'MIA',
  5: 'MIA',
  6: 'MIA',
  7: 'MIA'}})

df = df[['source','destination','period','rev1','rev2','cost1','cost2']]

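which looks like this:

  source destination     period  rev1  rev2  cost1  cost2
0    SFO         YYZ 2016-05-01    34   154     19     15
1    SFO         YYZ 2017-05-01    34    13     39    144
2    SFO         YYZ 2018-03-01    74    33     53     44
3    SFO         YYZ 2018-04-01    22    37     16     34
4    MIA         DFW 2016-05-01    34    15     19     13
5    MIA         DFW 2017-05-01    11    11     11     11
6    MIA         DFW 2016-03-01    12    12     12     12
7    MIA         DFW 2016-04-01    13    13     13     13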

I would like the final df to have the following columns:

                     2017-05-01                2016-05-01
source, destination, rev1, rev2, cost1, cost2, rev1, rev2, cost1, cost2...

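Essentially, for each source/destination pair, I want the revenue and cost figures for every date in a single row.

I have been tinkering with stack and unstack, but I have not been able to achieve this.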

Answer 1

You can use set_index + unstack to reshape the data from long to wide format, then swaplevel to put the column index into the layout you need:

df.set_index(['destination','source','period']).unstack().swaplevel(0,1,axis=1).sort_index(level=0,axis=1)
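
The one-liner above can be broken into its three steps. The following is only a minimal sketch on a small, made-up frame (the values and the reduced set of metric columns are assumptions for illustration, not the question's full data):

```python
import pandas as pd

# Hypothetical miniature of the question's df: two routes, two periods, two metrics.
df = pd.DataFrame({
    "source": ["SFO", "SFO", "MIA", "MIA"],
    "destination": ["YYZ", "YYZ", "DFW", "DFW"],
    "period": pd.to_datetime(["2016-05-01", "2017-05-01", "2016-05-01", "2017-05-01"]),
    "rev1": [34, 34, 34, 11],
    "cost1": [19, 39, 19, 11],
})

# Step 1: move the key columns into the index, with 'period' as the last level.
indexed = df.set_index(["source", "destination", "period"])

# Step 2: unstack() pivots the innermost index level ('period') into the columns,
# giving a two-level column index of (metric, period).
wide = indexed.unstack()

# Step 3: swap the two column levels so that period is on top, then sort the
# columns so that all metrics for one period sit next to each other.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1, level=0)

print(wide)
```

Each source/destination pair ends up as a single row, with one block of metric columns per period.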

Answer 2

An alternative to .set_index + .unstack is .pivot_table:

# Line continuations are not needed inside parentheses.
df.pivot_table(
    index=['source', 'destination'],
    columns=['period'],
    values=['rev1', 'rev2', 'cost1', 'cost2']
).swaplevel(axis=1).sort_index(axis=1, level=0)

# period             2016-03-01                   2016-04-01                    ...
#                         cost1 cost2  rev1  rev2      cost1 cost2  rev1  rev2   
# source destination                                                             
# MIA    DFW               12.0  12.0  12.0  12.0       13.0  13.0  13.0  13.0   
# SFO    YYZ                NaN   NaN   NaN   NaN        NaN   NaN   NaN   NaN
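
One practical difference between the two answers is worth a quick sketch: pivot_table aggregates rows that share the same index/column keys (the default aggregation is the mean), whereas unstack raises an error on duplicated keys. The data below is invented for illustration, and aggfunc="sum" is an arbitrary choice, not something the question asks for:

```python
import pandas as pd

# Hypothetical data with a duplicated (source, destination, period) key.
df = pd.DataFrame({
    "source": ["MIA", "MIA", "MIA"],
    "destination": ["DFW", "DFW", "DFW"],
    "period": pd.to_datetime(["2016-03-01", "2016-03-01", "2016-04-01"]),
    "rev1": [10, 20, 13],
})

# unstack cannot reshape when the index contains duplicate entries.
try:
    df.set_index(["source", "destination", "period"]).unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table combines the duplicates with the chosen aggregation instead.
wide = df.pivot_table(
    index=["source", "destination"],
    columns="period",
    values="rev1",
    aggfunc="sum",
)

print(unstack_failed)
print(wide)
```

If every source/destination/period combination is unique, as in the question's data, both approaches produce the same values, although pivot_table may return them as floats.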

pandas stack / unstack: reshaping a df with swaplevel
