熊猫:在列上的多索引丢失后,从宽到长

时间:2017-02-08 13:42:26

标签: python pandas

考虑通常的CREATE TRIGGER trigger_delete_cascading BEFORE DELETE ON project DELETE works_on WHERE project_id = id; 数据框:

trades

在这里,我想重塑从长到宽,做一些事情,然后从宽到长重塑

从长到宽很容易

trades = pd.DataFrame({
         'time': pd.to_datetime(['20160525 13:30:00.023',
                      '20160525 13:30:00.038',
                      '20160525 13:30:00.048',
                      '20160525 13:30:00.048',
                      '20160525 13:30:00.048']),
         'ticker': ['MSFT', 'MSFT','GOOG', 'BOOB', 'AAPL'],
         'price': [51.95, 51.95, 720.77, 720.92, 98.00],
         'quantity': [75, 155, 100, 100, 100]},
        columns=['time', 'ticker', 'price', 'quantity'])

trades
Out[42]: 
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   GOOG  720.77       100
3 2016-05-25 13:30:00.048   BOOB  720.92       100
4 2016-05-25 13:30:00.048   AAPL   98.00       100

但是现在由于很多原因,我现在想在列上使用这个多索引,所以我有以下代码:

trades.set_index(['time','ticker'], inplace = True)
trades = trades.unstack()

trades
Out[44]: 
                        price                        quantity                \
ticker                   AAPL    BOOB    GOOG   MSFT     AAPL   BOOB   GOOG   
time                                                                          
2016-05-25 13:30:00.023   NaN     NaN     NaN  51.95      NaN    NaN    NaN   
2016-05-25 13:30:00.038   NaN     NaN     NaN  51.95      NaN    NaN    NaN   
2016-05-25 13:30:00.048  98.0  720.92  720.77    NaN    100.0  100.0  100.0   


ticker                    MSFT  
time                            
2016-05-25 13:30:00.023   75.0  
2016-05-25 13:30:00.038  155.0  
2016-05-25 13:30:00.048    NaN  

这基本上摆脱了多索引并允许我使用普通列。数据现在看起来像:

trades.columns=['_'.join(t) for t in trades.columns]

问题是:我现在怎样才能回到长格式?

1 个答案:

答案 0 :(得分:1)

您可以使用:

#create MultiIndex from columns 
trades.columns = trades.columns.str.split('_', expand=True)
#stack and set index names for new column names
trades = trades.stack().rename_axis(['time','ticker']).reset_index()
#convert to int
trades.quantity = trades.quantity.astype(int)
print (trades)
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   AAPL   98.00       100
3 2016-05-25 13:30:00.048   BOOB  720.92       100
4 2016-05-25 13:30:00.048   GOOG  720.77       100