将Pandas MultiIndex列转换为行

时间:2019-01-24 20:52:22

标签: python-3.x pandas pandas-groupby

我正在使用时间序列Pandas数据框,其中每个公司都有一些按日期索引的观察值。对于每个观察,都有多个值。我一直在尝试使用熊猫MultiIndex与堆栈/非堆栈进行比较,但已经接近预期的结果,但是卡住了:

这就是我尝试过的-完整的代码可以重现:

import pandas.util.testing as tm; tm.N = 3
import numpy as np

def unpivot(frame):
    N, K = frame.shape
    data = {'value' : frame.values.ravel('F'),
            'variable' : np.asarray(frame.columns).repeat(N),
            'date' : np.tile(np.asarray(frame.index), K)}
    return pd.DataFrame(data, columns=['date', 'variable', 'value'])

df = unpivot(tm.makeTimeDataFrame())
df['topic'] = df.groupby('variable').cumcount()+1
df['company'] = np.random.choice(['Apple Inc', 'Google', 'Nike Corp'], len(df))

这就是我的开始:

         date variable     value  topic    company
0  2000-01-03        A -1.250926      1  Apple Inc
1  2000-01-04        A -0.807887      2     Google
2  2000-01-05        A  0.302593      3  Nike Corp
3  2000-01-03        B  0.502474      1  Nike Corp
4  2000-01-04        B  0.974968      2  Apple Inc
5  2000-01-05        B  0.698478      3     Google
6  2000-01-03        C  0.871437      1  Nike Corp
7  2000-01-04        C  0.235059      2  Apple Inc
8  2000-01-05        C -1.827321      3     Google
9  2000-01-03        D -0.764457      1     Google
10 2000-01-04        D  0.014109      2  Nike Corp
11 2000-01-05        D -0.966902      3     Google

我的目标是这样的:

topic                                 1         2         3
company   variable date                                    
Apple Inc A        2000-01-03      -1.2       NaN       NaN
          B        2000-01-04       NaN       0.9       NaN
          C        2000-01-04       NaN       0.2       NaN
Google    A        2000-01-04       NaN      -0.8       NaN
          B        2000-01-05       NaN       NaN       0.69


我已经尝试过了,但是'date'不正确:

df.set_index(['company', 'variable', 'date'], inplace=False).unstack()
                        value                            topic             \
date               2000-01-03 2000-01-04 2000-01-05 2000-01-03 2000-01-04   
company   variable                                                          
Apple Inc A         -1.250926        NaN        NaN        1.0        NaN   
          B               NaN   0.974968        NaN        NaN        2.0   
          C               NaN   0.235059        NaN        NaN        2.0   
Google    A               NaN  -0.807887        NaN        NaN        2.0   
          B               NaN        NaN   0.698478        NaN        NaN   
          C               NaN        NaN  -1.827321        NaN        NaN   
          D         -0.764457        NaN  -0.966902        1.0        NaN   
Nike Corp A               NaN        NaN   0.302593        NaN        NaN   
          B          0.502474        NaN        NaN        1.0        NaN   
          C          0.871437        NaN        NaN        1.0        NaN   
          D               NaN   0.014109        NaN        NaN        2.0   


date               2000-01-05  
company   variable             
Apple Inc A               NaN  
          B               NaN  
          C               NaN  
Google    A               NaN  
          B               3.0  
          C               3.0  
          D               3.0  
Nike Corp A               3.0  
          B               NaN  
          C               NaN  
          D               NaN 

1 个答案:

答案 0 :(得分:2)

使用get_dummies

进行检查
s=df.set_index(['company','variable','date']).topic.astype(str).str.get_dummies()
s.mul(s.columns).sort_index(level=0).sum(level=[0,1,2])
Out[335]: 
                              1  2  3
company  variable date               
AppleInc A        2000-01-03  1      
         B        2000-01-05        3
         D        2000-01-03  1      
                  2000-01-05        3
Google   A        2000-01-04     2   
                  2000-01-05        3
         B        2000-01-03  1      
         C        2000-01-03  1      
                  2000-01-04     2   
         D        2000-01-04     2   
NikeCorp B        2000-01-04     2   
         C        2000-01-05        3

自问题更新以来更新

df.pivot_table(index=['company','variable','date'],columns='topic',values='value')