Stack data and remove 0s / NaN

Time: 2016-04-25 13:01:40

Tags: python pandas dataframe

Consider the following table, order_size:

Symbol      BAX  BTP  CT  D  DX  ESTX50  GBM  GBP  GBS  GE  I  LE  NZD  S  ZL  
Date                                                                            
2016-03-03    0    0  -2  0   0       0    0    0    0   0  0   0    0  0   0   
2016-03-04  -12    0   0  0   0       0    0    0    0   1  0   0   -1  0   0   
2016-03-07    0    0   0  0  -1       0    1   -1    4  -1  1   0    1  1   0   
2016-03-08    0    0   0  0   0       0    0    0    0   0  0   0   -1  0   0   
2016-03-10    0    0   0  0   0       0    0    1   -1   0  0   0    0  0   0   
2016-03-11    0    0   0  0   0       0   -1   -1   -1   0 -1   0    1 -1   0   
2016-03-14    0    0   0  0   0       0    0    0    0   0  0   0   -1  1   0   
2016-03-15   -1    0   0  0   0       0    0    0    0   1  0   0    1  0   0   
2016-03-17    0    0   0  0   0       0    0    0    0  -1  0   0    0  0  -1 

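I need to convert it into a stacked view with the layout Date | Symbol | Value, keeping only the values that are not 0, i.e. all zero entries are dropped. If I use df.stack(), the table is converted to a pd.TimeSeries, which is not what I want because the third column, Value, is missing.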

Date        Symbol
2016-03-03  BAX        0
            BTP        0
            CT        -2
            D          0
            DX         0
            ESTX50     0
            GBM        0
            GBP        0

This seems to make it impossible to run order_size.loc[:, (order_size.Value != 0).any(axis=0)] to drop the 0s, because Value is not a column in a pd.Series.

Edit

Running order_size.replace('0', np.NaN) before df.stack() almost does the trick, but the resulting pd.Series is still undesirable because I need the third column, Value.

1 answer:

Answer 0 (score: 1)

I think you can first replace all 0 values with NaN and then use stack together with reset_index:

print(df != 0)
              BAX    BTP     CT      D     DX ESTX50    GBM    GBP    GBS  \
Date                                                                        
2016-03-03  False  False   True  False  False  False  False  False  False   
2016-03-04   True  False  False  False  False  False  False  False  False   
2016-03-07  False  False  False  False   True  False   True   True   True   
2016-03-08  False  False  False  False  False  False  False  False  False   
2016-03-10  False  False  False  False  False  False  False   True   True   
2016-03-11  False  False  False  False  False  False   True   True   True   
2016-03-14  False  False  False  False  False  False  False  False  False   
2016-03-15   True  False  False  False  False  False  False  False  False   
2016-03-17  False  False  False  False  False  False  False  False  False   

               GE      I     LE    NZD      S     ZL  
Date                                                  
2016-03-03  False  False  False  False  False  False  
2016-03-04   True  False  False   True  False  False  
2016-03-07   True   True  False   True   True  False  
2016-03-08  False  False  False   True  False  False  
2016-03-10  False  False  False  False  False  False  
2016-03-11  False   True  False   True   True  False  
2016-03-14  False  False  False   True   True  False  
2016-03-15   True  False  False   True  False  False  
2016-03-17   True  False  False  False  False   True
    
print(df[df != 0])
             BAX  BTP   CT   D   DX  ESTX50  GBM  GBP  GBS   GE    I  LE  NZD  \
Date                                                                            
2016-03-03   NaN  NaN -2.0 NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN NaN  NaN   
2016-03-04 -12.0  NaN  NaN NaN  NaN     NaN  NaN  NaN  NaN  1.0  NaN NaN -1.0   
2016-03-07   NaN  NaN  NaN NaN -1.0     NaN  1.0 -1.0  4.0 -1.0  1.0 NaN  1.0   
2016-03-08   NaN  NaN  NaN NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN NaN -1.0   
2016-03-10   NaN  NaN  NaN NaN  NaN     NaN  NaN  1.0 -1.0  NaN  NaN NaN  NaN   
2016-03-11   NaN  NaN  NaN NaN  NaN     NaN -1.0 -1.0 -1.0  NaN -1.0 NaN  1.0   
2016-03-14   NaN  NaN  NaN NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN NaN -1.0   
2016-03-15  -1.0  NaN  NaN NaN  NaN     NaN  NaN  NaN  NaN  1.0  NaN NaN  1.0   
2016-03-17   NaN  NaN  NaN NaN  NaN     NaN  NaN  NaN  NaN -1.0  NaN NaN  NaN   

              S   ZL  
Date                  
2016-03-03  NaN  NaN  
2016-03-04  NaN  NaN  
2016-03-07  1.0  NaN  
2016-03-08  NaN  NaN  
2016-03-10  NaN  NaN  
2016-03-11 -1.0  NaN  
2016-03-14  1.0  NaN  
2016-03-15  NaN  NaN  
2016-03-17  NaN -1.0 
     
df1 = df[df != 0].stack().reset_index()
# set custom column names
df1.columns = ['Date','Symbol','Value']
print(df1)
          Date Symbol  Value
0   2016-03-03     CT   -2.0
1   2016-03-04    BAX  -12.0
2   2016-03-04     GE    1.0
3   2016-03-04    NZD   -1.0
4   2016-03-07     DX   -1.0
5   2016-03-07    GBM    1.0
6   2016-03-07    GBP   -1.0
7   2016-03-07    GBS    4.0
8   2016-03-07     GE   -1.0
9   2016-03-07      I    1.0
10  2016-03-07    NZD    1.0
11  2016-03-07      S    1.0
12  2016-03-08    NZD   -1.0
13  2016-03-10    GBP    1.0
14  2016-03-10    GBS   -1.0
15  2016-03-11    GBM   -1.0
16  2016-03-11    GBP   -1.0
17  2016-03-11    GBS   -1.0
18  2016-03-11      I   -1.0
19  2016-03-11    NZD    1.0
20  2016-03-11      S   -1.0
21  2016-03-14    NZD   -1.0
22  2016-03-14      S    1.0
23  2016-03-15    BAX   -1.0
24  2016-03-15     GE    1.0
25  2016-03-15    NZD    1.0
26  2016-03-17     GE   -1.0
27  2016-03-17     ZL   -1.0
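
Note that masking with NaN turns Value into floats (-2.0 instead of -2). If the integers should be kept, one possible variation is to stack first and then filter the resulting Series. The sketch below uses a small sample copied from the question's table (three symbols, three dates); the names stacked and nonzero are only illustrative. It also does not depend on stack() dropping NaN entries, which I believe can differ between pandas versions.

```python
import pandas as pd

# Small sample copied from the question's table (three symbols, three dates).
df = pd.DataFrame(
    {"BAX": [0, -12, 0], "CT": [-2, 0, 0], "GE": [0, 1, -1]},
    index=pd.Index(["2016-03-03", "2016-03-04", "2016-03-07"], name="Date"),
)

# Stack first: no NaN is introduced, so the values stay integers.
stacked = df.stack()

# Filter the stacked Series itself; there is no Value column to refer to yet.
nonzero = stacked[stacked != 0]

# Name the two index levels and the values, then move the index into columns.
df1 = nonzero.rename_axis(["Date", "Symbol"]).rename("Value").reset_index()
print(df1)
```

The filter is applied to the Series directly, which sidesteps the problem from the question that Value is not a column before reset_index is called.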

Another solution with replace and reset_index:

import numpy as np

# replace 0 with NaN, then stack and move the index into columns
df1 = df.replace({0: np.nan}).stack().reset_index()
# set custom column names
df1.columns = ['Date','Symbol','Value']
print(df1)
          Date Symbol  Value
0   2016-03-03     CT   -2.0
1   2016-03-04    BAX  -12.0
2   2016-03-04     GE    1.0
3   2016-03-04    NZD   -1.0
4   2016-03-07     DX   -1.0
5   2016-03-07    GBM    1.0
6   2016-03-07    GBP   -1.0
7   2016-03-07    GBS    4.0
8   2016-03-07     GE   -1.0
9   2016-03-07      I    1.0
10  2016-03-07    NZD    1.0
11  2016-03-07      S    1.0
12  2016-03-08    NZD   -1.0
13  2016-03-10    GBP    1.0
14  2016-03-10    GBS   -1.0
15  2016-03-11    GBM   -1.0
16  2016-03-11    GBP   -1.0
17  2016-03-11    GBS   -1.0
18  2016-03-11      I   -1.0
19  2016-03-11    NZD    1.0
20  2016-03-11      S   -1.0
21  2016-03-14    NZD   -1.0
22  2016-03-14      S    1.0
23  2016-03-15    BAX   -1.0
24  2016-03-15     GE    1.0
25  2016-03-15    NZD    1.0
26  2016-03-17     GE   -1.0
27  2016-03-17     ZL   -1.0
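
For comparison, the same Date | Symbol | Value layout can probably also be reached without stack at all, by using melt and a plain boolean filter. This is only a sketch on a small sample copied from the question's table, not part of the original answer; long_df and result are illustrative names.

```python
import pandas as pd

# Small sample copied from the question's table (three symbols, three dates).
df = pd.DataFrame(
    {"GBP": [-1, 0, 1], "GBS": [4, 0, -1], "NZD": [1, -1, 0]},
    index=pd.Index(["2016-03-07", "2016-03-08", "2016-03-10"], name="Date"),
)

# Move Date into a column, then unpivot every symbol column into rows.
long_df = df.reset_index().melt(
    id_vars="Date", var_name="Symbol", value_name="Value"
)

# Value is now an ordinary column, so zeros can be filtered directly.
result = (
    long_df[long_df["Value"] != 0]
    .sort_values(["Date", "Symbol"])
    .reset_index(drop=True)
)
print(result)
```

melt orders the rows symbol by symbol, so the sort_values call is there only to get back the date-first ordering shown in the outputs above.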