Filtering a Pandas MultiIndex DataFrame with IndexSlice

Date: 2016-10-26 14:14:49

Tags: python pandas slice nan multi-index

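Question: how can I filter the rows so that only rows with at least one injection value that is neither 0 nor NaN are returned, without losing the values in the other columns?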

I created a DataFrame with the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        [4849, 0, 4539, 0, 3381, 0, 0, np.nan],  # Portfolio D injection is missing (NaN)
        [4971, 0, 3824, 0, 4645, 3424, 0, 0],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0],
    ],
    index=pd.to_datetime(['2010-01-01',
                          '2010-01-02',
                          '2010-01-03',
                          '2010-01-04',
                          '2010-01-05']),
    columns=pd.MultiIndex.from_tuples(
        [('Portfolio A', 'GBP', 'amount'),
         ('Portfolio A', 'GBP', 'injection'),
         ('Portfolio B', 'EUR', 'amount'),
         ('Portfolio B', 'EUR', 'injection'),
         ('Portfolio C', 'USD', 'amount'),
         ('Portfolio C', 'USD', 'injection'),
         ('Portfolio D', 'JPY', 'amount'),
         ('Portfolio D', 'JPY', 'injection')]),
).sort_index(axis=1)  # sortlevel() was removed from pandas; sort_index sorts the column MultiIndex

Next, I can create a DataFrame from a slice of the data (in this example, the slice is all of the data):

df1=df.loc[pd.IndexSlice[:], pd.IndexSlice[:,:, ['amount', 'injection']]]
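A possibly simpler way to pick out only the injection columns is DataFrame.xs with axis=1 and level=2, which selects by a label on the third column level and drops that level from the result. The sketch below rebuilds the question's data as df1 (the missing Portfolio D value is written explicitly as np.nan, and the index is built with pd.date_range); the names injection and via_slice are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; the 2010-01-02 Portfolio D injection is NaN
cols = pd.MultiIndex.from_tuples(
    [(p, c, f)
     for p, c in [('Portfolio A', 'GBP'), ('Portfolio B', 'EUR'),
                  ('Portfolio C', 'USD'), ('Portfolio D', 'JPY')]
     for f in ('amount', 'injection')])
df1 = pd.DataFrame(
    [[5777, 100, 5385, 200, 5419, 4887, 100, 200],
     [4849, 0, 4539, 0, 3381, 0, 0, np.nan],
     [4971, 0, 3824, 0, 4645, 3424, 0, 0],
     [4827, 200, 3459, 300, 4552, 3153, 100, 200],
     [5207, 0, 3670, 0, 4876, 3358, 0, 0]],
    index=pd.date_range('2010-01-01', periods=5),
    columns=cols,
).sort_index(axis=1)

# Select the 'injection' columns by their label on level 2 of the columns;
# xs drops that level, leaving (portfolio, currency) column labels
injection = df1.xs('injection', axis=1, level=2)
print(injection)

# The IndexSlice selection holds the same values but keeps all three levels
via_slice = df1.loc[:, pd.IndexSlice[:, :, 'injection']]
print(np.array_equal(injection.to_numpy(), via_slice.to_numpy(), equal_nan=True))
```

Which form to use is mostly a matter of taste: xs is shorter when you want a single label on one level, while IndexSlice also handles lists and ranges of labels.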

Then I create a new DataFrame in which injection != 0:

df2=df1[df1.loc[pd.IndexSlice[:], pd.IndexSlice[:, :, 'injection']]!=0]

Question: why does this set all the values in the 'amount' columns to NaN?

Once the amounts are available, the next step is to drop every row that is entirely NaN:
df3=df2.dropna(axis=0, how='all', thresh=None, subset=None, inplace=False)
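Note that dropna(how='all') already keeps the right rows here: 2010-01-02 is the only row in which every value ends up NaN (its zero injections are masked out, and its NaN injection stays NaN). Only the values are lost, so one workaround, a sketch rather than the answer's method, is to reuse df3.index to select the untouched rows from df1. The data is rebuilt as in the question; result is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; the 2010-01-02 Portfolio D injection is NaN
cols = pd.MultiIndex.from_tuples(
    [(p, c, f)
     for p, c in [('Portfolio A', 'GBP'), ('Portfolio B', 'EUR'),
                  ('Portfolio C', 'USD'), ('Portfolio D', 'JPY')]
     for f in ('amount', 'injection')])
df1 = pd.DataFrame(
    [[5777, 100, 5385, 200, 5419, 4887, 100, 200],
     [4849, 0, 4539, 0, 3381, 0, 0, np.nan],
     [4971, 0, 3824, 0, 4645, 3424, 0, 0],
     [4827, 200, 3459, 300, 4552, 3153, 100, 200],
     [5207, 0, 3670, 0, 4876, 3358, 0, 0]],
    index=pd.date_range('2010-01-01', periods=5),
    columns=cols,
).sort_index(axis=1)

# The question's steps: masking with a boolean DataFrame turns the 'amount'
# columns and every zero injection into NaN
df2 = df1[df1.loc[:, pd.IndexSlice[:, :, 'injection']] != 0]
df3 = df2.dropna(axis=0, how='all')

# df3 has the right rows but lost values, so take those rows from df1 instead
result = df1.loc[df3.index]
print(result)
```

This relies on the NaN injection staying NaN after masking; the boolean-Series mask in the answer states the condition directly and does not depend on that detail.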

The desired output is all of the data for the rows with these index values:

2010-01-01
2010-01-03
2010-01-04
2010-01-05

1 answer:

Answer (score: 1)

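I think you need fillna(0) to replace NaN with 0 before comparing with != 0, and then any(axis=1) to check for at least one True value per row. Boolean indexing that keeps or drops whole rows needs a mask that is a boolean Series: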

print (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0)
           Portfolio A Portfolio B Portfolio C Portfolio D
                   GBP         EUR         USD         JPY
             injection   injection   injection   injection
2010-01-01        True        True        True        True
2010-01-02       False       False       False       False
2010-01-03       False       False        True       False
2010-01-04        True        True        True        True
2010-01-05       False       False        True       False

mask = (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0).any(axis=1)
print (mask)
2010-01-01     True
2010-01-02    False
2010-01-03     True
2010-01-04     True
2010-01-05     True
dtype: bool
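The fillna(0) step matters because NaN != 0 evaluates to True, so without it the NaN injection on 2010-01-02 would count as a non-zero value and that row would be kept. The sketch below rebuilds df1 and compares both masks; notna() & ne(0) is shown as an equivalent spelling, an alternative rather than the answer's code.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; the 2010-01-02 Portfolio D injection is NaN
cols = pd.MultiIndex.from_tuples(
    [(p, c, f)
     for p, c in [('Portfolio A', 'GBP'), ('Portfolio B', 'EUR'),
                  ('Portfolio C', 'USD'), ('Portfolio D', 'JPY')]
     for f in ('amount', 'injection')])
df1 = pd.DataFrame(
    [[5777, 100, 5385, 200, 5419, 4887, 100, 200],
     [4849, 0, 4539, 0, 3381, 0, 0, np.nan],
     [4971, 0, 3824, 0, 4645, 3424, 0, 0],
     [4827, 200, 3459, 300, 4552, 3153, 100, 200],
     [5207, 0, 3670, 0, 4876, 3358, 0, 0]],
    index=pd.date_range('2010-01-01', periods=5),
    columns=cols,
).sort_index(axis=1)
inj = df1.loc[:, pd.IndexSlice[:, :, 'injection']]

# NaN != 0 is True, so the NaN injection makes 2010-01-02 look non-zero
without_fill = (inj != 0).any(axis=1)
# Replacing NaN with 0 first gives the intended mask
with_fill = (inj.fillna(0) != 0).any(axis=1)
# Equivalent without fillna: a value counts only if it is present and non-zero
explicit = (inj.notna() & inj.ne(0)).any(axis=1)

day = pd.Timestamp('2010-01-02')
print(without_fill[day], with_fill[day], explicit.equals(with_fill))
```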

print (df1[mask])
           Portfolio A           Portfolio B           Portfolio C            \
                   GBP                   EUR                   USD             
                amount injection      amount injection      amount injection   
2010-01-01        5777       100        5385       200        5419      4887   
2010-01-03        4971         0        3824         0        4645      3424   
2010-01-04        4827       200        3459       300        4552      3153   
2010-01-05        5207         0        3670         0        4876      3358   

           Portfolio D            
                   JPY            
                amount injection  
2010-01-01         100     200.0  
2010-01-03           0       0.0  
2010-01-04         100     200.0  
2010-01-05           0       0.0  

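If you index with a boolean DataFrame instead, pandas masks element by element: every value where the mask is False becomes NaN. The amount columns are not in the mask at all, so they are treated as False and become NaN too.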
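This behaviour can be checked directly: indexing with a boolean DataFrame gives the same result as DataFrame.where with that mask, and because the mask has no 'amount' columns, those columns come back entirely NaN. A sketch with the rebuilt data (cond and masked are illustrative names):

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; the 2010-01-02 Portfolio D injection is NaN
cols = pd.MultiIndex.from_tuples(
    [(p, c, f)
     for p, c in [('Portfolio A', 'GBP'), ('Portfolio B', 'EUR'),
                  ('Portfolio C', 'USD'), ('Portfolio D', 'JPY')]
     for f in ('amount', 'injection')])
df1 = pd.DataFrame(
    [[5777, 100, 5385, 200, 5419, 4887, 100, 200],
     [4849, 0, 4539, 0, 3381, 0, 0, np.nan],
     [4971, 0, 3824, 0, 4645, 3424, 0, 0],
     [4827, 200, 3459, 300, 4552, 3153, 100, 200],
     [5207, 0, 3670, 0, 4876, 3358, 0, 0]],
    index=pd.date_range('2010-01-01', periods=5),
    columns=cols,
).sort_index(axis=1)

# Boolean DataFrame covering only the 'injection' columns
cond = df1.loc[:, pd.IndexSlice[:, :, 'injection']] != 0
masked = df1[cond]

# Boolean-DataFrame indexing is equivalent to df1.where(cond)
print(masked.equals(df1.where(cond)))
# Columns absent from cond are treated as False, so every 'amount' value is NaN
print(masked.loc[:, pd.IndexSlice[:, :, 'amount']].isna().all().all())
# Where cond is True the original value survives
print(masked.loc[pd.Timestamp('2010-01-01'), ('Portfolio A', 'GBP', 'injection')])
```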