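Question: How can I filter rows so that only rows where at least one injection value is neither 0 nor NaN are returned, without losing the values in the other columns?
I created a DataFrame with the following code: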
import numpy as np
import pandas as pd
df=pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        [4849, 0, 4539, 0, 3381, 0, 0, np.nan],  # last value missing -> NaN
        [4971, 0, 3824, 0, 4645, 3424, 0, 0],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0],
    ],
    index=pd.to_datetime(['2010-01-01',
                          '2010-01-02',
                          '2010-01-03',
                          '2010-01-04',
                          '2010-01-05']),
    columns=pd.MultiIndex.from_tuples(
        [('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
         ('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
         ('Portfolio C', 'USD', 'amount'), ('Portfolio C', 'USD', 'injection'),
         ('Portfolio D', 'JPY', 'amount'), ('Portfolio D', 'JPY', 'injection')])
).sort_index(axis=1)  # sortlevel() no longer exists in current pandas; sort_index(axis=1) sorts the column MultiIndex
Next, I can create a DataFrame from a slice of the data (in this example, the slice is all of the data):
df1=df.loc[pd.IndexSlice[:], pd.IndexSlice[:,:, ['amount', 'injection']]]
Next, I create a new DataFrame where injection != 0:
df2=df1[df1.loc[pd.IndexSlice[:], pd.IndexSlice[:, :, 'injection']]!=0]
Question: why does this reset every value in the 'amount' columns to NaN?
Once the amounts are available, the next step is to drop all rows that are entirely NaN:
df3=df2.dropna(axis=0, how='all')
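The dropna idea can still work if it is applied only to the injection columns and the surviving row labels are then used to select whole rows from the unmasked frame. Below is a minimal sketch of that variant, assuming the missing 2010-01-02 value is NaN; the names `injection`, `nonzero_only` and `keep` are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (assumption: the missing 2010-01-02 value is NaN)
pairs = [("Portfolio A", "GBP"), ("Portfolio B", "EUR"),
         ("Portfolio C", "USD"), ("Portfolio D", "JPY")]
columns = pd.MultiIndex.from_tuples(
    [(p, c, f) for p, c in pairs for f in ("amount", "injection")]
)
df1 = pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        [4849, 0, 4539, 0, 3381, 0, 0, np.nan],
        [4971, 0, 3824, 0, 4645, 3424, 0, 0],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0],
    ],
    index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03",
                          "2010-01-04", "2010-01-05"]),
    columns=columns,
).sort_index(axis=1)

# Look only at the injection columns
injection = df1.loc[:, pd.IndexSlice[:, :, "injection"]]
# where() keeps values where the condition is True and puts NaN elsewhere:
# zeros become NaN, and existing NaN stay NaN (NaN != 0 is True)
nonzero_only = injection.where(injection != 0)
# A row whose injections are all 0 or NaN is now all-NaN, so dropna removes it
keep = nonzero_only.dropna(axis=0, how="all").index
# Select whole rows from the original frame, so the amounts are untouched
df3 = df1.loc[keep]
print(df3)
```

The masking is only used to decide which rows survive; the values shown come from `df1`, which was never masked.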
The desired output is all of the data for these row indices:
2010-01-01
2010-01-03
2010-01-04
2010-01-05
Answer (score: 1)
I think you need fillna, then any to check for at least one True value per row, and then boolean indexing with a mask that is a boolean Series:
print (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0)
           Portfolio A Portfolio B Portfolio C Portfolio D
                   GBP         EUR         USD         JPY
             injection   injection   injection   injection
2010-01-01        True        True        True        True
2010-01-02       False       False       False       False
2010-01-03       False       False        True       False
2010-01-04        True        True        True        True
2010-01-05       False       False        True       False
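The same boolean frame can also be built with `DataFrame.xs`, which selects one label from a column level and drops that level. A small sketch under the same assumption (missing value is NaN); `inj_xs`, `nonzero_xs` and `nonzero_slice` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (assumption: the missing 2010-01-02 value is NaN)
pairs = [("Portfolio A", "GBP"), ("Portfolio B", "EUR"),
         ("Portfolio C", "USD"), ("Portfolio D", "JPY")]
columns = pd.MultiIndex.from_tuples(
    [(p, c, f) for p, c in pairs for f in ("amount", "injection")]
)
df1 = pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        [4849, 0, 4539, 0, 3381, 0, 0, np.nan],
        [4971, 0, 3824, 0, 4645, 3424, 0, 0],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0],
    ],
    index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03",
                          "2010-01-04", "2010-01-05"]),
    columns=columns,
).sort_index(axis=1)

# xs picks 'injection' from the third column level (level=2) and drops that
# level, leaving (portfolio, currency) as the column labels
inj_xs = df1.xs("injection", axis=1, level=2)
nonzero_xs = inj_xs.fillna(0) != 0

# The pd.IndexSlice selection used in the answer, for comparison
nonzero_slice = df1.loc[:, pd.IndexSlice[:, :, "injection"]].fillna(0) != 0
print(nonzero_xs)
```

Only the column labels differ (two levels instead of three); the True/False values are identical, so either form can feed the row mask.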
mask = (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0).any(axis=1)
print (mask)
2010-01-01 True
2010-01-02 False
2010-01-03 True
2010-01-04 True
2010-01-05 True
dtype: bool
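The `fillna(0)` step in the mask matters because NaN compares unequal to everything, including 0. A tiny illustration on three made-up injection values (non-zero, missing, zero); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative injection values: non-zero, missing, zero
inj = pd.Series([200.0, np.nan, 0.0])

# A bare != 0 counts NaN as "non-zero"
raw = (inj != 0).tolist()                      # [True, True, False]

# Replacing NaN with 0 first makes missing values count as "no injection"
filled = (inj.fillna(0) != 0).tolist()         # [True, False, False]

# Equivalent test without fillna: non-zero AND not missing
explicit = (inj.ne(0) & inj.notna()).tolist()  # [True, False, False]
print(raw, filled, explicit)
```

Without `fillna`, the 2010-01-02 row would be kept because its Portfolio D injection is NaN.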
print (df1[mask])
           Portfolio A           Portfolio B           Portfolio C           \
                   GBP                   EUR                   USD
                amount injection      amount injection      amount injection
2010-01-01        5777       100        5385       200        5419      4887
2010-01-03        4971         0        3824         0        4645      3424
2010-01-04        4827       200        3459       300        4552      3153
2010-01-05        5207         0        3670         0        4876      3358

           Portfolio D
                   JPY
                amount injection
2010-01-01         100     200.0
2010-01-03           0       0.0
2010-01-04         100     200.0
2010-01-05           0       0.0
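Putting the answer's steps together, a self-contained sketch (assuming the missing 2010-01-02 value is NaN; `nonzero`, `mask` and `result` are illustrative names) looks like this:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (assumption: the missing 2010-01-02 value is NaN)
pairs = [("Portfolio A", "GBP"), ("Portfolio B", "EUR"),
         ("Portfolio C", "USD"), ("Portfolio D", "JPY")]
columns = pd.MultiIndex.from_tuples(
    [(p, c, f) for p, c in pairs for f in ("amount", "injection")]
)
df1 = pd.DataFrame(
    [
        [5777, 100, 5385, 200, 5419, 4887, 100, 200],
        [4849, 0, 4539, 0, 3381, 0, 0, np.nan],
        [4971, 0, 3824, 0, 4645, 3424, 0, 0],
        [4827, 200, 3459, 300, 4552, 3153, 100, 200],
        [5207, 0, 3670, 0, 4876, 3358, 0, 0],
    ],
    index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03",
                          "2010-01-04", "2010-01-05"]),
    columns=columns,
).sort_index(axis=1)

# Boolean frame: True where an injection is neither 0 nor NaN
nonzero = df1.loc[:, pd.IndexSlice[:, :, "injection"]].fillna(0) != 0
# One flag per row: keep the row if any portfolio has a non-zero injection
mask = nonzero.any(axis=1)
# Indexing with a boolean Series selects whole rows, so amounts are kept
result = df1[mask]
print(result)
```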
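If you use a boolean DataFrame as the mask, you get NaN wherever the mask is False. In the question, the mask contains only the injection columns, so every amount column is treated as False and becomes entirely NaN.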
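The difference between the two kinds of mask can be seen on a tiny flat-column frame built from the first two Portfolio A rows (an illustrative stand-in, not the question's MultiIndex frame): indexing with a boolean DataFrame behaves like `DataFrame.where`, while a boolean Series keeps or drops whole rows.

```python
import pandas as pd

# Flat-column stand-in: first two rows of Portfolio A from the question
df = pd.DataFrame({"amount": [5777, 4849], "injection": [100, 0]})

# Boolean DataFrame covering only 'injection'
cond = df[["injection"]] != 0

# Boolean DataFrame mask: same as df.where(cond). False cells become NaN,
# and 'amount' (absent from cond) is aligned as all-False, so it is all NaN
by_frame = df[cond]

# Boolean Series mask (one flag per row): whole rows are kept or dropped
by_series = df[cond["injection"]]

print(by_frame)
print(by_series)
```

This is the same mechanism that turned the amounts into NaN in the question's `df2`.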