I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'],
                   'A': [-14, -90, -90, -96, -91],
                   'B': [-103, 0, -110, -114, -114],
                   'D': [0, 0, 0, 0, 0],
                   'C': [-101, 0, -110, 0, 0]})
It looks like this:
A B C D id
0 -14 -103 -101 0 a
1 -90 0 0 0 b
2 -90 -110 -110 0 c
3 -96 -114 0 0 d
4 -91 -114 0 0 e
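What I would like to do is drop any column that has a 0 in more than 2 rows. How can I do that?
The final DataFrame should contain only these columns: A, B, id.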
Answer 0 (score: 3)
You can use cumsum together with any to build the mask, then adapt boolean indexing slightly so that it selects by column:
mask = ((df == 0).cumsum() > 1).any()
print (mask)
A False
B False
C True
id False
dtype: bool
print (df.loc[:, ~mask])
A B id
0 -14 -103 a
1 -90 0 b
2 -90 -110 c
3 -96 -114 d
4 -91 -114 e
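As a possible alternative to the running sum, the zeros in each column can be counted directly with sum. This is only a minimal sketch: the names zero_counts and min_zeros are made up for illustration, and the threshold of 2 is an assumption chosen to match the `cumsum() > 1` test above.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'],
                   'A': [-14, -90, -90, -96, -91],
                   'B': [-103, 0, -110, -114, -114],
                   'D': [0, 0, 0, 0, 0],
                   'C': [-101, 0, -110, 0, 0]})

# Count the zeros in each column. Comparing the string column 'id'
# with 0 simply yields False for every row.
zero_counts = (df == 0).sum()

# Hypothetical threshold: a column is dropped once it holds at least
# this many zeros (equivalent to `cumsum() > 1` for min_zeros = 2).
min_zeros = 2

# Keep only the columns that stay below the threshold.
result = df.loc[:, zero_counts < min_zeros]
print(result)
```

Changing min_zeros to 3 would keep a column until it has three zeros, which is the stricter reading of "more than 2 rows".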
Explanation of the mask:
print (df == 0)
A B C id
0 False False False False
1 False True True False
2 False False False False
3 False False True False
4 False False True False
print ((df == 0).cumsum())
A B C id
0 0 0 0 0
1 0 1 1 0
2 0 1 1 0
3 0 1 2 0
4 0 1 3 0
print ((df == 0).cumsum() > 1)
A B C id
0 False False False False
1 False False False False
2 False False False False
3 False False True False
4 False False True False
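To see how any collapses these row-wise flags into one value per column, the same steps can be traced on column C alone. This is just an illustrative sketch; the variable names are not from the answer.

```python
import pandas as pd

# Column C from the question, taken on its own.
c = pd.Series([-101, 0, -110, 0, 0], name='C')

is_zero = c == 0            # marks each row that holds a 0
running = is_zero.cumsum()  # running count of zeros seen so far: 0, 1, 1, 2, 3
exceeds = running > 1       # True from the row where the second zero appears
flag = exceeds.any()        # one boolean for the whole column

print(running.tolist())
print(bool(flag))
```

Because the running count never decreases, any over the comparison gives the same answer as checking the total number of zeros in the column.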
EDIT from the comments: to drop only the columns that are entirely 0, the mask needs all:
mask = (df == 0).all()
print (mask)
A False
B False
C False
D True
id False
dtype: bool
print (df.loc[:, ~mask])
A B C id
0 -14 -103 -101 a
1 -90 0 0 b
2 -90 -110 -110 c
3 -96 -114 0 d
4 -91 -114 0 e
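The same all-zero case could also be written with drop instead of column indexing. A rough sketch, assuming a pandas version where DataFrame.drop accepts the columns keyword; the name all_zero is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'],
                   'A': [-14, -90, -90, -96, -91],
                   'B': [-103, 0, -110, -114, -114],
                   'D': [0, 0, 0, 0, 0],
                   'C': [-101, 0, -110, 0, 0]})

# True only for columns in which every value equals 0.
all_zero = (df == 0).all()

# Select the labels of those columns and drop them by name.
result = df.drop(columns=all_zero[all_zero].index)
print(result)
```

Dropping by label returns a new DataFrame and leaves df itself unchanged.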