Drop columns with more than 70% zeros

Time: 2017-05-29 21:44:10

Tags: python pandas numpy dataframe

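I would like to know whether there is a command that drops columns containing more than 70% zeros, or more generally X% zeros. Something like: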

df = df.loc[:, df.isnull().mean() < .7]

which does this for NaN.

Thanks!

1 answer:

Answer 0 (score: 4)

Just change df.isnull().mean() to (df==0).mean():

df = df.loc[:, (df==0).mean() < .7]
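
For an arbitrary X%, the same mask can be wrapped in a small helper. This is only a minimal sketch: the function name `drop_zero_heavy_columns`, the `threshold` parameter, and the `example` frame are made up for illustration and are not part of pandas.

```python
import pandas as pd


def drop_zero_heavy_columns(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """Keep only the columns whose fraction of zeros is below `threshold`.

    Hypothetical helper, not a pandas API.
    """
    # (df == 0) is a boolean frame; its column-wise mean is the share of zeros.
    # NaN == 0 is False, so NaN entries count as non-zero here.
    zero_fraction = (df == 0).mean()
    # Strict "<" mirrors the one-liner above: a column sitting exactly at the
    # threshold is dropped too. Use "<=" to keep such columns instead.
    return df.loc[:, zero_fraction < threshold]


# Illustrative data (an assumption, not from the question).
example = pd.DataFrame({
    "a": [0, 0, 0, 0, 1],   # 80% zeros
    "b": [1, 2, 0, 3, 4],   # 20% zeros
    "c": [0, 0, 5, 6, 7],   # 40% zeros
})
trimmed = drop_zero_heavy_columns(example, threshold=0.7)
print(list(trimmed.columns))
```

With a threshold of 0.7 only column `a` is removed; lowering the threshold to 0.3 would also remove `c`.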

Here is a demo:

df
Out: 
   0  1  2  3  4
0  1  1  1  1  0
1  1  0  0  0  1
2  0  1  1  0  0
3  1  0  0  1  0
4  1  1  1  1  1
5  1  0  0  0  0
6  0  1  0  0  0
7  0  1  1  0  0
8  1  0  0  1  0
9  0  0  0  1  0

(df==0).mean()
Out: 
0    0.4
1    0.5
2    0.6
3    0.5
4    0.8
dtype: float64

df.loc[:, (df==0).mean() < .7]
Out: 
   0  1  2  3
0  1  1  1  1
1  1  0  0  0
2  0  1  1  0
3  1  0  0  1
4  1  1  1  1
5  1  0  0  0
6  0  1  0  0
7  0  1  1  0
8  1  0  0  1
9  0  0  0  1
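
To reproduce the demo as a script rather than in an interactive session, the frame can be built explicitly. A minimal sketch, assuming the values are typed in row by row from the printed output above; the variable names `rows`, `demo`, `zero_share`, and `kept` are illustrative.

```python
import pandas as pd

# Same values as the demo frame above, one inner list per row.
rows = [
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
demo = pd.DataFrame(rows)  # default integer column labels 0..4

# Share of zeros per column, then keep the columns below 70%.
zero_share = (demo == 0).mean()
kept = demo.loc[:, zero_share < 0.7]

print(zero_share)
print(kept.columns.tolist())
```

Column 4 has 8 zeros in 10 rows, so it is the only one filtered out.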