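I would like to know if there is a command that drops columns containing more than 70% zeros (or, more generally, X% zeros). Something like: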
df = df.loc[:, df.isnull().mean() < .7]
That works for NaN values.
Thanks!
Answer 0 (score: 4)
Just change df.isnull().mean() to (df==0).mean(), which gives the fraction of zeros in each column:
df = df.loc[:, (df==0).mean() < .7]
Here is a demo:
df
Out:
   0  1  2  3  4
0  1  1  1  1  0
1  1  0  0  0  1
2  0  1  1  0  0
3  1  0  0  1  0
4  1  1  1  1  1
5  1  0  0  0  0
6  0  1  0  0  0
7  0  1  1  0  0
8  1  0  0  1  0
9  0  0  0  1  0
(df==0).mean()
Out:
0    0.4
1    0.5
2    0.6
3    0.5
4    0.8
dtype: float64
df.loc[:, (df==0).mean() < .7]
Out:
   0  1  2  3
0  1  1  1  1
1  1  0  0  0
2  0  1  1  0
3  1  0  0  1
4  1  1  1  1
5  1  0  0  0
6  0  1  0  0
7  0  1  1  0
8  1  0  0  1
9  0  0  0  1
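
Since the question also asks about an arbitrary X%, here is a minimal sketch that wraps the same one-liner in a function with an adjustable threshold and rebuilds the demo data so it can be run as is. The helper name `drop_mostly_zero_columns` and its `threshold` parameter are made up for illustration; they are not part of pandas.

```python
import pandas as pd


def drop_mostly_zero_columns(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """Keep only the columns whose fraction of zeros is below `threshold`.

    Hypothetical helper, not a pandas API.
    """
    # (df == 0) is a boolean frame; its column-wise mean is the share of zeros.
    zero_fraction = (df == 0).mean()
    # Boolean mask over the columns: True means "keep this column".
    return df.loc[:, zero_fraction < threshold]


# Same values as in the demo above.
demo = pd.DataFrame({
    0: [1, 1, 0, 1, 1, 1, 0, 0, 1, 0],
    1: [1, 0, 1, 0, 1, 0, 1, 1, 0, 0],
    2: [1, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    3: [1, 0, 0, 1, 1, 0, 0, 0, 1, 1],
    4: [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

result = drop_mostly_zero_columns(demo, threshold=0.7)
print(list(result.columns))  # [0, 1, 2, 3]
```

Two things to keep in mind: the comparison is strict, so a column with exactly 70% zeros is dropped, and NaN entries do not compare equal to 0, so they count as non-zero here.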