I have a data frame as shown below.
df:
ID Age_days N_30 N_31_90 N_91_180 N_181_365 Group
1 201 60 15 30 40 Good
2 20 2 15 5 20 Normal
3 10 4 0 0 0 Normal
4 100 0 0 0 80 Normal
5 600 0 6 5 60 Good
6 800 0 0 15 0 Good
7 500 10 10 30 40 Normal
8 200 0 0 0 100 Good
9 500 0 0 0 20 Normal
10 80 0 12 0 20 Normal
where
N_30 - Number of transactions in last 30 days
N_31_90 - Number of transactions in last 31 to 90 days and so on.
Conditions for filtering
If Age_days is less than 30, N_31_90, N_91_180, N_181_365 should be 0.
If Age_days is less than 90, N_91_180, N_181_365 should be 0.
If Age_days is less than 180, N_181_365 should be 0.
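However, in the data above, some rows have a small Age_days but still show transactions in earlier periods. I want to select these rows.
Expected output: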
ID Age_days N_30 N_31_90 N_91_180 N_181_365 Group
2 20 2 15 5 20 Normal
4 100 0 0 0 80 Normal
10 80 0 12 0 20 Normal
Answer 0 (score: 1)
Use a Boolean mask.
Filter conditions:
m1 = (df['Age_days'] <= 30) & ((df['N_31_90'] !=0) | (df['N_91_180'] !=0) | (df['N_181_365'] !=0))
m2 = (df['Age_days'] <= 90) & ((df['N_91_180'] !=0) | (df['N_181_365'] !=0))
m3 = (df['Age_days'] <= 180) & (df['N_181_365'] !=0)
print(df[m1|m2|m3])
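m1 is the Boolean mask for the invalid case where Age_days is <= 30 but there are non-zero counts for transactions made more than 30 days ago. m2 and m3 work the same way for the 90-day and 180-day limits.
We then combine the masks with a Boolean OR, m1|m2|m3, and index with df[m1|m2|m3] to select the rows that meet any of the three invalid conditions.
Note that the masks use <= where the question says "less than"; the two give the same result on this sample, so switch to < if the boundary ages (30, 90, 180) should count as valid.
Output: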
ID Age_days N_30 N_31_90 N_91_180 N_181_365 Group
1 2 20 2 15 5 20 Normal
3 4 100 0 0 0 80 Normal
9 10 80 0 12 0 20 Normal
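For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same three masks. The name `bad_rows` and the use of `.ne(0).any(axis=1)` in place of the chained `!=` / `|` comparisons are my own choices, not part of the answer above; the logic is meant to be equivalent.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age_days": [201, 20, 10, 100, 600, 800, 500, 200, 500, 80],
    "N_30": [60, 2, 4, 0, 0, 0, 10, 0, 0, 0],
    "N_31_90": [15, 15, 0, 0, 6, 0, 10, 0, 0, 12],
    "N_91_180": [30, 5, 0, 0, 5, 15, 30, 0, 0, 0],
    "N_181_365": [40, 20, 0, 80, 60, 0, 40, 100, 20, 20],
    "Group": ["Good", "Normal", "Normal", "Normal", "Good",
              "Good", "Normal", "Good", "Normal", "Normal"],
})

# Each mask flags rows whose age is too small for the older buckets to be non-zero.
m1 = (df["Age_days"] <= 30) & df[["N_31_90", "N_91_180", "N_181_365"]].ne(0).any(axis=1)
m2 = (df["Age_days"] <= 90) & df[["N_91_180", "N_181_365"]].ne(0).any(axis=1)
m3 = (df["Age_days"] <= 180) & df["N_181_365"].ne(0)

# A row is inconsistent if it violates any one of the three conditions.
bad_rows = df[m1 | m2 | m3]
print(bad_rows)
```

To keep only the consistent rows instead, negate the combined mask: `df[~(m1 | m2 | m3)]`.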
Answer 1 (score: 1)
You can use this one-liner:
import numpy as np
df2 = df.loc[df['Age_days'] < np.maximum(np.maximum((df['N_31_90'] > 0) * 31 , (df['N_91_180'] > 0) * 91), (df['N_181_365'] > 0) * 181)]
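The one-liner is easier to follow when unpacked. As I read it, each older bucket with a non-zero count implies a minimum account age (the first day that bucket covers), and a row is inconsistent when Age_days is below the largest of those minimums. The sketch below spells that out; the names `bucket_start` and `min_age` are hypothetical helpers introduced here, and `np.maximum.reduce` stands in for the nested `np.maximum` calls.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age_days": [201, 20, 10, 100, 600, 800, 500, 200, 500, 80],
    "N_30": [60, 2, 4, 0, 0, 0, 10, 0, 0, 0],
    "N_31_90": [15, 15, 0, 0, 6, 0, 10, 0, 0, 12],
    "N_91_180": [30, 5, 0, 0, 5, 15, 30, 0, 0, 0],
    "N_181_365": [40, 20, 0, 80, 60, 0, 40, 100, 20, 20],
    "Group": ["Good", "Normal", "Normal", "Normal", "Good",
              "Good", "Normal", "Good", "Normal", "Normal"],
})

# First day covered by each older bucket (hypothetical helper mapping).
bucket_start = {"N_31_90": 31, "N_91_180": 91, "N_181_365": 181}

# For each row, the largest bucket start among buckets that contain
# transactions, or 0 when all older buckets are empty.
min_age = np.maximum.reduce(
    [(df[col].to_numpy() > 0) * start for col, start in bucket_start.items()]
)

# Rows younger than the age their transaction history implies.
df2 = df.loc[df["Age_days"] < min_age]
print(df2)
```

Because Age_days holds whole days, `Age_days < 31` here is the same test as `Age_days <= 30` in the mask-based answer, so both approaches select the same rows on this sample.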