Filter column values by unique value, excluding unique values that also carry a different value in the same column

Time: 2021-01-22 22:54:46

Tags: python-3.x pandas dataframe lambda apply

I'm looking for a way to select the unique values that have an Inactive status but never an Active status under the same unique value.

df:

Unique_value    Status
1               Active        <- Has both active and inactive, must be inactive only
1               Active        <- Has both active and inactive, must be inactive only
1               Inactive      <- Has both active and inactive, must be inactive only
1               Inactive      <- Has both active and inactive, must be inactive only
2               Inactive      <- Has inactive only
2               Inactive      <- Has inactive only
2               Inactive      <- Has inactive only
3               Inactive      <- Has inactive only (cancelled okay to be filtered out)
3               Cancelled     <- Has inactive only (cancelled okay to be filtered out)
3               Inactive      <- Has inactive only (cancelled okay to be filtered out)

Desired output:

Unique_value    Status
2               Inactive
3               Inactive

This is what I have tried so far, but I don't think it is correct.

p = ['Inactive', 'Active']
df.groupby('Unique_value')['Status'].apply(lambda x: (x =='Inactive') != set(p))

2 answers:

Answer 0 (score: 1)

Let's try:

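# Keep only the rows whose group contains no 'Active' status.
g = df[df.groupby('Unique_value')['Status'].transform(lambda x: not x.eq('Active').any())]

# Of those, keep the 'Inactive' rows, one per unique value.
g[g['Status'].eq('Inactive')].drop_duplicates()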
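
For reference, here is a minimal self-contained sketch of the same filter written without groupby. It is not the answerer's code. It assumes the sample data from the question, and the names active_keys and result are illustrative only. The idea is to collect the keys that have at least one Active row, then keep the Inactive rows whose key is not among them.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive"],
})

# Keys that have at least one 'Active' row (illustrative name).
active_keys = df.loc[df["Status"].eq("Active"), "Unique_value"].unique()

# Keep 'Inactive' rows whose key never appears as 'Active',
# then collapse to one row per (key, status) pair.
result = (
    df[df["Status"].eq("Inactive") & ~df["Unique_value"].isin(active_keys)]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(result)
```

On the sample data this should leave one Inactive row each for unique values 2 and 3.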

Answer 1 (score: 1)

First, check whether any value in each group is Active and whether any value is Inactive. Then drop the groups where both conditions are true:

m1 = df["Status"].eq("Active").groupby(df["Unique_value"]).transform("any")
m2 = df["Status"].eq("Inactive").groupby(df["Unique_value"]).transform("any")
df[~(m1 & m2)].groupby("Unique_value", as_index=False).first()

   Unique_value    Status
0             2  Inactive
1             3  Inactive
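
One caveat, sketched under assumptions: the mask ~(m1 & m2) also keeps a group that contains only Active rows, and first() returns whatever status comes first in a group. The example below adds a hypothetical group 4 (not in the question's data) to show the difference, and uses m2 & ~m1 plus an explicit Inactive filter as a stricter variant. The names loose and strict are illustrative.

```python
import pandas as pd

# Question's sample data plus a hypothetical Active-only group (key 4).
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive",
               "Active", "Active"],
})

# Per-row flags: does this row's group contain any Active / any Inactive?
m1 = df["Status"].eq("Active").groupby(df["Unique_value"]).transform("any")
m2 = df["Status"].eq("Inactive").groupby(df["Unique_value"]).transform("any")

# Mask from the answer: drops only groups that have both statuses.
loose = df[~(m1 & m2)].groupby("Unique_value", as_index=False).first()

# Stricter variant: group has Inactive, has no Active, and the row itself is Inactive.
strict = (
    df[m2 & ~m1 & df["Status"].eq("Inactive")]
    .groupby("Unique_value", as_index=False)
    .first()
)

print(loose)   # includes key 4 with status Active
print(strict)  # keys 2 and 3 only
```

If the data can never contain an Active-only group, the two masks select the same keys and the original form is sufficient.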