Question

I'm looking for a way to keep only the unique values that have an Inactive status and never appear with an Active status under the same unique value.

df:

Unique_value    Status
1               Active        <- Has both active and inactive, must be inactive only
1               Active        <- Has both active and inactive, must be inactive only
1               Inactive      <- Has both active and inactive, must be inactive only
1               Inactive      <- Has both active and inactive, must be inactive only
2               Inactive      <- Has inactive only
2               Inactive      <- Has inactive only
2               Inactive      <- Has inactive only
3               Inactive      <- Has inactive only (cancelled okay to be filtered out)
3               Cancelled     <- Has inactive only (cancelled okay to be filtered out)
3               Inactive      <- Has inactive only (cancelled okay to be filtered out)

Desired output:

Unique_value    Status
2               Inactive
3               Inactive

This is what I have tried so far, but I don't think it is correct:

p = ['Inactive', 'Active']
df.groupby('Unique_value')['Status'].apply(lambda x: (x =='Inactive') != set(p))

Answer 1

Let's try:

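# True for rows whose group has no 'Active' row. Built with transform("any") so the
# mask is a real boolean Series; `~` on a plain Python bool would give -2, not False.
no_active = ~df['Status'].eq('Active').groupby(df['Unique_value']).transform('any')
g = df[no_active]

# Of the remaining rows, keep the 'Inactive' ones and collapse duplicates.
g[g['Status'].eq('Inactive')].drop_duplicates()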
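
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The DataFrame is rebuilt from the table in the question, and the names `has_active` and `result` are my own, not from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive"],
})

# True for every row whose group contains at least one 'Active' row.
has_active = df["Status"].eq("Active").groupby(df["Unique_value"]).transform("any")

# Keep 'Inactive' rows from groups without any 'Active' row,
# then reduce to one row per (Unique_value, Status) pair.
result = df[~has_active & df["Status"].eq("Inactive")].drop_duplicates()
print(result)
```

Note that the result keeps the original row labels; add `.reset_index(drop=True)` if a fresh 0-based index is preferred.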

Answer 2

First check, for each group, whether any value is Active and whether any value is Inactive. Then drop the groups where both conditions are true:

m1 = df["Status"].eq("Active").groupby(df["Unique_value"]).transform("any")
m2 = df["Status"].eq("Inactive").groupby(df["Unique_value"]).transform("any")
df[~(m1 & m2)].groupby("Unique_value", as_index=False).first()


   Unique_value    Status
0             2  Inactive
1             3  Inactive
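
One caveat worth checking against your real data: the mask `~(m1 & m2)` only removes groups that contain both statuses, so a group with nothing but Active rows would pass through, and `first()` returns whichever status happens to come first in each group. The sketch below is one possible stricter variant, not the original answer's method. Group 4 (Active only) is a hypothetical addition to the question's data, and the names `flags` and `result` are my own.

```python
import pandas as pd

# Data from the question, plus a hypothetical group 4 that is Active only.
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive",
               "Active", "Active"],
})

# One row per group with two flags: does it contain Active / Inactive at all?
flags = (
    df.assign(is_active=df["Status"].eq("Active"),
              is_inactive=df["Status"].eq("Inactive"))
      .groupby("Unique_value")[["is_active", "is_inactive"]]
      .any()
)

# Keep groups that have at least one Inactive row and no Active row.
keep = flags["is_inactive"] & ~flags["is_active"]
result = flags.loc[keep].reset_index()[["Unique_value"]].assign(Status="Inactive")
print(result)
```

Because the Status column is assigned explicitly here, the output does not depend on row order within a group.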

Filter column values by unique value, excluding unique values that also carry a different value in the same column
