I'm looking for a way to filter for unique values that have an Inactive status, excluding any unique value that also appears with an Active status.
df:
Unique_value Status
1 Active <- Has both active and inactive, must be inactive only
1 Active <- Has both active and inactive, must be inactive only
1 Inactive <- Has both active and inactive, must be inactive only
1 Inactive <- Has both active and inactive, must be inactive only
2 Inactive <- Has inactive only
2 Inactive <- Has inactive only
2 Inactive <- Has inactive only
3 Inactive <- Has inactive only (cancelled okay to be filtered out)
3 Cancelled <- Has inactive only (cancelled okay to be filtered out)
3 Inactive <- Has inactive only (cancelled okay to be filtered out)
Desired output:
Unique_value Status
2 Inactive
3 Inactive
This is what I have tried so far, but I don't think it is correct.
p = ['Inactive', 'Active']
df.groupby('Unique_value')['Status'].apply(lambda x: (x =='Inactive') != set(p))
Answer 0 (score: 1)
Let's try
g=df[df.groupby('Unique_value')['Status'].transform(lambda x: ~(x.eq('Active').any()))]
g[g['Status'].eq('Inactive')].drop_duplicates()
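A self-contained sketch of this approach, assuming the sample data from the question is rebuilt by hand. The names has_active and result are illustrative and not part of the original answer. It builds the group-level mask with transform("any") instead of a lambda with ~, because ~ applied to a plain Python bool is a bitwise inversion rather than a logical negation.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive"],
})

# True on every row whose group contains at least one "Active" row
has_active = df["Status"].eq("Active").groupby(df["Unique_value"]).transform("any")

# Keep groups with no "Active" row, then only their "Inactive" rows, deduplicated
result = df[~has_active & df["Status"].eq("Inactive")].drop_duplicates()
print(result)
```

The result keeps the original row index of the first Inactive row in each surviving group; call reset_index(drop=True) on it if a fresh index is preferred.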
Answer 1 (score: 1)
First, check whether any value in each group is Active, and whether any is Inactive. Then drop the groups where both conditions are true:
m1 = df["Status"].eq("Active").groupby(df["Unique_value"]).transform("any")
m2 = df["Status"].eq("Inactive").groupby(df["Unique_value"]).transform("any")
df[~(m1 & m2)].groupby("Unique_value", as_index=False).first()
Unique_value Status
0 2 Inactive
1 3 Inactive
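The two masks can be checked end to end with a small sketch, assuming the question's sample data is rebuilt by hand. One variation is added here that is not in the original answer: the rows are also restricted to Inactive before calling first(), since first() returns the first row of each remaining group and would otherwise return Cancelled for a group whose first row happens to be Cancelled, or Active for a group that has only Active rows. The names by_id, kept, and out are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive"],
})

by_id = df["Unique_value"]
# Group-level flags broadcast back to every row
m1 = df["Status"].eq("Active").groupby(by_id).transform("any")
m2 = df["Status"].eq("Inactive").groupby(by_id).transform("any")

# Drop groups that have both statuses, and keep only Inactive rows
# (assumption: only Inactive rows should appear in the output)
kept = df[~(m1 & m2) & df["Status"].eq("Inactive")]
out = kept.groupby("Unique_value", as_index=False).first()
print(out)
```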