Pandas conditional apply

Time: 2019-03-19 10:47:41

Tags: python pandas apply loc

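I have customers that appear on several rows, because there is one row per customer subscription/product, each with its own status. I want to generate a new_status per customer. For it to be "canceled", every one of that customer's subscription statuses must be "canceled".

I used: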

df['duplicated'] = df.groupby('Customer', as_index=False)['Customer'].cumcount()

to number each repeated row within its customer group, so that duplicated values can be identified:

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled|            | 0
 A       |canceled|            | 1
 B       |active  |            | 0
 B       |canceled|            | 1
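
For anyone who wants to reproduce the table above, here is a minimal sketch. The sample values are copied from the table. Building the frame by hand and starting new_status as an empty string are assumptions, not part of the original post.

```python
import pandas as pd

# Sample data copied from the table above (constructed by hand for illustration).
df = pd.DataFrame({
    "Customer": ["X", "X", "X", "Y", "A", "A", "B", "B"],
    "Status": ["canceled", "canceled", "active", "canceled",
               "canceled", "canceled", "active", "canceled"],
})
df["new_status"] = ""  # assumption: the column starts out empty

# cumcount() numbers the rows of each Customer group 0, 1, 2, ...
# in order of appearance, aligned with the original index.
df["duplicated"] = df.groupby("Customer").cumcount()
print(df)
```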

So I would like to use .apply and/or .loc to produce:

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled| canceled   | 0
 A       |canceled| canceled   | 1
 B       |active  |            | 0
 B       |canceled|            | 1

2 answers:

Answer 0: (score: 2)

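Compare the Status column with Series.eq (equivalent to ==), then use GroupBy.transform with GroupBy.all to check whether all values in each group are True. Next, call Series.duplicated on Customer with keep=False to flag every row of a customer that appears more than once. Finally, chain the two masks with bitwise AND (&) and set the values with numpy.where.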
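
The steps described above can be sketched as follows. This is a minimal reconstruction from the description, not necessarily the answerer's original code. The mask names m1 and m2 and the hand-built sample frame (values copied from the question's table) are assumptions.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (constructed by hand for illustration).
df = pd.DataFrame({
    "Customer": ["X", "X", "X", "Y", "A", "A", "B", "B"],
    "Status": ["canceled", "canceled", "active", "canceled",
               "canceled", "canceled", "active", "canceled"],
})

# m1 (hypothetical name): True on every row of a customer whose statuses are all 'canceled'.
m1 = df["Status"].eq("canceled").groupby(df["Customer"]).transform("all")

# m2 (hypothetical name): True on every row whose Customer value occurs more than once.
m2 = df["Customer"].duplicated(keep=False)

# Rows satisfying both conditions get 'canceled'; all others get an empty string.
df["new_status"] = np.where(m1 & m2, "canceled", "")
print(df)
```

With the sample data, only customer A satisfies both masks: X and B each have an active row, and Y appears only once.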

Answer 1: (score: 1)

As I understand it, you could try:

df['new_status']=(df.groupby('Customer')['Status'].
  transform(lambda x: x.eq('canceled').all()).map({True:'cancelled'})).fillna(df.new_status)
print(df)

    Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled  cancelled  0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1   

Edited, since the expected output has changed:

df['new_status']=(df.groupby('Customer')['Status'].
             transform(lambda x: x.duplicated(keep=False)&(x.eq('canceled').all()))
                         .map({True:'cancelled',False:''}))
print(df)

  Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled             0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1