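Condition across duplicated values in a column

Time: 2019-03-27 11:06:47

Tags: python pandas pandas-groupby loc

Each customer appears on multiple rows when they have multiple plans. I want to set a status for each customer:

If every one of their products has "canceled_at" filled in, the customer status is "canceled". If not every product has "canceled_at" filled in, but at least one does, the status is "downgrade", because the customer lost a product.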

customer|canceled_at|status
x       |3/27/2018  |
x       |           |
y       |2/2/2018   |
y       |2/2/2018   |
z       |1/1/2018   |
a       |           |      

I already have the "canceled" status working; now I only need "downgrade":

df['status']=(df.groupby('customer')['canceled_at'].
  transform(lambda x: x.notna().all()).map({True:'canceled'})).fillna(df.status)

Desired output:

customer|canceled_at|status
x       |3/27/2018  |downgrade
x       |           |downgrade
y       |2/2/2018   |canceled
y       |2/2/2018   |canceled
z       |1/1/2018   |canceled
a       |           |      
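
The rule described in the question can be sketched in plain Python before moving to pandas. This is only an illustration: the helper name `classify` is hypothetical, and returning an empty string for a customer with no canceled products is an assumption based on the blank status for customer `a` in the desired output.

```python
def classify(canceled_dates):
    """Return a status for one customer's list of canceled_at values.

    None stands for a missing canceled_at (an assumption for this sketch).
    """
    filled = [d is not None for d in canceled_dates]
    if filled and all(filled):
        return "canceled"    # every product has been canceled
    if any(filled):
        return "downgrade"   # some, but not all, products were canceled
    return ""                # nothing canceled: leave the status blank


# Sample data copied from the question, keyed by customer
plans = {
    "x": ["3/27/2018", None],
    "y": ["2/2/2018", "2/2/2018"],
    "z": ["1/1/2018"],
    "a": [None],
}
statuses = {customer: classify(dates) for customer, dates in plans.items()}
print(statuses)
```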

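2 Answers:

Answer 0: (score: 1)

Here you can test the canceled_at column for non-missing values, group the result by the Series customer, and use GroupBy.transform with GroupBy.all and GroupBy.any to check whether all values are True (nothing missing) or at least one value is True (at least one non-missing). Then pass both masks to numpy.select: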

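import numpy as np

# True where canceled_at is filled, grouped by customer
g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')  # every product of the customer is canceled
m2 = g.transform('any')  # at least one product is canceled

# Conditions are checked in order, so m1 takes precedence over m2.
# Note: newer NumPy versions may reject string choices combined with a np.nan default.
df['status'] = np.select([m1, m2],['canceled','downgrade'], np.nan)
print (df)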
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN        nan

Or, with an empty string as the default:

df['status'] = np.select([m1, m2],['canceled','downgrade'], '')
print (df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN         
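
For a self-contained run, the snippet above can be combined with the sample data from the question. This is a minimal sketch: the DataFrame construction is an assumption reconstructed from the question's table, with `None` standing in for the empty canceled_at cells.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: blanks are missing values)
df = pd.DataFrame({
    "customer": ["x", "x", "y", "y", "z", "a"],
    "canceled_at": ["3/27/2018", None, "2/2/2018", "2/2/2018", "1/1/2018", None],
})

# Boolean Series (True where canceled_at is filled), grouped by customer
g = df["canceled_at"].notna().groupby(df["customer"])
m1 = g.transform("all")  # all products canceled
m2 = g.transform("any")  # at least one product canceled

# The first matching condition wins; rows matching neither get the default ''
df["status"] = np.select([m1, m2], ["canceled", "downgrade"], "")
print(df)
```

Because `m1` is listed first, customers whose products are all canceled get "canceled" even though `m2` is also True for them.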

If groups that contain only NaN values should also be set to downgrade:

mask = df['canceled_at'].notna().groupby(df['customer']).transform('all')
df['status'] = np.where(mask,'canceled','downgrade')
print (df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN  downgrade  

Answer 1: (score: 1)

Here is one way to do it:

import pandas as pd

def select_status(canceled):
    c = canceled.count()
    if c == 0:
        status = ''
    elif c == len(canceled):
        status = 'canceled'
    else:
        status = 'downgrade'
    return pd.Series(status, index=canceled.index)

df = pd.DataFrame({'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
                   'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None]})
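# group_keys=False keeps the original row index so the result aligns with df on newer pandas versions
df['status'] = df.groupby('customer', group_keys=False)['canceled_at'].apply(select_status)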
print(df)

Output:

  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x        None  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a        None
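
The count-based branching in `select_status` can also be sketched without `apply`, by comparing the per-group count of non-missing values with the group size. This variant is not part of the answer; it is a hedged alternative under the same sample data, and the names `filled` and `total` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "customer": ["x", "x", "y", "y", "z", "a"],
    "canceled_at": ["3/27/2018", None, "2/2/2018", "2/2/2018", "1/1/2018", None],
})

g = df.groupby("customer")["canceled_at"]
filled = g.transform("count")  # non-missing canceled_at values per customer
total = g.transform("size")    # number of rows (products) per customer

# Same three branches as select_status: none filled, all filled, otherwise some filled
df["status"] = np.select(
    [filled == 0, filled == total],
    ["", "canceled"],
    "downgrade",
)
print(df)
```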