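I have a customer-name column in which a customer appears more than once when they have 2 products. I need to create a new status that combines that customer's statuses into one, depending on the case. So I have to compare customer X with the other X row to produce the new status.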
Customer|Status |Cancaled_at|new status
X |Active |- |
X |Canceled|2019-xx-xx |
Y |Active |- |
Z |Active |- |
A |Canceled|- |
Desired output:
Customer|Status |Cancaled_at|new status
X |Active |- |Canceled
X |Canceled|2019-xx-xx |Canceled
Y |Active |- |
Z |Active |- |
A |Canceled|- |
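One possible reading of the rule, written as a minimal pandas sketch: a customer with more than one row gets Canceled on every row if any of its rows is Canceled and has a cancellation date. This interpretation is an assumption, and so are the column name Canceled_at (the table spells it Cancaled_at), the use of None for "-", and the placeholder date standing in for 2019-xx-xx.

```python
import pandas as pd

# Sample data from the question (None stands in for "-", the date is a placeholder)
df = pd.DataFrame({
    "Customer": ["X", "X", "Y", "Z", "A"],
    "Status": ["Active", "Canceled", "Active", "Active", "Canceled"],
    "Canceled_at": [None, "2019-01-01", None, None, None],
})

# Assumption: a row is a real cancellation only if it is Canceled AND has a date
is_canceled = (df["Status"] == "Canceled") & df["Canceled_at"].notna()

# Broadcast "this customer has a real cancellation somewhere" back to every row
group_canceled = is_canceled.groupby(df["Customer"]).transform("any")

# Assumption: only customers that appear more than once get a new status
has_several = df.duplicated("Customer", keep=False)

df["new_status"] = ""
df.loc[group_canceled & has_several, "new_status"] = "Canceled"
print(df)
```

With the sample data both X rows get Canceled and the Y, Z and A rows stay empty, which matches the desired output above.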
Answer 0 (score: 1)
There is a simple way to find all duplicated values in pandas:
# keep=False marks every row whose Customer appears more than once
df.loc[df.duplicated('Customer', keep=False), 'new_status'] = 'Canceled'
This sets the new_status column to Canceled wherever the DataFrame's Customer column has a duplicated value.
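A small sketch of what keep=False changes, using the customers from the question (the DataFrame construction is illustrative). With the default keep="first", the first occurrence of X would not be flagged; assigning through .loc in a single step also avoids pandas' chained-assignment warning.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["X", "X", "Y", "Z", "A"],
    "Status": ["Active", "Canceled", "Active", "Active", "Canceled"],
})

# keep=False flags every occurrence of a repeated value
print(df.duplicated("Customer", keep=False).tolist())  # [True, True, False, False, False]
# The default keep="first" leaves the first occurrence unflagged
print(df.duplicated("Customer").tolist())  # [False, True, False, False, False]

df["new_status"] = ""
df.loc[df.duplicated("Customer", keep=False), "new_status"] = "Canceled"
print(df["new_status"].tolist())  # ['Canceled', 'Canceled', '', '', '']
```

Note that this approach looks only at duplication, not at Status, so a customer with two Active products would also be marked Canceled.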
Answer 1 (score: 0)
I think you need:
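import numpy as np
import pandas as pd
df = pd.DataFrame({'Customer':['X','X','Y','Z','A'], 'status':['active','canceled','active','active','canceled'],
                   'Canceled_at':[None, '2019-01-01', None, None, None]})
# Mark rows that are canceled AND have a cancellation date; all other rows get None
df['new_status'] = np.where((df['status']=='canceled') & (~df['Canceled_at'].isnull()), 'canceled', None)
# Within each customer, copy that mark backward onto the customer's earlier rows
df['new_status'] = df.groupby('Customer')['new_status'].bfill()
print(df)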
Output:
Canceled_at Customer status new_status
0 None X active canceled
1 2019-01-01 X canceled canceled
2 None Y active None
3 None Z active None
4 None A canceled None
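A caveat worth checking with this approach: groupby(...).bfill() only copies values upward, so the result depends on the canceled row coming after the active one. The sketch below reorders X's rows (illustrative data) and compares plain bfill with filling in both directions inside each group via transform; the two-way fill is a swapped-in variant, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Same idea as above, but X's canceled row now comes BEFORE its active row
df = pd.DataFrame({
    "Customer": ["X", "X", "Y"],
    "status": ["canceled", "active", "active"],
    "Canceled_at": ["2019-01-01", None, None],
})
df["new_status"] = np.where(
    (df["status"] == "canceled") & df["Canceled_at"].notna(), "canceled", None
)

# bfill alone copies values upward only, so X's second row stays empty
only_bfill = df.groupby("Customer")["new_status"].bfill()

# Filling both ways within each customer does not depend on row order
both_ways = df.groupby("Customer")["new_status"].transform(lambda s: s.bfill().ffill())

print(only_bfill.tolist())
print(both_ways.tolist())
```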
Answer 2 (score: 0)
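This code uses sort_values(), shift() and a limited backfill (bfill()):
df = df.sort_values(by=['Customer', 'Status'])
# Keep the status only on Canceled rows; all other rows become NaN
df['new_status'] = df.loc[df['Status'] == 'Canceled', 'Status']
# On the last row of each customer, turn NaN into '' so the backfill cannot cross into the next customer
df.loc[(df['Customer'] != df['Customer'].shift(-1)) & df['new_status'].isnull(), 'new_status'] = ''
# Copy each value up by at most one row, within the same customer
df['new_status'] = df['new_status'].bfill(limit=1)
df = df.sort_index()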
This produces the following output:
Customer Status Cancaled_at new_status
0 X Active - Canceled
1 X Canceled 2019-xx-xx Canceled
2 Y Active -
3 Z Active -
4 A Canceled - Canceled
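The group-boundary test in the answer above relies on shift(-1), which moves each value up one row so that every row can be compared with the next one. A minimal sketch on an already sorted Customer column (illustrative data) shows which rows it flags:

```python
import pandas as pd

# Customers already sorted, as after sort_values() in the answer above
customers = pd.Series(["A", "X", "X", "Y", "Z"])

# shift(-1) pulls the next row's value up; the last row gets NaN
next_customer = customers.shift(-1)

# True where the next row belongs to a different customer: the last row of each group
is_last_in_group = customers != next_customer
print(is_last_in_group.tolist())  # [True, False, True, True, True]
```

Writing '' on those rows when they are still empty is what stops bfill(limit=1) from pulling a Canceled value from one customer into the previous customer's last row.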