I am working on a dataset in which I want to attribute a user's last action to a goal. Along the way I arrived at the following table.
table
date | action_id | u_id | goal
2016-01-08 | CUID22 | 586758 | 'Goal#1'
2017-03-04 | CUID45 | 586758 | 'Goal#1'
2018-09-01 | CUID30 | 586758 | 'Goal#1'
How can I delete/replace the first two u_id and goal values while keeping the rows, so that I end up with the table below?
table
date | action_id | u_id | goal
2016-01-08 | CUID22 | NaN | NaN
2017-03-04 | CUID45 | NaN | NaN
2018-09-01 | CUID30 | 586758 | 'Goal#1'
Answer 0 (score: 0)
I believe you need duplicated:
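import numpy as np
cols = ['u_id','goal']
# blank out cols on every row except the last occurrence of each (u_id, goal) pair
df.loc[df.duplicated(cols, keep='last'), cols] = np.nan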
Or:
cols = ['u_id','goal']
df[cols] = df[cols].mask(df.duplicated(cols, keep='last'))
print (df)
date action_id u_id goal
0 2016 0 NaN NaN
1 2017 1 NaN NaN
2 2018 2 1.0 1.0
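For reference, here is a minimal self-contained sketch of the mask approach applied to the table from the question. The DataFrame construction, the sort by date, and the names is_earlier and result are my own assumptions for illustration and are not part of the original answer. Note that duplicated(..., keep='last') depends on row order, so the sketch sorts by date first.

```python
import pandas as pd

# Rebuild the table from the question (construction is an assumption).
df = pd.DataFrame({
    "date": ["2016-01-08", "2017-03-04", "2018-09-01"],
    "action_id": ["CUID22", "CUID45", "CUID30"],
    "u_id": [586758, 586758, 586758],
    "goal": ["Goal#1", "Goal#1", "Goal#1"],
})

cols = ["u_id", "goal"]

# keep='last' relies on row order, so make sure the latest action comes last.
df = df.sort_values("date").reset_index(drop=True)

# True for every row that has a later row with the same (u_id, goal) pair.
is_earlier = df.duplicated(cols, keep="last")

# Replace u_id and goal with NaN on those rows; all rows are kept.
result = df.copy()
result[cols] = result[cols].mask(is_earlier)

print(result)
```

Because NaN is a float, the u_id column is upcast from integer to float after masking. If integer ids matter, converting the column to the nullable "Int64" dtype afterwards is one option.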