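How to keep only the last value in pandas without dropping rows

Asked: 2018-10-23 07:37:42

Tags: pandas

I am working on a dataset in which I want to attribute a user's last action to a goal. Along the way I arrive at the following table.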

table

date         |  action_id   |  u_id       | goal   
2016-01-08   |  CUID22      |   586758    |  'Goal#1'
2017-03-04   |  CUID45      |   586758    |  'Goal#1'
2018-09-01   |  CUID30      |   586758    |  'Goal#1'

How can I remove/replace the first two u_id and goal values while keeping the rows, so that I end up with the table below?

table

date         |  action_id   |  u_id       | goal   
2016-01-08   |  CUID22      |   NaN       |  NaN
2017-03-04   |  CUID45      |   NaN       |  NaN
2018-09-01   |  CUID30      |   586758    |  'Goal#1'

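1 Answer:

Answer 0: (score: 0)

I believe you need DataFrame.duplicated with keep='last'. It marks every row whose (u_id, goal) pair appears again further down, so only the last occurrence stays unmarked, and the marked rows can then be set to NaN: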

import numpy as np

cols = ['u_id', 'goal']
# True for every row except the last occurrence of each (u_id, goal) pair
df.loc[df.duplicated(cols, keep='last'), cols] = np.nan

Or:

cols = ['u_id', 'goal']
# mask() replaces values with NaN wherever the condition is True
df[cols] = df[cols].mask(df.duplicated(cols, keep='last'))

print (df)
   date  action_id  u_id  goal
0  2016          0   NaN   NaN
1  2017          1   NaN   NaN
2  2018          2   1.0   1.0
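
For reference, here is a minimal self-contained sketch of the same idea run against the table from the question. The DataFrame construction, the sort by date, and the column-by-column use of Series.mask are assumptions added for illustration and are not part of the original answer. Sorting first matters because duplicated(keep='last') treats "last" as the last row in the current order, not the latest date.

```python
import pandas as pd

# Sample data copied from the question (built by hand for this sketch).
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-08", "2017-03-04", "2018-09-01"]),
    "action_id": ["CUID22", "CUID45", "CUID30"],
    "u_id": [586758, 586758, 586758],
    "goal": ["Goal#1", "Goal#1", "Goal#1"],
})

cols = ["u_id", "goal"]

# Assumption: the "last action" is the one with the latest date,
# so order the rows chronologically before looking for duplicates.
df = df.sort_values("date")

# True for every row except the last occurrence of each (u_id, goal) pair.
is_earlier_action = df.duplicated(cols, keep="last")

# Blank out the flagged values one column at a time; all rows are kept.
for col in cols:
    df[col] = df[col].mask(is_earlier_action)

print(df)
```

Because the integer u_id column now holds NaN, pandas will typically store it as a float column afterwards. If several users or goals are present, each (u_id, goal) combination keeps its own last row.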