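I only want to drop duplicate rows when the Status is "completed". For every other status (such as Assigned / In progress / Pending), the duplicate rows must be kept.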
       Incident  Status       Priority  ASGRP  Submit Date  Completed Date
Index
1      INC001    Assigned     Low       L1     2020-06-01   NaT
2      INC001    In progress  Low       L2     2020-06-01   NaT
3      INC001    completed    Low       L1     2020-06-01   2020-06-03
4      INC001    completed    Low       L1     2020-06-01   2020-06-03
5      INC001    completed    Low       L1     2020-06-01   2020-06-03
6      INC002    completed    Medium    L2     2020-06-04   2020-06-04
7      INC002    In progress  Medium    L1     2020-06-04   NaT
8      INC002    completed    Medium    L2     2020-06-01   2020-06-01
9      INC002    Pending      Medium    L2     2020-06-04   NaT
The expected output should look like this:
       Incident  Status       Priority  ASGRP  Submit Date  Completed Date
Index
1      INC001    Assigned     Low       L1     2020-06-01   NaT
2      INC001    In progress  Low       L2     2020-06-01   NaT
3      INC001    completed    Low       L1     2020-06-01   2020-06-03
4      INC002    In progress  Medium    L1     2020-06-04   NaT
5      INC002    completed    Medium    L2     2020-06-01   2020-06-01
6      INC002    Pending      Medium    L2     2020-06-04   NaT
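Answer 0 (score: 1)
Here is one way to do it.
First, get the completed rows with duplicates removed, keeping the last one per incident: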
df1 = df.loc[df['Status'] == 'completed'].sort_values('Incident', ascending=True, kind='mergesort').drop_duplicates(['Incident'], keep='last')
Then get the remaining rows, i.e. those with any other status:
df2 = df.loc[df['Status'] != 'completed']
Combine the two and restore the original row order:
result = pd.concat([df1,df2], ignore_index=False).sort_index()
Some details may be unnecessary, such as ignore_index in the last line (False is already the default), but hopefully this works for you.
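For anyone who wants to try the steps above end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, so the dtypes (datetime columns, an index named Index starting at 1) are assumptions rather than the asker's real data. The sketch skips the sort_values step, since drop_duplicates does not need sorted input, and the final renumbering is only a guess at how the expected output got its 1..6 index.

```python
import pandas as pd

# Sample data reconstructed from the question (index runs 1..9).
# Dtypes are an assumption: both date columns are parsed as datetimes.
df = pd.DataFrame(
    {
        "Incident": ["INC001"] * 5 + ["INC002"] * 4,
        "Status": ["Assigned", "In progress", "completed", "completed", "completed",
                   "completed", "In progress", "completed", "Pending"],
        "Priority": ["Low"] * 5 + ["Medium"] * 4,
        "ASGRP": ["L1", "L2", "L1", "L1", "L1", "L2", "L1", "L2", "L2"],
        "Submit Date": pd.to_datetime(
            ["2020-06-01"] * 5 + ["2020-06-04", "2020-06-04", "2020-06-01", "2020-06-04"]
        ),
        "Completed Date": pd.to_datetime(
            [None, None, "2020-06-03", "2020-06-03", "2020-06-03",
             "2020-06-04", None, "2020-06-01", None]
        ),
    },
    index=pd.RangeIndex(1, 10, name="Index"),
)

is_completed = df["Status"] == "completed"

# Step 1: keep one "completed" row per incident (the last in original order).
df1 = df.loc[is_completed].drop_duplicates(subset=["Incident"], keep="last")

# Step 2: rows in any other status, duplicates included.
df2 = df.loc[~is_completed]

# Step 3: recombine and restore the original row order via the index.
result = pd.concat([df1, df2]).sort_index()

# Assumption: renumber from 1 so the index matches the expected output.
renumbered = result.reset_index(drop=True)
renumbered.index = pd.RangeIndex(1, len(renumbered) + 1, name="Index")

print(renumbered)
```

Note that duplicates are judged on Incident alone, so the two completed INC002 rows collapse into one even though their dates differ, which is what the expected output in the question shows. If only fully identical completed rows should be dropped, calling drop_duplicates without a subset would be the variation to try.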