How to drop duplicate rows from a pandas DataFrame based on a specific condition

Time: 2020-06-11 17:25:02

Tags: python pandas dataframe duplicates

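I want to drop duplicate rows only when the Status is "completed". For every other status (such as Assigned / In progress / Pending), duplicate rows must be kept.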

            Incident     Status Priority ASGRP Submit Date Completed Date
    Index                                                                
    1       INC001     Assigned      Low    L1  2020-06-01            NaT
    2       INC001  In progress      Low    L2  2020-06-01            NaT
    3       INC001    completed      Low    L1  2020-06-01     2020-06-03
    4       INC001    completed      Low    L1  2020-06-01     2020-06-03
    5       INC001    completed      Low    L1  2020-06-01     2020-06-03
    6       INC002    completed   Medium    L2  2020-06-04     2020-06-04
    7       INC002  In progress   Medium    L1  2020-06-04            NaT
    8       INC002    completed   Medium    L2  2020-06-01     2020-06-01
    9       INC002      Pending   Medium    L2  2020-06-04            NaT

The expected output should look like this:

           Incident       Status Priority ASGRP Submit Date Completed Date
    Index
    1       INC001     Assigned      Low    L1  2020-06-01            NaT
    2       INC001  In progress      Low    L2  2020-06-01            NaT
    3       INC001    completed      Low    L1  2020-06-01     2020-06-03
    4       INC002  In progress   Medium    L1  2020-06-04            NaT
    5       INC002    completed   Medium    L2  2020-06-01     2020-06-01
    6       INC002      Pending   Medium    L2  2020-06-04            NaT

1 answer:

Answer 0: (score: 1)

Here is one way to do it:

First, get the completed rows with duplicates removed (one row per Incident, keeping the last):

    df1 = df.loc[df['Status'] == 'completed'].sort_values('Incident', ascending=True).drop_duplicates(['Incident'], keep='last')

Then get the remaining rows, which have any other status:

    df2 = df.loc[df['Status'] != 'completed']

Finally, combine the two and restore the original row order:

    result = pd.concat([df1, df2], ignore_index=False).sort_index()
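
For anyone who wants to try the steps end to end, here is a minimal runnable sketch. The sample frame is rebuilt by hand from the table in the question, and the names `completed` and `others` are only illustrative. It leaves out the `sort_values` step on the assumption that `drop_duplicates` does not need sorted input, so "last" here means the last row in the original order.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Incident": ["INC001"] * 5 + ["INC002"] * 4,
    "Status": ["Assigned", "In progress", "completed", "completed", "completed",
               "completed", "In progress", "completed", "Pending"],
    "Priority": ["Low"] * 5 + ["Medium"] * 4,
    "ASGRP": ["L1", "L2", "L1", "L1", "L1", "L2", "L1", "L2", "L2"],
    "Submit Date": pd.to_datetime(
        ["2020-06-01"] * 5
        + ["2020-06-04", "2020-06-04", "2020-06-01", "2020-06-04"]
    ),
    "Completed Date": pd.to_datetime(
        [None, None, "2020-06-03", "2020-06-03", "2020-06-03",
         "2020-06-04", None, "2020-06-01", None]
    ),
})
df.index = pd.RangeIndex(1, 10, name="Index")  # labels 1..9, as in the question

# Step 1: completed rows, one per Incident (the last occurrence is kept).
completed = df[df["Status"] == "completed"].drop_duplicates("Incident", keep="last")

# Step 2: every row with any other status, untouched.
others = df[df["Status"] != "completed"]

# Step 3: combine and restore the original row order.
result = pd.concat([completed, others]).sort_index()
print(result)
```

The result keeps the original index labels (1, 2, 5, 7, 8, 9). If the rows should be renumbered as in the expected output, something like `result.reset_index(drop=True)` can be applied afterwards.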

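Some details may be unnecessary, such as `ignore_index` in the `pd.concat` call (`False` is already the default), but I hope this works for you.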
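
As a possible variant that is not part of the answer above, the same filtering can probably be done with a single boolean mask instead of splitting and concatenating. The sketch below uses a small made-up frame with only the two relevant columns. It assumes that "duplicate" means "same Incident among completed rows", which matches the `drop_duplicates(['Incident'])` call in the answer.

```python
import pandas as pd

# Small hypothetical frame: a duplicated non-completed row plus duplicated completed rows.
df = pd.DataFrame({
    "Incident": ["INC001", "INC001", "INC001", "INC001", "INC002", "INC002"],
    "Status": ["Pending", "Pending", "completed", "completed", "completed", "completed"],
})

is_completed = df["Status"] == "completed"

# True for every row that has a later row with the same Incident and Status.
is_dup = df.duplicated(subset=["Incident", "Status"], keep="last")

# Drop a row only if it is both completed and a duplicate; everything else stays.
result = df[~(is_completed & is_dup)]
print(result)
```

Because no rows are moved, the original order and index labels are kept without a final `sort_index()`.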