Remove duplicate rows based on the presence of other column values

Time: 2020-03-10 15:51:55

Tags: python pandas

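I'm new to Python. I'm trying to delete the rows whose EOB code is blank when that Account Number already appears in several rows. For example, Account Number "407" contributes three rows. I want to delete the row that is missing an EOB code and keep the other two rows, which have EOB codes 7730 and 3033.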


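The complication here, at least for me, is that some other Account Numbers never have an EOB code. The accounts ending in "2300" and "6200" below are examples. In those cases the accounts should stay in the data frame.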


Here is a small portion of the data set:

import pandas as pd

data = {'Account Number': ['407','407','407','4901','4901','4901','4901','4901','6902','6902','6902','6902','8700','6900','2300','6200','2400','2400','3200','3200','3200','3200','3200','3200','3400','2200','3300','7701','7701','7701','7701','7701','7701','3100','401','401','401','6600','6600','6600','6600'],
     'Payer':['BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS'],
     'Remit Type':['IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Denied','IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Denied','IP Denied','IP Denied','IP Denied','IP Denied','IP Denied','IP Paid','IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid'],
     'EOB':['','7730','3033','5001','','9932','3035','3038','9015','5000','','9932','','','','','','','','3035','829','9932','2635','5002','','','','851','','852','9932','818','9015','','','2628','3035','5003','','3035','9932'],
     'Date':['Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10'],
     'Status':['INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS SUSPENDED', 'INPATIENT CLAIMS PAID','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID']}
df = pd.DataFrame(data,columns=['Account Number','Payer','Remit Type','EOB','Date','Status'])
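The rule can be restated per account. A blank-EOB row should be dropped only when its account has at least one non-blank EOB code. The sketch below counts codes per account on a handful of rows taken from the sample. It uses an abbreviated frame, not the full data. In the output, 407 has two codes, while 2300, 6200 and 2400 have none.

```python
import pandas as pd

# A few rows from the sample data; only these two columns matter for the rule
df = pd.DataFrame({
    'Account Number': ['407', '407', '407', '2300', '6200', '2400', '2400'],
    'EOB': ['', '7730', '3033', '', '', '', ''],
})

# Number of non-blank EOB codes per account (summing booleans counts the Trues)
codes_per_account = df['EOB'].ne('').groupby(df['Account Number']).sum()
print(codes_per_account)
```

Accounts with a count of zero should keep all of their rows. For the other accounts, only the rows that carry a code should survive.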

3 Answers:

Answer 0 (score: 1)

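I would try to identify the indices to drop, that is, the rows where:

  • the account number has at least one row whose EOB is not empty, and
  • the EOB itself is an empty string.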

That could be done in two steps:

  1. Find the relevant account numbers:

    x = df[df.EOB != ''].groupby('Account Number').count()[[]]
    
  2. Drop the rows:

    df.drop(df.merge(x, left_on='Account Number', right_index=True).query("EOB==''").index,
            inplace=True)
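Below is a self-contained sketch of the same two steps, with one swap. It uses Series.isin to pick out the rows instead of the merge with right_index=True, so it does not rely on the index that merge carries through. It includes only the Account Number and EOB columns from the question, because the rule uses nothing else.

```python
import pandas as pd

# Account Number and EOB columns from the question's sample (other columns omitted)
accounts = (['407'] * 3 + ['4901'] * 5 + ['6902'] * 4
            + ['8700', '6900', '2300', '6200'] + ['2400'] * 2 + ['3200'] * 6
            + ['3400', '2200', '3300'] + ['7701'] * 6 + ['3100']
            + ['401'] * 3 + ['6600'] * 4)
eob = ['', '7730', '3033',                          # 407
       '5001', '', '9932', '3035', '3038',          # 4901
       '9015', '5000', '', '9932',                  # 6902
       '', '', '', '', '', '',                      # 8700, 6900, 2300, 6200, 2400 x2
       '', '3035', '829', '9932', '2635', '5002',   # 3200
       '', '', '',                                  # 3400, 2200, 3300
       '851', '', '852', '9932', '818', '9015',     # 7701
       '',                                          # 3100
       '', '2628', '3035',                          # 401
       '5003', '', '3035', '9932']                  # 6600
df = pd.DataFrame({'Account Number': accounts, 'EOB': eob})

# Step 1: accounts that have at least one non-blank EOB code
accounts_with_code = df.loc[df['EOB'] != '', 'Account Number'].unique()

# Step 2: blank-EOB rows that belong to one of those accounts
to_drop = df[df['Account Number'].isin(accounts_with_code) & df['EOB'].eq('')].index

result = df.drop(to_drop)
print(to_drop.tolist())  # [0, 4, 10, 18, 28, 34, 38]
```

These are the seven indices missing from the output table in this answer. The two blank rows of account 2400 (indices 16 and 17) are kept because that account has no code at all.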
    

Starting from the sample data, this gives:

    Account Number Payer Remit Type   EOB    Date                      Status
1              407  BCBS  IP Denied  7730  Mar 10     INPATIENT CLAIMS DENIED
2              407  BCBS  IP Denied  3033  Mar 10     INPATIENT CLAIMS DENIED
3             4901  BCBS    IP Paid  5001  Mar 10  INPATIENT CLAIMS SUSPENDED
5             4901  BCBS    IP Paid  9932  Mar 10  INPATIENT CLAIMS SUSPENDED
6             4901  BCBS    IP Paid  3035  Mar 10  INPATIENT CLAIMS SUSPENDED
7             4901  BCBS    IP Paid  3038  Mar 10  INPATIENT CLAIMS SUSPENDED
8             6902  BCBS  IP Denied  9015  Mar 10     INPATIENT CLAIMS DENIED
9             6902  BCBS  IP Denied  5000  Mar 10     INPATIENT CLAIMS DENIED
11            6902  BCBS  IP Denied  9932  Mar 10     INPATIENT CLAIMS DENIED
12            8700  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
13            6900  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
14            2300  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
15            6200  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
16            2400  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
17            2400  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
19            3200  BCBS    IP Paid  3035  Mar 10       INPATIENT CLAIMS PAID
20            3200  BCBS    IP Paid   829  Mar 10       INPATIENT CLAIMS PAID
21            3200  BCBS    IP Paid  9932  Mar 10       INPATIENT CLAIMS PAID
22            3200  BCBS    IP Paid  2635  Mar 10       INPATIENT CLAIMS PAID
23            3200  BCBS    IP Paid  5002  Mar 10       INPATIENT CLAIMS PAID
24            3400  BCBS    IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
25            2200  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
26            3300  BCBS    IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
27            7701  BCBS  IP Denied   851  Mar 10     INPATIENT CLAIMS DENIED
29            7701  BCBS  IP Denied   852  Mar 10     INPATIENT CLAIMS DENIED
30            7701  BCBS  IP Denied  9932  Mar 10     INPATIENT CLAIMS DENIED
31            7701  BCBS  IP Denied   818  Mar 10     INPATIENT CLAIMS DENIED
32            7701  BCBS  IP Denied  9015  Mar 10     INPATIENT CLAIMS DENIED
33            3100  BCBS    IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
35             401  BCBS  IP Denied  2628  Mar 10     INPATIENT CLAIMS DENIED
36             401  BCBS  IP Denied  3035  Mar 10     INPATIENT CLAIMS DENIED
37            6600  BCBS    IP Paid  5003  Mar 10       INPATIENT CLAIMS PAID
39            6600  BCBS    IP Paid  3035  Mar 10       INPATIENT CLAIMS PAID
40            6600  BCBS    IP Paid  9932  Mar 10       INPATIENT CLAIMS PAID

Answer 1 (score: 0)

First, check for each account whether all of its EOBs are empty. Then combine that mask with the rows whose EOB is not empty:

all_empty = df['EOB'].eq('').groupby(df['Account Number']).transform('all')

df[all_empty | df['EOB'].ne('')]
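To see how the two masks combine, here is the same expression run on a small hypothetical frame built from a few of the sample's accounts. Account 407 loses its blank row. 2300 and 2400 have no codes, so all of their rows survive. 6600 keeps only its coded row.

```python
import pandas as pd

df = pd.DataFrame({
    'Account Number': ['407', '407', '407', '2300', '2400', '2400', '6600', '6600'],
    'EOB': ['', '7730', '3033', '', '', '', '5003', ''],
})

# True on every row whose account has no non-blank EOB at all
all_empty = df['EOB'].eq('').groupby(df['Account Number']).transform('all')

# Keep the rows of code-less accounts, plus every row that carries a code
result = df[all_empty | df['EOB'].ne('')]
print(result.index.tolist())  # [1, 2, 3, 4, 5, 6]
```

Because transform returns a mask aligned to the original index, no merge or index bookkeeping is needed.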

Answer 2 (score: 0)

First, I would personally recommend not using empty strings ('') to mark missing values in pandas. Use np.nan instead:

import numpy as np

df['EOB'] = df['EOB'].replace('', np.nan)

Then define a helper function that calls dropna only when a group has more than one row, and apply it to each group of a groupby on Account Number.
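The answer's helper function itself is not included above, so here is a minimal sketch of the idea. The name drop_blank_eob is hypothetical, and the condition deliberately differs from the description. This version calls dropna only when the group has at least one EOB code, not whenever it has more than one row. With a plain row-count check, account 2400 would lose both of its rows, because it has two rows and both are blank, and the question says such accounts must stay. The sketch uses an explicit loop over the groups instead of groupby(...).apply(...), because recent pandas versions deprecate passing the grouping column into apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Account Number': ['407', '407', '407', '2300', '2400', '2400'],
    'EOB': ['', '7730', '3033', '', '', ''],
})
# As the answer suggests, mark missing EOB codes with NaN instead of ''
df['EOB'] = df['EOB'].replace('', np.nan)


def drop_blank_eob(group):
    """Hypothetical helper: drop NaN-EOB rows only if the account has some EOB code."""
    if group['EOB'].notna().any():
        return group.dropna(subset=['EOB'])
    return group  # the account never has a code: keep all of its rows


# Apply the helper to each Account Number group, then restore the original row order
parts = [drop_blank_eob(group) for _, group in df.groupby('Account Number', sort=False)]
result = pd.concat(parts).sort_index()
print(result.index.tolist())  # [1, 2, 3, 4, 5]
```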