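I'm new to Python and am trying to drop the rows that have a blank EOB code, but only for an Account Number that already has other rows with EOB codes. For example, Account Number '407' contributes three rows. I want to drop the row with the missing EOB code but keep the other two rows (with EOB codes 7730 and 3033).
The complication (at least for me) is that some Account Numbers never have an EOB code, such as the accounts ending in '2300' and '6200' below. In those cases the rows should stay in the DataFrame.
Here is a small portion of the dataset: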
data = {'Account Number': ['407','407','407','4901','4901','4901','4901','4901','6902','6902','6902','6902','8700','6900','2300','6200','2400','2400','3200','3200','3200','3200','3200','3200','3400','2200','3300','7701','7701','7701','7701','7701','7701','3100','401','401','401','6600','6600','6600','6600'],
'Payer':['BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS'],
'Remit Type':['IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Denied','IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Denied','IP Denied','IP Denied','IP Denied','IP Denied','IP Denied','IP Paid','IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid'],
'EOB':['','7730','3033','5001','','9932','3035','3038','9015','5000','','9932','','','','','','','','3035','829','9932','2635','5002','','','','851','','852','9932','818','9015','','','2628','3035','5003','','3035','9932'],
'Date':['Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10'],
'Status':['INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS SUSPENDED', 'INPATIENT CLAIMS PAID','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID']}
import pandas as pd

df = pd.DataFrame(data,columns=['Account Number','Payer','Remit Type','EOB','Date','Status'])
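To make the rule concrete, here is a minimal sketch on a hypothetical six-row miniature of the data. The frame `toy` and the columns `rows`, `blank_rows` and `mixed` are illustrative names, not part of the original dataset. It only tabulates which accounts mix blank and non-blank EOB codes; those are the accounts whose blank rows should go.

```python
import pandas as pd

# Hypothetical miniature of the data: one mixed account ('407'),
# one single-row blank account ('2300'), one two-row blank account ('2400').
toy = pd.DataFrame({
    'Account Number': ['407', '407', '407', '2300', '2400', '2400'],
    'EOB': ['', '7730', '3033', '', '', ''],
})

blank = toy['EOB'].eq('')

# Per account: total number of rows and number of rows with a blank EOB.
summary = pd.DataFrame({
    'rows': toy.groupby('Account Number').size(),
    'blank_rows': blank.groupby(toy['Account Number']).sum(),
})

# An account is "mixed" when it has some, but not only, blank rows.
summary['mixed'] = (summary['blank_rows'] > 0) & (summary['blank_rows'] < summary['rows'])
print(summary)
```

Only '407' comes out as mixed here, so only its blank row would be dropped.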
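Answer 0 (score: 1)
I would first identify the indexes of the rows to drop. One possibility:
Find the relevant account numbers (those with at least one non-empty EOB):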
x = df[df.EOB != ''].groupby('Account Number').count()[[]]
Drop the rows:
df.drop(df.merge(x, left_on='Account Number', right_index=True).query("EOB==''").index,
inplace=True)
With your example, it gives:
    Account Number  Payer  Remit Type   EOB    Date                      Status
1              407   BCBS   IP Denied  7730  Mar 10     INPATIENT CLAIMS DENIED
2              407   BCBS   IP Denied  3033  Mar 10     INPATIENT CLAIMS DENIED
3             4901   BCBS     IP Paid  5001  Mar 10  INPATIENT CLAIMS SUSPENDED
5             4901   BCBS     IP Paid  9932  Mar 10  INPATIENT CLAIMS SUSPENDED
6             4901   BCBS     IP Paid  3035  Mar 10  INPATIENT CLAIMS SUSPENDED
7             4901   BCBS     IP Paid  3038  Mar 10  INPATIENT CLAIMS SUSPENDED
8             6902   BCBS   IP Denied  9015  Mar 10     INPATIENT CLAIMS DENIED
9             6902   BCBS   IP Denied  5000  Mar 10     INPATIENT CLAIMS DENIED
11            6902   BCBS   IP Denied  9932  Mar 10     INPATIENT CLAIMS DENIED
12            8700   BCBS     IP Paid        Mar 10       INPATIENT CLAIMS PAID
13            6900   BCBS     IP Paid        Mar 10       INPATIENT CLAIMS PAID
14            2300   BCBS     IP Paid        Mar 10       INPATIENT CLAIMS PAID
15            6200   BCBS     IP Paid        Mar 10       INPATIENT CLAIMS PAID
16            2400   BCBS     IP Paid        Mar 10       INPATIENT CLAIMS PAID
17            2400   BCBS     IP Paid        Mar 10       INPATIENT CLAIMS PAID
19            3200   BCBS     IP Paid  3035  Mar 10       INPATIENT CLAIMS PAID
20            3200   BCBS     IP Paid   829  Mar 10       INPATIENT CLAIMS PAID
21            3200   BCBS     IP Paid  9932  Mar 10       INPATIENT CLAIMS PAID
22            3200   BCBS     IP Paid  2635  Mar 10       INPATIENT CLAIMS PAID
23            3200   BCBS     IP Paid  5002  Mar 10       INPATIENT CLAIMS PAID
24            3400   BCBS     IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
25            2200   BCBS     IP Paid        Mar 10       INPATIENT CLAIMS PAID
26            3300   BCBS     IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
27            7701   BCBS   IP Denied   851  Mar 10     INPATIENT CLAIMS DENIED
29            7701   BCBS   IP Denied   852  Mar 10     INPATIENT CLAIMS DENIED
30            7701   BCBS   IP Denied  9932  Mar 10     INPATIENT CLAIMS DENIED
31            7701   BCBS   IP Denied   818  Mar 10     INPATIENT CLAIMS DENIED
32            7701   BCBS   IP Denied  9015  Mar 10     INPATIENT CLAIMS DENIED
33            3100   BCBS     IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
35             401   BCBS   IP Denied  2628  Mar 10     INPATIENT CLAIMS DENIED
36             401   BCBS   IP Denied  3035  Mar 10     INPATIENT CLAIMS DENIED
37            6600   BCBS     IP Paid  5003  Mar 10       INPATIENT CLAIMS PAID
39            6600   BCBS     IP Paid  3035  Mar 10       INPATIENT CLAIMS PAID
40            6600   BCBS     IP Paid  9932  Mar 10       INPATIENT CLAIMS PAID
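The same idea (collect the accounts that have at least one code, then drop their blank rows) can also be sketched without the merge, using `isin`. This is a variant, not the answer's exact code, and it runs on a hypothetical six-row frame named `toy` rather than the full dataset.

```python
import pandas as pd

# Hypothetical miniature of the data, used only for illustration.
toy = pd.DataFrame({
    'Account Number': ['407', '407', '407', '2300', '2400', '2400'],
    'EOB': ['', '7730', '3033', '', '', ''],
})

# Accounts that have at least one non-empty EOB code.
has_code = toy.loc[toy['EOB'] != '', 'Account Number'].unique()

# Row labels to drop: blank EOB rows that belong to one of those accounts.
to_drop = toy.index[toy['Account Number'].isin(has_code) & toy['EOB'].eq('')]

# Return a new frame instead of mutating toy in place.
result = toy.drop(to_drop)
print(result)  # keeps row labels 1, 2, 3, 4, 5
```

Using `drop` without `inplace=True` leaves the original frame untouched, which makes it easier to compare before and after.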
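Answer 1 (score: 0)
First, check whether all the EOB values associated with an account are empty. Then combine that mask with the rows whose EOB is not empty: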
all_empty = df['EOB'].eq('').groupby(df['Account Number']).transform('all')
df[all_empty | df['EOB'].ne('')]
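As a quick check of how the two masks combine, here is a minimal sketch on a hypothetical six-row frame (`toy` and `kept` are illustrative names). A row survives if its account has only blank EOB values, or if the row itself has a code.

```python
import pandas as pd

# Hypothetical miniature of the data, used only for illustration.
toy = pd.DataFrame({
    'Account Number': ['407', '407', '407', '2300', '2400', '2400'],
    'EOB': ['', '7730', '3033', '', '', ''],
})

# True on every row of an account whose EOB values are all blank.
all_empty = toy['EOB'].eq('').groupby(toy['Account Number']).transform('all')

# Keep rows of all-blank accounts, plus every row that carries a code.
kept = toy[all_empty | toy['EOB'].ne('')]
print(kept)  # keeps row labels 1, 2, 3, 4, 5
```

Because `transform` returns a result aligned with the original rows, the mask can be combined directly with a row-level condition, and no merge is needed.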
Answer 2 (score: 0)
First, I would personally recommend not using '' (empty strings) to represent missing values in pandas. Use np.nan instead:
import numpy as np
df['EOB'] = df['EOB'].replace('', np.nan)
Then define a helper function that calls dropna only when a group has more than one row, and apply it per group via groupby on Account Number.
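The helper itself is not shown above, so the following is only a sketch of what it might look like. The name `drop_blank_if_any_code` and the frame `toy` are hypothetical. Two deliberate deviations from the description: the helper checks whether the group has any non-missing EOB rather than whether it has more than one row, because an account such as '2400' has two rows that are both blank and a plain row-count test would remove both; and it loops over the groups and concatenates them instead of calling `groupby(...).apply`, whose treatment of the grouping column has changed between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the data, used only for illustration.
toy = pd.DataFrame({
    'Account Number': ['407', '407', '407', '2300', '2400', '2400'],
    'EOB': ['', '7730', '3033', '', '', ''],
})
toy['EOB'] = toy['EOB'].replace('', np.nan)


def drop_blank_if_any_code(group):
    """Drop rows with a missing EOB, but only if the group has some EOB at all."""
    if group['EOB'].notna().any():
        return group.dropna(subset=['EOB'])
    return group  # all-blank accounts are kept unchanged


# Process each account separately, then restore the original row order.
pieces = [drop_blank_if_any_code(g) for _, g in toy.groupby('Account Number', sort=False)]
result = pd.concat(pieces).sort_index()
print(result)  # keeps row labels 1, 2, 3, 4, 5
```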