Before:
Cat      INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)  AMOUNT_COLLECTED(Original Amt)  COMPANY_CODE  OPERATING_UNIT  count
invoice  0992541158          115606.38                     578031.91                       4380          6238            2
payment  0992541158          0                             -462425.53                      4380          6238            2
invoice  0090010917          1519                          87803.4                         2700          4315            2
payment  0090010917          0                             -86284.4                        2700          4315            2
invoice  0090007022          2039.55                       13517                           2700          4315            2
I need to extract only the 5th row, because it has no payment associated with it.
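Answer 0: (score: 0)
First, group all the rows that relate to the same invoice. Summing the string column 'Cat' concatenates its values, so the combined status differs depending on whether the invoice has been paid: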
status = df.groupby("INVOICE_REF_NUMBER")['Cat'].sum()
#INVOICE_REF_NUMBER
#0090007022 invoice
#0090010917 invoicepayment
#0992541158 invoicepayment
#Name: Cat, dtype: object
Now use the unpaid invoices to extract the original rows:
unpayed = df.join(status[status=='invoice'], rsuffix='_', how='right',
on='INVOICE_REF_NUMBER')
# Cat INVOICE_REF_NUMBER OPEN_ITEM_AMOUNT(Netted Amt) Cat_
#4 invoice 0090007022 2039.55 invoice
If you like, you can delete the duplicate "Cat_" column:
del unpayed['Cat_']
# Cat INVOICE_REF_NUMBER OPEN_ITEM_AMOUNT(Netted Amt)
#4 invoice 0090007022 2039.55
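For anyone who wants to run the steps above end to end, here is a minimal sketch. The sample frame is rebuilt by hand from the question, keeping only three columns, and it assumes INVOICE_REF_NUMBER is stored as strings so the leading zeros survive. It also uses `drop` instead of `del`, which is just a stylistic choice on my part.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: reference numbers are strings)
df = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
    "OPEN_ITEM_AMOUNT(Netted Amt)": [115606.38, 0, 1519, 0, 2039.55],
})

# Summing strings concatenates them: "invoice" alone means no payment row exists
status = df.groupby("INVOICE_REF_NUMBER")["Cat"].sum()

# A right join keeps only the rows whose reference number survived the filter
unpaid = df.join(status[status == "invoice"], rsuffix="_", how="right",
                 on="INVOICE_REF_NUMBER")

# Drop the helper column that came from the joined Series
unpaid = unpaid.drop(columns="Cat_")
print(unpaid)
```

Note that this relies on each invoice having exactly one "invoice" row. If an invoice could appear twice without a payment, the concatenated status would be "invoiceinvoice" and the filter would miss it.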
Answer 1: (score: 0)
Here is my best effort:
# Assume nothing has a payment
df['payment_count'] = 0
# For each invoice, count the related payments by applying
# a lambda function on each row (hence the axis=1)
df.loc[df.Cat=='invoice', 'payment_count'] = \
    df.loc[df.Cat=='invoice'].apply(lambda x:
        df.loc[(df['INVOICE_REF_NUMBER']==x['INVOICE_REF_NUMBER'])
               & (df.Cat=='payment'), 'Cat'].count(), axis=1)
# Filter on the invoices without payments
print(df[(df.Cat=='invoice') & (df.payment_count==0)])
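The row-by-row `apply` above rescans the whole frame for every invoice, which can get slow on large data. As a sketch of a vectorized alternative (not the code from this answer), the payment count can be computed with a grouped `transform`. The sample frame is rebuilt by hand from the question, again assuming string reference numbers. One difference: here `payment_count` is filled in for payment rows too, not only for invoice rows.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: reference numbers are strings)
df = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
    "OPEN_ITEM_AMOUNT(Netted Amt)": [115606.38, 0, 1519, 0, 2039.55],
})

# True on payment rows; summing per reference number counts the payments
is_payment = df["Cat"].eq("payment")
df["payment_count"] = is_payment.groupby(df["INVOICE_REF_NUMBER"]).transform("sum")

# Keep the invoice rows whose reference number has no payment at all
unpaid = df[df["Cat"].eq("invoice") & df["payment_count"].eq(0)]
print(unpaid)
```

On the sample data this selects the single row for invoice 0090007022, the same row the question asks for.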