How do I subset in pandas by multiple conditions

Time: 2017-03-01 15:39:16

Tags: python pandas

Example:

Cat      INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)  AMOUNT_ COLLECTED(Original Amt)  COMPANY_CODE  OPERATING_UNIT  count
invoice  0992541158          115606.38                     578031.91                        4380          6238            2
payment  0992541158          0                             -462425.53                       4380          6238            2
invoice  0090010917          1519                          87803.4                          2700          4315            2
payment  0090010917          0                             -86284.4                         2700          4315            2
invoice  0090007022          2039.55                       13517                            2700          4315            2

I need only the 5th row on its own, because it has no payment against it.

2 Answers:

Answer 0: (score: 0)

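First, group all the rows that relate to the same invoice. Summing the 'Cat' strings concatenates them, so the combined status differs depending on whether the invoice has a payment: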

status = df.groupby("INVOICE_REF_NUMBER")['Cat'].sum()
#INVOICE_REF_NUMBER
#0090007022           invoice
#0090010917    invoicepayment
#0992541158    invoicepayment
#Name: Cat, dtype: object

Now, extract the original rows for the invoices that have no payment:

unpayed = df.join(status[status=='invoice'], rsuffix='_', how='right', 
                  on='INVOICE_REF_NUMBER')
#       Cat INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)     Cat_
#4  invoice         0090007022                       2039.55  invoice

If you like, you can delete the duplicate "Cat_" column:

del unpayed['Cat_']
#       Cat INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)
#4  invoice         0090007022                       2039.55
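
For anyone who wants to run the steps above end to end, here is a minimal sketch. It makes a few assumptions that are not in the answer: the DataFrame is rebuilt by hand from the question's rows (only three columns), INVOICE_REF_NUMBER is stored as a string so the leading zeros survive, and agg("".join) stands in for sum() as an explicit way to concatenate the 'Cat' strings. Note that the combined status depends on row order within each invoice.

```python
import pandas as pd

# Sample rows copied from the question (assumption: only the columns used here).
# INVOICE_REF_NUMBER is kept as a string so the leading zeros are preserved.
df = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
    "OPEN_ITEM_AMOUNT(Netted Amt)": [115606.38, 0, 1519, 0, 2039.55],
})

# Concatenate the Cat strings per invoice; "invoice" alone means no payment.
# (agg("".join) is swapped in for the answer's sum(); same idea, explicit join.)
status = df.groupby("INVOICE_REF_NUMBER")["Cat"].agg("".join)

# Right-join the unpaid statuses back onto the original rows.
unpayed = df.join(status[status == "invoice"], rsuffix="_", how="right",
                  on="INVOICE_REF_NUMBER")

# Drop the helper column that the join added.
unpayed = unpayed.drop(columns="Cat_")
print(unpayed)
```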

Answer 1: (score: 0)

Here is my best effort:

# Assume nothing has a payment
df['payment_count'] = 0

# For each invoice, count the related payments by applying
# a lambda function on each row (hence the axis=1)
df.loc[df.Cat=='invoice', 'payment_count'] = \
    df.loc[df.Cat=='invoice'].apply(lambda x:
    df.loc[(df['INVOICE_REF_NUMBER']==x['INVOICE_REF_NUMBER'])
    & (df.Cat=='payment'), 'Cat'].count(), axis=1)

# Filter on the invoices without payments
print(df[(df.Cat=='invoice') & (df.payment_count==0)])
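
The row-by-row apply above rescans the whole frame for every invoice. As a possible alternative (not part of this answer), the same filter can be sketched with a vectorized isin check. The DataFrame below is rebuilt by hand from the question's rows and the names paid_refs and unpaid are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (assumption: only the two columns needed).
df = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
})

# Invoice numbers that have at least one payment row.
paid_refs = df.loc[df["Cat"] == "payment", "INVOICE_REF_NUMBER"]

# Keep the invoice rows whose number never appears among the payments.
unpaid = df[(df["Cat"] == "invoice")
            & ~df["INVOICE_REF_NUMBER"].isin(paid_refs)]
print(unpaid)
```

Because this is a plain boolean mask, the surviving row keeps its original index label, which makes it easy to trace back to the source data.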