Pandas / Python: complex conditional sum

Posted: 2018-10-16 15:07:15

Tags: python pandas numpy

df_have

CONTRACT ID  AMT     REL_NUM HDR_NUM
1         1   0.00    0      1    
1         2   33.85   1      2    
1         3   0.72    2      2    
1         4   0.87    1      1    
1         5   1.67    1      2  

df_want

CONTRACT ID  AMT     REL_NUM HDR_NUM CALCULATION
1         1   0.00    0      1        (0.00+33.85+0.87+1.67)
1         2   33.85   1      2        (33.85+0.72)
1         3   0.72    2      2        (33.85+0.72)
1         4   0.87    1      1        (0.00+33.85+0.87+1.67)
1         5   1.67    1      2        (33.85+0.72)

df_getting

CONTRACT ID  AMT     REL_NUM HDR_NUM CALCULATION
1         1   0.00    0      1        21.75
1         2   33.85   1      2        2.00
1         3   0.72    2      2        19.75
1         4   0.87    1      1        33.85
1         5   1.67    1      2        0.00

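I am trying to create a new column, "CALCULATION", but the logic is a bit tricky. The calculation should be a sum of the AMT field, where which rows are included depends on the CONTRACT, ID, REL_NUM and HDR_NUM fields.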

Step 1 - Look at the HDR_NUM field and take the AMT value of the row in the same CONTRACT whose ID equals that HDR_NUM.

Step 2 - Add the AMT values of all rows in the same CONTRACT whose REL_NUM equals that HDR_NUM.
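A literal, row-by-row reading of the two steps could look like the sketch below. It rebuilds df_have from the table above; the helper name `calc_row` is hypothetical, and this version simply adds the step 1 and step 2 amounts without applying the no-double-counting caveat. It is slow on large frames but makes the rule easy to check against df_want.

```python
import pandas as pd

# df_have from the question
df = pd.DataFrame({
    'CONTRACT': [1, 1, 1, 1, 1],
    'ID':       [1, 2, 3, 4, 5],
    'AMT':      [0.00, 33.85, 0.72, 0.87, 1.67],
    'REL_NUM':  [0, 1, 2, 1, 1],
    'HDR_NUM':  [1, 2, 2, 1, 2],
})

def calc_row(row, data):
    # Hypothetical helper: apply steps 1 and 2 to a single row
    same_contract = data[data['CONTRACT'] == row['CONTRACT']]
    # Step 1: AMT of the row whose ID equals this row's HDR_NUM
    step1 = same_contract.loc[same_contract['ID'] == row['HDR_NUM'], 'AMT'].sum()
    # Step 2: AMT of every row whose REL_NUM equals this row's HDR_NUM
    step2 = same_contract.loc[same_contract['REL_NUM'] == row['HDR_NUM'], 'AMT'].sum()
    return step1 + step2

# args=(df,) passes the full frame to calc_row alongside each row
df['CALCULATION'] = df.apply(calc_row, axis=1, args=(df,)).round(2)
print(df)
```

For row 1 (HDR_NUM = 1) this gives 0.00 from step 1 plus 33.85 + 0.87 + 1.67 from step 2, i.e. 36.39, which is the (0.00+33.85+0.87+1.67) expression in df_want.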

For the first row, this would sum the AMT fields for ID = 2,3 and CONTRACT = 1. For row 6, it would sum the AMT fields for ID = 2,4 (for CONTRACT = 2).

One caveat is that nothing should be counted twice (i.e. for row 6, when summing the AMT fields of ID = 2,4 for CONTRACT = 2, do not count ID = 2 twice).
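One way to honour the caveat, assuming "double counting" means a row that matches both step 1 (ID = HDR_NUM) and step 2 (REL_NUM = HDR_NUM) should contribute its AMT only once, is to combine both conditions into a single boolean mask with `|`, so each row is selected at most once. The helper `calc_row_dedup` and the `demo` data below are made up for illustration; they are not from the question.

```python
import pandas as pd

def calc_row_dedup(row, data):
    # Hypothetical helper: steps 1 and 2 merged into one mask,
    # so a row matching both conditions is summed only once
    same_contract = data['CONTRACT'] == row['CONTRACT']
    matches = (data['ID'] == row['HDR_NUM']) | (data['REL_NUM'] == row['HDR_NUM'])
    return data.loc[same_contract & matches, 'AMT'].sum()

# Made-up data where ID 2 also has REL_NUM 2, so it matches both steps
demo = pd.DataFrame({
    'CONTRACT': [2, 2, 2],
    'ID':       [1, 2, 3],
    'AMT':      [10.0, 5.0, 1.0],
    'REL_NUM':  [0, 2, 2],
    'HDR_NUM':  [2, 2, 1],
})
demo['CALCULATION'] = demo.apply(calc_row_dedup, axis=1, args=(demo,))
print(demo)
```

For the first demo row (HDR_NUM = 2), ID 2 is picked up by both conditions but summed once, giving 5.0 + 1.0 = 6.0 rather than 11.0. On df_have the result is unchanged, because no row there has ID equal to its REL_NUM.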

1 answer:

Answer 0 (score: 1)

IIUC,

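import pandas as pd

def F(s):
    # ID -> AMT within one contract (step 1 lookup)
    rule1 = s.set_index('ID')['AMT'].to_dict()
    # REL_NUM -> summed AMT within one contract (step 2 lookup)
    rule2 = s.groupby('REL_NUM')['AMT'].sum().to_dict()
    s1 = s['HDR_NUM'].astype(int).map(rule1).fillna(0)
    s2 = s['HDR_NUM'].astype(int).map(rule2).fillna(0)
    return s1 + s2

# df holds the df_have data; ravel() flattens the per-contract results,
# so this assumes df is already sorted by CONTRACT
cols = ['ID', 'AMT', 'REL_NUM', 'HDR_NUM']
df['CALCULATION'] = df.groupby('CONTRACT')[cols].apply(F).to_numpy().ravel()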
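As an alternative that avoids `groupby(...).apply` and does not depend on the frame being sorted by CONTRACT, the same two lookups can be done with MultiIndex keys of the form (CONTRACT, value). This is a sketch, not part of the original answer; it assumes each (CONTRACT, ID) pair is unique, and like the answer it adds steps 1 and 2 without de-duplicating.

```python
import pandas as pd

# df_have from the question
df = pd.DataFrame({
    'CONTRACT': [1, 1, 1, 1, 1],
    'ID':       [1, 2, 3, 4, 5],
    'AMT':      [0.00, 33.85, 0.72, 0.87, 1.67],
    'REL_NUM':  [0, 1, 2, 1, 1],
    'HDR_NUM':  [1, 2, 2, 1, 2],
})

# (CONTRACT, ID) -> AMT; assumes (CONTRACT, ID) is unique
amt_by_id = df.set_index(['CONTRACT', 'ID'])['AMT']
# (CONTRACT, REL_NUM) -> summed AMT
amt_by_rel = df.groupby(['CONTRACT', 'REL_NUM'])['AMT'].sum()

# Look both tables up with each row's (CONTRACT, HDR_NUM) key;
# keys with no match come back as NaN and are treated as 0
keys = pd.MultiIndex.from_arrays([df['CONTRACT'], df['HDR_NUM']])
step1 = amt_by_id.reindex(keys).fillna(0).to_numpy()
step2 = amt_by_rel.reindex(keys).fillna(0).to_numpy()

df['CALCULATION'] = (step1 + step2).round(2)
print(df)
```

Because the result is built positionally from df's own rows, the order of CONTRACT values in df does not matter.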


    CONTRACT    ID  AMT     REL_NUM HDR_NUM CALCULATION
0   1           1   0.00    0       1       36.39
1   1           2   33.85   1       2       34.57
2   1           3   0.72    2       2       34.57
3   1           4   0.87    1       1       36.39
4   1           5   1.67    1       2       34.57