df_have
CONTRACT  ID    AMT  REL_NUM  HDR_NUM
       1   1   0.00        0        1
       1   2  33.85        1        2
       1   3   0.72        2        2
       1   4   0.87        1        1
       1   5   1.67        1        2
df_want
CONTRACT  ID    AMT  REL_NUM  HDR_NUM  CALCULATION
       1   1   0.00        0        1  (0.00+33.85+0.87+1.67)
       1   2  33.85        1        2  (33.85+0.72)
       1   3   0.72        2        2  (33.85+0.72)
       1   4   0.87        1        1  (0.00+33.85+0.87+1.67)
       1   5   1.67        1        2  (33.85+0.72)
df_getting
CONTRACT  ID    AMT  REL_NUM  HDR_NUM  CALCULATION
       1   1   0.00        0        1        21.75
       1   2  33.85        1        2         2.00
       1   3   0.72        2        2        19.75
       1   4   0.87        1        1        33.85
       1   5   1.67        1        2         0.00
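I'm trying to create a new column, CALCULATION, but the logic is a bit tricky. The calculation should be a sum of the AMT field that depends on the CONTRACT, ID, REL_NUM and HDR_NUM fields.
Step 1 - Check the HDR_NUM field and get the AMT value of the row where ID = HDR_NUM and the CONTRACT field is the same
Step 2 - Add the AMT fields of all rows where REL_NUM = HDR_NUM for the same contract
For the first row, this would sum the AMT fields for ID = 2,3 and CONTRACT = 1. For row 6, this would sum the AMT fields for ID = 2,4 (for CONTRACT = 2).
One caveat is that nothing should be double counted (i.e. for row 6, when summing the AMT fields of ID = 2,4 for CONTRACT = 2, do not count ID = 2 twice).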
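The two steps and the caveat can be sketched for a single row as one boolean mask. This is only an illustration under assumptions: it rebuilds the df_have sample shown above, and `calc_for_row` is a hypothetical helper name, not something from the original post.

```python
import pandas as pd

# Sample data copied from df_have above
df_have = pd.DataFrame({
    "CONTRACT": [1, 1, 1, 1, 1],
    "ID":       [1, 2, 3, 4, 5],
    "AMT":      [0.00, 33.85, 0.72, 0.87, 1.67],
    "REL_NUM":  [0, 1, 2, 1, 1],
    "HDR_NUM":  [1, 2, 2, 1, 2],
})

def calc_for_row(df, row):
    """Hypothetical helper: sum AMT over rows of the same contract
    whose ID or REL_NUM equals this row's HDR_NUM."""
    same_contract = df["CONTRACT"] == row["CONTRACT"]
    step1 = df["ID"] == row["HDR_NUM"]        # Step 1: ID = HDR_NUM
    step2 = df["REL_NUM"] == row["HDR_NUM"]   # Step 2: REL_NUM = HDR_NUM
    # OR-ing the masks means a row matching both steps is summed only once
    return df.loc[same_contract & (step1 | step2), "AMT"].sum()

first = calc_for_row(df_have, df_have.iloc[0])   # HDR_NUM = 1
second = calc_for_row(df_have, df_have.iloc[1])  # HDR_NUM = 2
```

For the sample data, `first` corresponds to the df_want expression (0.00+33.85+0.87+1.67) and `second` to (33.85+0.72).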
Answer 0 (score: 1)
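IIUC, you can build two lookup tables per contract and map HDR_NUM through both:
def F(s):
    # rule1: ID -> AMT within this contract
    rule1 = s[['ID', 'AMT']].set_index('ID').to_dict()['AMT']
    # rule2: REL_NUM -> total AMT of the rows sharing that REL_NUM
    rule2 = s[['REL_NUM', 'AMT']].groupby('REL_NUM').sum().to_dict()['AMT']
    # look up each row's HDR_NUM in both tables; a missing key counts as 0
    s1 = s['HDR_NUM'].astype(int).map(rule1).fillna(0)
    s2 = s['HDR_NUM'].astype(int).map(rule2).fillna(0)
    return s1 + s2

# df is the question's df_have; rows are assumed to be sorted by CONTRACT
df['CALCULATION'] = df.groupby('CONTRACT').apply(F).values.ravel()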
   CONTRACT  ID    AMT  REL_NUM  HDR_NUM  CALCULATION
0         1   1   0.00        0        1        36.39
1         1   2  33.85        1        2        34.57
2         1   3   0.72        2        2        34.57
3         1   4   0.87        1        1        36.39
4         1   5   1.67        1        2        34.57
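The question's caveat about double counting is not handled by `F`: a row whose ID and REL_NUM both equal HDR_NUM is picked up by both lookups. Below is a minimal sketch of one way to cover that case, not the answer's method. `add_calculation` is a hypothetical helper name, and the two CONTRACT = 2 rows are made-up sample data, since the question does not show its contract 2 rows.

```python
import pandas as pd

def add_calculation(df):
    """Hypothetical helper: return df plus a CALCULATION column,
    summing each matching row once per (CONTRACT, HDR_NUM) pair."""
    totals = []
    for contract, g in df.groupby("CONTRACT"):
        for hdr in g["HDR_NUM"].unique():
            # a row with ID == hdr AND REL_NUM == hdr satisfies the OR only once
            mask = (g["ID"] == hdr) | (g["REL_NUM"] == hdr)
            totals.append((contract, hdr, g.loc[mask, "AMT"].sum()))
    lookup = pd.DataFrame(totals, columns=["CONTRACT", "HDR_NUM", "CALCULATION"])
    # a left merge keeps the original row order, so no sort by CONTRACT is needed
    return df.merge(lookup, on=["CONTRACT", "HDR_NUM"], how="left")

df = pd.DataFrame({
    "CONTRACT": [1, 1, 1, 1, 1, 2, 2],  # the two contract-2 rows are hypothetical
    "ID":       [1, 2, 3, 4, 5, 2, 4],
    "AMT":      [0.00, 33.85, 0.72, 0.87, 1.67, 10.00, 5.00],
    "REL_NUM":  [0, 1, 2, 1, 1, 2, 2],
    "HDR_NUM":  [1, 2, 2, 1, 2, 2, 2],
})
result = add_calculation(df)
```

For the hypothetical contract 2 rows, the mask selects ID 2 and ID 4 once each, giving 10.00 + 5.00. `F` would instead add the ID lookup (10.00) to the REL_NUM total (15.00), so ID 2 would be counted twice.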