df_have
ID AMT REL_NUM HDR_NUM
3 0.02 2.0 2.0
4 2.00 2.0 4.0
5 0.00 1.0 5.0
1 0.00 5.0 1.0
2 19.7 1.0 2.0
df_want
ID AMT REL_NUM HDR_NUM CALCULATION
3 0.02 2.0 2.0 (19.7+0.02+2.00)
4 2.00 2.0 4.0 (2.00)
5 0.00 1.0 5.0 (0.00+0.00)
1 0.00 5.0 1.0 (0.00+19.7)
2 19.7 1.0 2.0 (19.7+0.02+2.00)
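I am trying to create a new column, CALCULATION, but the logic is a bit tricky. The calculation should be a SUM over the AMT field that depends on the ID, REL_NUM and HDR_NUM fields.
Step 1 - Look at the HDR_NUM field and take the AMT value of the row where ID = HDR_NUM.
Step 2 - Add the AMT of every row where REL_NUM = HDR_NUM.
For the first row, this sums the AMT fields of ID = 2, 3 and 4.
Here is the sample code that needs updating. I first tried a groupby, but could not satisfy both steps above at once: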
df_want['CALCULATION']=df_have.groupby(['ID','HDR_NUM'])['AMT'].transform('sum')+ ?
Answer 0 (score: 1)
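You can do this with .map. For the first step, map HDR_NUM onto AMT indexed by ID. For the second, you need a groupby to get the sum of AMT for each 'REL_NUM', and then map HDR_NUM onto those sums: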
df['num1'] = df.HDR_NUM.map(df.set_index('ID').AMT)
df['num2'] = df.HDR_NUM.map(df.groupby('REL_NUM').AMT.sum())
df['calculation'] = df.num1.add(df.num2, fill_value=0)
ID AMT REL_NUM HDR_NUM num1 num2 calculation
0 3 0.02 2.0 2.0 19.7 2.02 21.72
1 4 2.00 2.0 4.0 2.0 NaN 2.00
2 5 0.00 1.0 5.0 0.0 0.00 0.00
3 1 0.00 5.0 1.0 0.0 19.70 19.70
4 2 19.70 1.0 2.0 19.7 2.02 21.72
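To run this end to end, here is a minimal self-contained sketch. It rebuilds df_have from the question by hand (typing the frame in is my assumption; the three assignments are the ones above) and cross-checks the result against a plain row-by-row loop. The helper name calc_by_loop is hypothetical and only there for the check.

```python
import pandas as pd

# df_have from the question, typed in by hand
df = pd.DataFrame({
    "ID":      [3, 4, 5, 1, 2],
    "AMT":     [0.02, 2.00, 0.00, 0.00, 19.7],
    "REL_NUM": [2.0, 2.0, 1.0, 5.0, 1.0],
    "HDR_NUM": [2.0, 4.0, 5.0, 1.0, 2.0],
})

# Step 1: AMT of the row whose ID equals this row's HDR_NUM
amt_by_id = df.set_index("ID")["AMT"]
df["num1"] = df["HDR_NUM"].map(amt_by_id)

# Step 2: total AMT of all rows whose REL_NUM equals this row's HDR_NUM
amt_by_rel = df.groupby("REL_NUM")["AMT"].sum()
df["num2"] = df["HDR_NUM"].map(amt_by_rel)

# fill_value=0 keeps rows where no REL_NUM matched (num2 is NaN)
df["calculation"] = df["num1"].add(df["num2"], fill_value=0)


def calc_by_loop(frame):
    """Hypothetical helper: the same two steps, one row at a time."""
    out = []
    for hdr in frame["HDR_NUM"]:
        step1 = frame.loc[frame["ID"] == hdr, "AMT"].sum()
        step2 = frame.loc[frame["REL_NUM"] == hdr, "AMT"].sum()
        out.append(step1 + step2)
    return out


print(df)
```

The loop is quadratic in the number of rows, so it is only useful as a sanity check; the .map version does each lookup once per row.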
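If you do not want AMT to be counted twice when HDR_NUM == ID == REL_NUM, restrict the groupby sum to rows where REL_NUM != ID, so that nothing is double-counted. The output below appears to come from a variant of the data in which IDs 2 and 3 are swapped, so that the ID 2 row has REL_NUM == ID: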
df['num1'] = df.HDR_NUM.map(df.set_index('ID').AMT)
df['num2'] = df.HDR_NUM.map(df[df.REL_NUM != df.ID].groupby('REL_NUM').AMT.sum())
df['calculation'] = df.num1.add(df.num2, fill_value=0)
ID AMT REL_NUM HDR_NUM num1 num2 calculation
0 2 0.02 2.0 2.0 0.02 2.0 2.02
1 4 2.00 2.0 4.0 2.00 NaN 2.00
2 5 0.00 1.0 5.0 0.00 0.0 0.00
3 1 0.00 5.0 1.0 0.00 19.7 19.70
4 3 19.70 1.0 2.0 0.02 2.0 2.02
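The variant can be reproduced the same way. The sketch below assumes the input implied by the output above (IDs 2 and 3 swapped relative to df_have, so the ID 2 row has REL_NUM == ID); that input is inferred from the printed table and is not stated in the answer. The name naive is mine and holds the unfiltered result for comparison.

```python
import pandas as pd

# Input inferred from the printed output above (assumption)
df = pd.DataFrame({
    "ID":      [2, 4, 5, 1, 3],
    "AMT":     [0.02, 2.00, 0.00, 0.00, 19.7],
    "REL_NUM": [2.0, 2.0, 1.0, 5.0, 1.0],
    "HDR_NUM": [2.0, 4.0, 5.0, 1.0, 2.0],
})

df["num1"] = df["HDR_NUM"].map(df.set_index("ID")["AMT"])

# Drop rows that point at themselves (REL_NUM == ID) before summing,
# so their AMT is not counted once by num1 and again by num2
not_self = df[df["REL_NUM"] != df["ID"]]
df["num2"] = df["HDR_NUM"].map(not_self.groupby("REL_NUM")["AMT"].sum())
df["calculation"] = df["num1"].add(df["num2"], fill_value=0)

# For comparison: the unfiltered groupby on the same input
naive_num2 = df["HDR_NUM"].map(df.groupby("REL_NUM")["AMT"].sum())
naive = df["num1"].add(naive_num2, fill_value=0)

print(df)
print(naive)
```

On this input the unfiltered version gives 2.04 for the rows with HDR_NUM == 2, because the 0.02 from the ID 2 row is picked up by both steps; the filtered version gives 2.02.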