我有两个df
,
df_1
txn creator code y_m count
WP BATCH 16 201908 17
WP BATCH 16 201909 32
FB ID2 06 201905 65
FB ID2 13 201906 77
BA TO 08 201904 99
BA TO 08 201905 76
df_2
txn user code y_m count
WP BATCH 16 201908 10
WP BATCH 16 201909 13
FB ID2 06 201905 23
FB ID2 13 201906 34
HF HUD 01 201904 9
HF HUD 01 201903 8
我想内部加入df_1
和df_2
,
df_1.merge(df_2, how='inner', left_on=['txn', 'creator', 'code', 'y_m'], right_on=['txn', 'user', 'code', 'y_m'])
并在count
和df_1
上汇总{sum)df_2
,与此同时,两个数据帧的特定行也保留在结果df
中;
df
txn creator code y_m count user
WP BATCH 16 201908 27 BATCH
WP BATCH 16 201909 45 BATCH
FB ID2 06 201905 88 ID2
FB ID2 13 201906 111 ID2
BA TO 08 201904 99 NaN
BA TO 08 201905 76 NaN
HF NaN 01 201904 9 HUD
HF NaN 01 201903 8 HUD
答案 0 :(得分:3)
我认为您需要外部联接,然后使用DataFrame.pop
用Series.add
提取列:
df = df_1.merge(df_2, how='outer',
left_on=['txn', 'creator', 'code', 'y_m'],
right_on=['txn', 'user', 'code', 'y_m'])
df['count'] = df.pop('count_x').add(df.pop('count_y'), fill_value=0)
print (df)
txn creator code y_m user count
0 WP BATCH 16 201908 BATCH 27.0
1 WP BATCH 16 201909 BATCH 45.0
2 FB ID2 6 201905 ID2 88.0
3 FB ID2 13 201906 ID2 111.0
4 BA TO 8 201904 NaN 99.0
5 BA TO 8 201905 NaN 76.0
6 HF NaN 1 201904 HUD 9.0
7 HF NaN 1 201903 HUD 8.0