熊猫内部连接两个数据框并汇总列值

时间:2019-06-20 10:51:57

标签: python python-3.x pandas dataframe merge

我有两个df

df_1
txn    creator    code    y_m        count
WP     BATCH      16      201908     17
WP     BATCH      16      201909     32
FB     ID2        06      201905     65
FB     ID2        13      201906     77
BA     TO         08      201904     99
BA     TO         08      201905     76

df_2
txn    user       code    y_m        count
WP     BATCH      16      201908     10
WP     BATCH      16      201909     13
FB     ID2        06      201905     23
FB     ID2        13      201906     34
HF     HUD        01      201904     9
HF     HUD        01      201903     8

我想内部加入df_1df_2

df_1.merge(df_2, how='inner', left_on=['txn', 'creator', 'code', 'y_m'], right_on=['txn', 'user', 'code', 'y_m'])

并在countdf_1上汇总{sum)df_2,与此同时,两个数据帧的特定行也保留在结果df中;

df
txn    creator    code    y_m        count    user
WP     BATCH      16      201908     27       BATCH
WP     BATCH      16      201909     45       BATCH
FB     ID2        06      201905     88       ID2
FB     ID2        13      201906     111      ID2
BA     TO         08      201904     99       NaN
BA     TO         08      201905     76       NaN
HF     NaN        01      201904     9        HUD 
HF     NaN        01      201903     8        HUD

1 个答案:

答案 0 :(得分:3)

我认为您需要外部联接,然后使用DataFrame.popSeries.add提取列:

df = df_1.merge(df_2, how='outer', 
                      left_on=['txn', 'creator', 'code', 'y_m'], 
                      right_on=['txn', 'user', 'code', 'y_m'])
df['count'] = df.pop('count_x').add(df.pop('count_y'), fill_value=0)
print (df)
  txn creator  code     y_m   user  count
0  WP   BATCH    16  201908  BATCH   27.0
1  WP   BATCH    16  201909  BATCH   45.0
2  FB     ID2     6  201905    ID2   88.0
3  FB     ID2    13  201906    ID2  111.0
4  BA      TO     8  201904    NaN   99.0
5  BA      TO     8  201905    NaN   76.0
6  HF     NaN     1  201904    HUD    9.0
7  HF     NaN     1  201903    HUD    8.0