How to merge two DataFrames when one column must be matched against two columns

Time: 2019-11-06 23:09:07

Tags: python pandas merge

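How can I merge two DataFrames so that one column is matched against either of two columns?

  • Goal: merge the two DataFrames so that the reference table gets, for each campaign id, a count of the matching records in the data.
  • Problem: .merge only compares one column with one column.

The data is messy: some rows contain the id name instead of the id.

I know how to merge one column against one column, or two columns against two columns, but not one column against two columns.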

Reference table

g_spend =

campaignid   id_name      cost

154          campaign1    15
155          campaign2    12
1566         campaign33   12
158          campaign4    33

Data

cw = 

campaignid

154
154
155
campaign1    
campaign33
1566
158
campaign1    
campaign1    
campaign33
campaign4

Desired output



g_spend =

campaignid  id_name      cost    leads

154        campaign1    15       5
155        campaign2    12       0
1566       campaign33   12       3
158        campaign4    33       2

What I have tried:

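import pandas as pd

# This only works when matching on a single column

cw.head()
# Count rows per campaignid; the rename assumes cw also has a 'reach' column
# (not shown in the sample above)
grouped_cw = cw.groupby(["campaignid"]).count()
grouped_cw.rename(columns={'reach':'leads'}, inplace=True)


# Now merge; cast the ids to str so they compare equal to the strings in cw
g_spend.campaignid = g_spend.campaignid.astype(str)

g_spend = g_spend.merge(grouped_cw, left_on='campaignid', right_index=True)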
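
To make the limitation concrete, here is a minimal sketch that rebuilds the sample tables above and runs a one-column merge on them. It assumes the ids are stored as strings, and it uses `value_counts` as a stand-in for the `groupby(...).count()` step, since the sample `cw` shows no other column to count. The variable names `counts`, `merged` and `unmatched` are illustrative only.

```python
import pandas as pd

# Sample tables from the question, with ids stored as strings (assumption).
g_spend = pd.DataFrame({
    "campaignid": ["154", "155", "1566", "158"],
    "id_name": ["campaign1", "campaign2", "campaign33", "campaign4"],
    "cost": [15, 12, 12, 33],
})
cw = pd.DataFrame({
    "campaignid": ["154", "154", "155", "campaign1", "campaign33", "1566",
                   "158", "campaign1", "campaign1", "campaign33", "campaign4"],
})

# Count rows per distinct value of cw.campaignid (ids and names alike).
counts = cw["campaignid"].value_counts().rename("leads")

# Merge on campaignid only: rows of cw that hold a name never match an id.
merged = g_spend.merge(counts, left_on="campaignid", right_index=True)
print(merged)

# The counts that were left out are exactly the ones keyed by a name.
unmatched = counts[~counts.index.isin(g_spend["campaignid"])]
print(unmatched)
```

Only the rows of `cw` that already contain an id are counted; the rows keyed by `id_name` end up in `unmatched` and are missing from `leads`.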

1 answer:

Answer 0: (score: 1)

I would first set id_name as the index of g_spend, then use replace on cw.campaignid to turn the names into ids, and finally call value_counts:

# Assumes g_spend.campaignid has already been cast to str, as in the question
s = (cw.campaignid
       .replace(g_spend.set_index('id_name').campaignid)
       .value_counts()
       .to_frame('leads')
    )

g_spend = g_spend.merge(s, left_on='campaignid', right_index=True)

Output:

  campaignid     id_name  cost  leads
0        154   campaign1    15      5
1        155   campaign2    12      1
2       1566  campaign33    12      3
3        158   campaign4    33      2
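
For reference, the same approach can be written as a self-contained sketch that rebuilds the sample data from the question. Two details are assumptions added here rather than part of the answer above: the ids are created as strings from the start, and the merge uses `how="left"` followed by `fillna(0)`, so that a campaign with no rows in `cw` would be kept with 0 leads instead of being dropped. The names `name_to_id`, `leads` and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question; ids are strings because
# cw.campaignid mixes ids and names in a single column.
g_spend = pd.DataFrame({
    "campaignid": ["154", "155", "1566", "158"],
    "id_name": ["campaign1", "campaign2", "campaign33", "campaign4"],
    "cost": [15, 12, 12, 33],
})
cw = pd.DataFrame({
    "campaignid": ["154", "154", "155", "campaign1", "campaign33", "1566",
                   "158", "campaign1", "campaign1", "campaign33", "campaign4"],
})

# Lookup Series: index is id_name, values are the matching campaignid.
name_to_id = g_spend.set_index("id_name")["campaignid"]

# Replace names with ids (plain ids are left as they are), then count per id.
leads = cw["campaignid"].replace(name_to_id).value_counts().rename("leads")

# Left merge keeps every campaign; missing counts become 0.
result = g_spend.merge(leads, how="left", left_on="campaignid", right_index=True)
result["leads"] = result["leads"].fillna(0).astype(int)
print(result)
```

On this sample every campaign has at least one row in `cw`, so the left merge and the default inner merge give the same rows; the difference only shows up when a campaign has no leads at all.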