Comparing Series in Python

Time: 2019-03-31 21:19:02

Tags: python pandas dataframe series

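I have two dataframes and I want to compare one column between them. Based on that comparison, I want to create a new dataframe. I tried pd.merge and join, but neither works because they produce multiple copies of the other columns. The column I want to compare has a one-to-many relationship with the other columns of the same dataframe. Let me show you: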

df1:

domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess1              main       offer_desc    show_more_button   click
sess2              sidebar    travellers    babies             click
sess3              main       gallery       gallery_click      click


df2:

domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess10             main       offer_desc    show_more_button   click
sess20             sidebar    travellers    babies             click
sess30             main       gallery       gallery_click      click


resultant:

domain_sessionid   category   subcategory   label              action
sess1              main       gallery       gallery_click      click
sess1              main       offer_desc    show_more_button   click

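As you can see in the resulting df, I only want to keep the entries whose session ID appears in both dataframes, with the remaining values taken from df1. Please suggest an approach.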

1 Answer:

Answer 0 (score: 1)

You want to use .isin, which keeps only the rows of df1 whose domain_sessionid also appears in df2:

df_both = df1[df1.domain_sessionid.isin(df2.domain_sessionid)]
print(df_both)

  domain_sessionid category subcategory             label action
0            sess1     main     gallery     gallery_click  click
1            sess1     main  offer_desc  show_more_button  click
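
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the tables in the question and applies the same .isin filter. The names columns, mask, keys and df_merged are introduced here for illustration only. The last two lines show a merge-based alternative, on the assumption that the duplicated columns the asker saw came from merging on the key while both frames carried the same non-key columns (pandas then adds _x/_y suffixes). Joining against only the de-duplicated key column of df2 should avoid that.

```python
import pandas as pd

# Column names taken from the tables in the question.
columns = ["domain_sessionid", "category", "subcategory", "label", "action"]

df1 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess1", "main", "offer_desc", "show_more_button", "click"],
        ["sess2", "sidebar", "travellers", "babies", "click"],
        ["sess3", "main", "gallery", "gallery_click", "click"],
    ],
    columns=columns,
)
df2 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess10", "main", "offer_desc", "show_more_button", "click"],
        ["sess20", "sidebar", "travellers", "babies", "click"],
        ["sess30", "main", "gallery", "gallery_click", "click"],
    ],
    columns=columns,
)

# Boolean mask: True for each df1 row whose session ID occurs anywhere in df2.
mask = df1["domain_sessionid"].isin(df2["domain_sessionid"])
df_both = df1[mask]
print(df_both)

# Merge-based alternative: join only against the unique keys of df2,
# so none of df2's other columns are carried into the result.
keys = df2[["domain_sessionid"]].drop_duplicates()
df_merged = df1.merge(keys, on="domain_sessionid", how="inner")
print(df_merged)
```

Both approaches keep all of df1's columns unchanged. The .isin filter also preserves df1's original index, whereas merge returns a freshly numbered index.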