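I have two dataframes and I want to compare their columns, then build a new dataframe from that comparison. I tried pd.merge and join, but neither works because they create multiple copies of the other columns. The column I want to compare on has a one-to-many relationship with the other columns of the same dataframe. Let me show you: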
df1:
domain_sessionid category subcategory label action
sess1 main gallery gallery_click click
sess1 main offer_desc show_more_button click
sess2 sidebar travellers babies click
sess3 main gallery gallery_click click
df2:
domain_sessionid category subcategory label action
sess1 main gallery gallery_click click
sess10 main offer_desc show_more_button click
sess20 sidebar travellers babies click
sess30 main gallery gallery_click click
Desired result:
domain_sessionid category subcategory label action
sess1 main gallery gallery_click click
sess1 main offer_desc show_more_button click
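As you can see in the resulting df, I only want to keep the rows from df1 whose session ID also appears in df2, together with the rest of their values from df1. Any suggestions?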
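To make the duplicated-columns problem concrete, here is a minimal sketch that rebuilds the sample frames (values copied from the tables in the question) and runs a plain pd.merge on domain_sessionid. Because both frames share the non-key column names, pandas keeps both sets and renames them with its default suffixes "_x" and "_y". If a session ID appeared several times in df2 as well, the inner merge would also multiply the rows, which is the one-to-many effect described above.

```python
import pandas as pd

# Sample data reproduced from the tables in the question.
columns = ["domain_sessionid", "category", "subcategory", "label", "action"]
df1 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess1", "main", "offer_desc", "show_more_button", "click"],
        ["sess2", "sidebar", "travellers", "babies", "click"],
        ["sess3", "main", "gallery", "gallery_click", "click"],
    ],
    columns=columns,
)
df2 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess10", "main", "offer_desc", "show_more_button", "click"],
        ["sess20", "sidebar", "travellers", "babies", "click"],
        ["sess30", "main", "gallery", "gallery_click", "click"],
    ],
    columns=columns,
)

# Merging the full frames on the key carries over every non-key column from
# both sides; overlapping names get the default suffixes "_x" (df1) and "_y" (df2).
merged = pd.merge(df1, df2, on="domain_sessionid")
print(merged.columns.tolist())
print(merged.shape)
```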
Answer 0 (score: 1)
You want to use .isin:
# keep only the rows of df1 whose domain_sessionid also appears in df2
df_both = df1[df1.domain_sessionid.isin(df2.domain_sessionid)]
print(df_both)
domain_sessionid category subcategory label action
0 sess1 main gallery gallery_click click
1 sess1 main offer_desc show_more_button click
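A self-contained version of the same filter, using the sample data from the question, is sketched below. isin builds a boolean mask over df1, so df1's columns are returned unchanged and nothing from df2 is copied in. For comparison, it also shows a merge-based "semi-join" (an alternative technique, not part of the answer above): merging df1 against only the de-duplicated key column of df2 avoids the "_x"/"_y" columns and cannot multiply rows, although, unlike the isin filter, merge returns a fresh 0-based index.

```python
import pandas as pd

columns = ["domain_sessionid", "category", "subcategory", "label", "action"]
df1 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess1", "main", "offer_desc", "show_more_button", "click"],
        ["sess2", "sidebar", "travellers", "babies", "click"],
        ["sess3", "main", "gallery", "gallery_click", "click"],
    ],
    columns=columns,
)
df2 = pd.DataFrame(
    [
        ["sess1", "main", "gallery", "gallery_click", "click"],
        ["sess10", "main", "offer_desc", "show_more_button", "click"],
        ["sess20", "sidebar", "travellers", "babies", "click"],
        ["sess30", "main", "gallery", "gallery_click", "click"],
    ],
    columns=columns,
)

# Boolean mask: True where df1's session ID appears anywhere in df2's session IDs.
mask = df1["domain_sessionid"].isin(df2["domain_sessionid"])
df_both = df1[mask]

# Alternative semi-join: merge against the unique keys only, so no df2 columns
# are added and repeated IDs in df2 cannot duplicate df1's rows.
keys = df2[["domain_sessionid"]].drop_duplicates()
df_both_merge = df1.merge(keys, on="domain_sessionid", how="inner")

print(df_both)
print(df_both_merge)
```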