Merge/join pandas command to flag every value in one df column that also appears in another df's column

Time: 2019-06-25 21:06:37

Tags: python pandas dataframe merge

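I have two dataframes, one named 'foo' and one named 'bar'. 'foo' has some columns of its own, and so does 'bar', but they share one column, 'google'. I'm trying to find a way to keep all the columns of 'foo' and add one extra column, 'CLRS', that contains 1 if the value of 'google' in that row of 'foo' appears somewhere in the 'google' column of 'bar'.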

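More specifically, assume my dataframes are structured as follows: 'foo' contains the columns 'foo_1', 'foo_2', ..., 'google', and 'bar' contains the columns 'bar_1', 'bar_2', ..., 'google'. I would like to join/merge 'foo' and 'bar' so that 'foo' gains an additional column 'CLRS', where 'CLRS' is 1 if the value of 'google' in that row of 'foo' appears in the 'google' column of 'bar'. I tried the following code: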

      '''
         # foo examples
         foo['foo1'] = ['dijkstra','TSP',...]
         foo['foo2'] = ['Oculus','VR', ...]
         .
         .
         .
         foo['google'] = ['search','ads', 'A/B Testing', 'UI' ...]

         # bar examples
         bar['bar1'] = ['dijkstra','TSP',...]
         bar['bar2'] = ['search','ads', ...]
         .
         .
         .
         # 'A/B Testing' appears in the column somewhere but 'ads' does 
         # not
         bar['google'] = ['search','google_search', 'TDD', 'UI', 
         ...,'A/B Testing', ...]

         # my code
         foo_merged = foo.join(bar, how='left')

         # my result 
         foo_merged['foo1'] = ['dijkstra','TSP',...]
         foo_merged['foo2'] = ['Oculus','VR', ...]
         .
         .
         .
         foo_merged['google'] = ['search','ads', ...]
         foo_merged['CLRS']   = ['search','google_search', 'TDD', 'UI', 
         ...]

         # What I want as an output for foo_merged is:
         foo_merged['foo1'] = ['dijkstra','TSP',...]
         foo_merged['foo2'] = ['Oculus','VR', ...]
         .
         .
         .
         foo_merged['google'] = ['search','ads', 'A/B Testing', 'UI' 
         ...]
         foo_merged['CLRS']   = [1,0,1,1,...]
      '''

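Unfortunately, after running the join code above, foo_merged contains all of foo's columns plus one additional column that always holds the contents of the 'google' column from 'bar'. The result I want is a df whose additional column 'CLRS' contains 1 if the value of 'google' in the 'foo' row appears anywhere in the 'google' column of 'bar', and 0 otherwise.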
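One plausible reading of that result is that `DataFrame.join` matches rows by index label rather than by the values in 'google', so bar's column is simply placed next to foo's, row by row. The sketch below uses made-up toy frames, and the `_bar` suffix is an assumption added for illustration, since `join` raises a `ValueError` when column names overlap and no suffix is given.

```python
import pandas as pd

# Toy frames invented for illustration; only the column name 'google'
# comes from the question.
foo = pd.DataFrame({"google": ["search", "ads", "A/B Testing", "UI"]})
bar = pd.DataFrame({"google": ["search", "google_search", "TDD", "UI"]})

# join aligns on the index (0, 1, 2, 3 here), not on the 'google' values.
# rsuffix renames bar's overlapping column to 'google_bar'.
joined = foo.join(bar, how="left", rsuffix="_bar")

# Row i of foo is paired with row i of bar, whether or not the values match.
print(joined)
```

The extra column is a positional copy of bar's 'google' column, not a membership flag.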

1 Answer:

Answer 0: (score: 0)

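I believe you are looking for merge with indicator=True.
The indicator adds a '_merge' column that marks, for each row, whether its key was found in both dataframes or only in one of them.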

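import numpy as np
import pandas as pd

# With how='left', '_merge' is 'both' when the 'google' value exists in bar, else 'left_only'
df = pd.merge(foo, bar, how='left', on='google', indicator=True)
df['CLRS'] = (df['_merge'] == 'both').astype(int)
# or: df['CLRS'] = np.where(df['_merge'] == 'both', 1, 0)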
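
A minimal, self-contained sketch of the same idea on toy data follows. The sample values and the helper name `bar_keys` are assumptions for illustration. It differs from the snippet above in one way: it merges only the distinct 'google' values from bar, since a left merge against the full frame would repeat a foo row for each duplicate key in bar and would also bring bar's other columns along. The `isin` line is an alternative that skips the merge entirely.

```python
import pandas as pd

# Toy data invented for illustration; 'search' appears twice in bar on purpose.
foo = pd.DataFrame({
    "foo_1": ["dijkstra", "TSP", "knapsack", "BFS"],
    "google": ["search", "ads", "A/B Testing", "UI"],
})
bar = pd.DataFrame({
    "bar_1": ["a", "b", "c", "d", "e", "f"],
    "google": ["search", "google_search", "TDD", "UI", "A/B Testing", "search"],
})

# Keep only the distinct key values so the merge cannot duplicate foo rows.
bar_keys = bar[["google"]].drop_duplicates()

merged = foo.merge(bar_keys, how="left", on="google", indicator=True)
merged["CLRS"] = (merged["_merge"] == "both").astype(int)
merged = merged.drop(columns="_merge")

# Alternative without a merge: a direct membership test on the column.
clrs_isin = foo["google"].isin(bar["google"]).astype(int)

print(merged)
```

Both routes give the same flags here. If 'CLRS' is the only thing needed from bar, the `isin` form is arguably the simpler of the two.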