Flag matches between DataFrames in a new column

Time: 2016-07-07 15:24:58

Tags: python pandas jupyter

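I want to compare two pandas DataFrames of different lengths and identify the index numbers they have in common. Where the values match, I want to flag those rows in a new column.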


3 answers:

Answer 0 (score: 4)

If these really are the indices, you can use intersection on the indices:

In [61]:
df1.loc[df1.index.intersection(df2.index), 'flag'] = True
df1

Out[61]:
         Column 1  flag
Index                  
41660       Apple   NaN
41935      Banana   NaN
42100  Strawberry   NaN
42599   Pineapple  True
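
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The sample frames are an assumption, rebuilt from the output shown in this answer (the question's original data is not included here), and the `flag_bool` column is an extra illustration of getting True/False instead of True/NaN.

```python
import pandas as pd

# Sample data assumed from the output shown above, with 'Index' as the real index
df1 = pd.DataFrame(
    {"Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"]},
    index=pd.Index([41660, 41935, 42100, 42599], name="Index"),
)
df2 = pd.DataFrame(
    {"Column 1": ["Pineapple"]},
    index=pd.Index([42599], name="Index"),
)

# Labels present in both indexes; the frames may have different lengths
common = df1.index.intersection(df2.index)

# Only the matching rows get True; every other row is left as NaN
df1.loc[common, "flag"] = True

# Alternative (not from the answer): a plain boolean column without NaN
df1["flag_bool"] = df1.index.isin(df2.index)

print(df1)
```

The `.loc` assignment creates the `flag` column on the fly, which is why non-matching rows show NaN rather than False.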

Otherwise, if 'Index' is an ordinary column, use isin:

In [63]:
df1.loc[df1['Index'].isin(df2['Index']), 'flag'] = True
df1

Out[63]:
   Index    Column 1  flag
0  41660       Apple   NaN
1  41935      Banana   NaN
2  42100  Strawberry   NaN
3  42599   Pineapple  True

Answer 1 (score: 2)

+1 to @EdChum's answer. If you can live with a value other than True in the matching column, try:

>>> df1.merge(df2,how='outer',indicator='Flag')
   Index      Column       Flag
0  41660       Apple  left_only
1  41935      Banana  left_only
2  42100  Strawberry  left_only
3  42599   Pineapple       both
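
A runnable sketch of the same idea, under some assumptions of mine: the sample data is rebuilt from the other answers, and I use how='left' with on='Index' (rather than the outer merge on all shared columns shown above) so that the result keeps exactly the rows of df1 in their original order. The drop_duplicates call guards against repeated keys in df2 multiplying rows.

```python
import pandas as pd

# Sample data assumed from the other answers in this thread
df1 = pd.DataFrame({
    "Index": [41660, 41935, 42100, 42599],
    "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"],
})
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

# Merge only on the key column; indicator adds a column telling where each row came from
merged = df1.merge(
    df2[["Index"]].drop_duplicates(),
    on="Index",
    how="left",
    indicator="Flag",
)

# Optional: turn the indicator into a plain boolean
merged["Matching"] = merged["Flag"] == "both"

print(merged)
```

With a left merge the indicator can only be 'left_only' or 'both', so comparing against 'both' is enough to get a True/False flag.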

Answer 2 (score: 2)

Using the isin() method:

import pandas as pd

df1 = pd.DataFrame(
    data=[
        [41660, 'Apple'],
        [41935, 'Banana'],
        [42100, 'Strawberry'],
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

df2 = pd.DataFrame(
    data=[
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

df1['Matching'] = df1['Index'].isin(df2['Index'])
print(df1)

Output:

   Index    Column 1 Matching
0  41660       Apple    False
1  41935      Banana    False
2  42100  Strawberry    False
3  42599   Pineapple     True
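
If you need this in more than one place, the isin() approach wraps easily into a small helper. This is only a sketch: `flag_matches` and its parameter names are hypothetical, not part of pandas, and it returns a copy so the input frame is left untouched.

```python
import pandas as pd


def flag_matches(left, right, key, flag_col="Matching"):
    """Hypothetical helper: copy `left` and add a boolean column that is
    True where the value in `key` also appears in `right[key]`."""
    out = left.copy()
    out[flag_col] = out[key].isin(right[key])
    return out


df1 = pd.DataFrame({
    "Index": [41660, 41935, 42100, 42599],
    "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"],
})
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

flagged = flag_matches(df1, df2, key="Index")
print(flagged)
```

Because isin() tests membership by value rather than by position, the two frames do not need to have the same length.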