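I want to compare two pandas DataFrames of different lengths and identify the matching index numbers. Where the values match, I want to flag them in a new column.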
Answer 0 (score: 4)
If these really are the indexes, you can use intersection on the indexes:
In [61]:
df1.loc[df1.index.intersection(df2.index), 'flag'] = True
df1
Out[61]:
         Column 1  flag
Index
41660       Apple   NaN
41935      Banana   NaN
42100  Strawberry   NaN
42599   Pineapple  True
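Note that the assignment above leaves NaN in the rows that do not match. If a plain True/False column is preferred, a minimal sketch follows. The sample frames are reconstructed from the output shown in this thread, so their exact construction is an assumption; `Index.isin` is used in place of `.loc` assignment:

```python
import pandas as pd

# Sample data modelled on the frames in this thread (an assumption);
# here 'Index' really is the DataFrame index, not a column.
df1 = pd.DataFrame(
    {"Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"]},
    index=pd.Index([41660, 41935, 42100, 42599], name="Index"),
)
df2 = pd.DataFrame(
    {"Column 1": ["Pineapple"]},
    index=pd.Index([42599], name="Index"),
)

# Labels that occur in both indexes.
common = df1.index.intersection(df2.index)

# Index.isin returns a boolean array, so every row gets True or False
# rather than True or NaN.
df1["flag"] = df1.index.isin(df2.index)

print(list(common))
print(df1)
```

The two frames do not need to have the same length for either call, since both compare labels rather than positions.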
Otherwise, use isin:
In [63]:
df1.loc[df1['Index'].isin(df2['Index']), 'flag'] = True
df1
Out[63]:
   Index    Column 1  flag
0  41660       Apple   NaN
1  41935      Banana   NaN
2  42100  Strawberry   NaN
3  42599   Pineapple  True
Answer 1 (score: 2)
+1 to @EdChum's answer. If you can live with a value other than True in the matching column, try:
>>> df1.merge(df2,how='outer',indicator='Flag')
   Index      Column       Flag
0  41660       Apple  left_only
1  41935      Banana  left_only
2  42100  Strawberry  left_only
3  42599   Pineapple       both
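Two details of the merge above are worth keeping in mind: without `on=`, the merge matches on every column the two frames share, and `how='outer'` also keeps rows that exist only in df2 (marked `right_only`). A hedged variant is sketched below, assuming only the 'Index' column should decide the match and only df1's rows should be kept. The sample frames and the `Matching` column name are assumptions for illustration:

```python
import pandas as pd

# Sample data modelled on the frames in this thread (an assumption).
df1 = pd.DataFrame({
    "Index": [41660, 41935, 42100, 42599],
    "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"],
})
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

# Use only df2's key column, de-duplicated, so the left merge cannot
# add rows or extra columns to df1.
keys = df2[["Index"]].drop_duplicates()
merged = df1.merge(keys, on="Index", how="left", indicator="Flag")

# Turn the indicator into a boolean column if True/False is wanted.
merged["Matching"] = merged["Flag"] == "both"

print(merged)
```

With `how="left"` the result has exactly one row per row of df1, so the new column lines up with the original frame.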
Answer 2 (score: 2)
Using the isin() method:
import pandas as pd
df1 = pd.DataFrame(
    data=[
        [41660, 'Apple'],
        [41935, 'Banana'],
        [42100, 'Strawberry'],
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)
df2 = pd.DataFrame(
    data=[
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)
df1['Matching'] = df1['Index'].isin(df2['Index'])
print(df1)
Output:
   Index    Column 1  Matching
0  41660       Apple     False
1  41935      Banana     False
2  42100  Strawberry     False
3  42599   Pineapple      True
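Since the question also asks to identify the matching index numbers themselves, the same boolean mask can be reused to pull them out. This is a small sketch under the same sample data; the variable names `mask`, `matching_values` and `matching_rows` are illustrative, not from the answers above:

```python
import pandas as pd

# Same sample data as in the answer above.
df1 = pd.DataFrame({
    "Index": [41660, 41935, 42100, 42599],
    "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"],
})
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

# Boolean mask: True where df1's 'Index' value also occurs in df2.
mask = df1["Index"].isin(df2["Index"])

# Values of the 'Index' column that matched.
matching_values = df1.loc[mask, "Index"].tolist()

# Row labels of df1 where the match occurred.
matching_rows = df1.index[mask.to_numpy()].tolist()

print(matching_values)  # [42599]
print(matching_rows)    # [3]
```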