I have two data frames, as shown below.
df1:
ID Name
1 Sachin
2 Kholi
3 Dravid
df2:
ID Run
1 20
2 60
2 10
1 5
From these, I want to filter df1 so that it keeps only the rows whose ID does not appear in df2.
Expected output:
ID Name
3 Dravid
I tried the following code:
def diff(first, second):
    second = set(second)
    units_in_unit_table = [item for item in first if item not in second]
    return units_in_unit_table

id_df2 = diff(df2, df1)
df3 = df1[df1['ID'].isin(id_df2)]
Answer 0 (score: 1)
The solution can be simplified: pass the unique values from Series.unique to isin, then invert the boolean mask with ~:
df3 = df1[~df1['ID'].isin(df2['ID'].unique())]
Or with a set:
df3 = df1[~df1['ID'].isin(set(df2['ID']))]
print(df3)
   ID    Name
2   3  Dravid
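For reference, here is a self-contained sketch of the approach, assuming pandas is installed and the two frames are built from the sample data in the question. It also hints at why the original attempt returns nothing: iterating over a DataFrame yields its column labels rather than its rows, so diff(df2, df1) ends up comparing column names instead of ID values. The name mask is only an illustrative helper variable.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Sachin", "Kholi", "Dravid"]})
df2 = pd.DataFrame({"ID": [1, 2, 2, 1], "Run": [20, 60, 10, 5]})

# Iterating a DataFrame walks its column labels, not its rows,
# which is why the list comprehension in diff() never sees any IDs.
print(list(df2))  # ['ID', 'Run']

# True for every row of df1 whose ID occurs somewhere in df2.
mask = df1["ID"].isin(df2["ID"].unique())

# Invert the mask to keep only the IDs that are absent from df2.
df3 = df1[~mask]
print(df3)
```

The filtered frame keeps the original index label of the surviving row. If a fresh 0-based index is wanted, df3.reset_index(drop=True) can be chained on.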