Question

In Python, I am working with pandas DataFrames:

dataframe_1:

Add

dataframe_2:

     id
0  AB17
1  AB18
2  AB19
3  AB20
4  AB10

Here, dataframe_2 contains AB20, AB10 and AB17, which also appear in dataframe_1.

How can I check which elements of dataframe_2 are new and which are the same as in dataframe_1?

Answer 1

I think you need isin to build a boolean mask, then filter with loc using boolean indexing. If necessary, convert the resulting Series to a list:

mask = dataframe_2['id'].isin(dataframe_1['id'])
print(mask)
0     True
1     True
2     True
3    False
4    False
Name: id, dtype: bool

same = dataframe_2.loc[mask, 'id'].tolist()
diff = dataframe_2.loc[~mask, 'id'].tolist()

# If you want unique values only:
# same = dataframe_2.loc[mask, 'id'].unique().tolist()
# diff = dataframe_2.loc[~mask, 'id'].unique().tolist()

print(same)
['AB20', 'AB10', 'AB17']

print(diff)
['AB21', 'AB09']
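
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same isin approach. The contents of dataframe_1 below are an assumption made up for illustration (the question does not show them in full), and dataframe_2 uses the ids as listed in the question, so the printed lists differ from the output above.

```python
import pandas as pd

# Hypothetical data: dataframe_1's contents are assumed for illustration only.
dataframe_1 = pd.DataFrame({"id": ["AB20", "AB10", "AB17", "AB05"]})
dataframe_2 = pd.DataFrame({"id": ["AB17", "AB18", "AB19", "AB20", "AB10"]})

# True where an id from dataframe_2 also occurs in dataframe_1.
mask = dataframe_2["id"].isin(dataframe_1["id"])

# Select with the mask for shared ids, and with the inverted mask for new ids.
same = dataframe_2.loc[mask, "id"].tolist()
new = dataframe_2.loc[~mask, "id"].tolist()

print(same)  # ['AB17', 'AB20', 'AB10']
print(new)   # ['AB18', 'AB19']
```

The order of both lists follows the row order of dataframe_2, since the mask is aligned with its index.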

Answer 2

Use isin like this (df1 and df2 stand for dataframe_1 and dataframe_2):

df2.id.isin(df1.id)

0     True
1     True
2     True
3    False
4    False
Name: id, dtype: bool
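
If a label per row is more convenient than a bare boolean Series, the mask can be turned into a column. This is only a sketch: the `status` column name, the "same"/"new" labels, and the contents of df1 are assumptions for illustration, and numpy.where is one possible way to map the mask to labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data, assumed for illustration only.
df1 = pd.DataFrame({"id": ["AB20", "AB10", "AB17"]})
df2 = pd.DataFrame({"id": ["AB17", "AB18", "AB19", "AB20", "AB10"]})

# Label each row of df2: "same" if its id occurs in df1, otherwise "new".
df2["status"] = np.where(df2["id"].isin(df1["id"]), "same", "new")

print(df2)
```

Bracket access (`df2["id"]`) is used here instead of attribute access (`df2.id`); the two are equivalent for this column, but attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute.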

Pandas DataFrame data: is it the same or new?

2 answers: