Question

I have a dataset that contains paired duplicates. Here is my data:

Id    antecedent           descendant
1     one                  two
2     two                  one
3     two                  three
4     one                  three
5     three                two

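This is what I need. Because one, two is equivalent to two, one, I want to remove the duplicate pairs and keep only the first occurrence of each: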

Id    antecedent           descendant
1     one                  two
3     two                  three
4     one                  three

Answer 1

Use numpy.sort to sort the values within each row, then use duplicated to build a boolean mask:

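# sort each row so (one, two) and (two, one) compare equal; keep df's index so the mask aligns
df1 = pd.DataFrame(np.sort(df[['antecedent','descendant']], axis=1), index=df.index)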

Or:

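# slower solution: represent each pair as an order-independent frozenset
#df1 = df[['antecedent','descendant']].apply(frozenset, axis=1)

# keep only the rows whose pair has not been seen before
df = df[~df1.duplicated()]
print(df)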
   Id antecedent descendant
0   1        one        two
2   3        two      three
3   4        one      three
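
For reference, the approach above can be put together as a self-contained sketch. It assumes the sample data from the question; the helper name drop_unordered_pair_duplicates is made up for illustration and is not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "antecedent": ["one", "two", "two", "one", "three"],
    "descendant": ["two", "one", "three", "three", "two"],
})


def drop_unordered_pair_duplicates(frame, cols):
    """Hypothetical helper: keep the first row for each unordered pair in `cols`."""
    # Sort the values within each row so (a, b) and (b, a) become identical.
    sorted_pairs = pd.DataFrame(
        np.sort(frame[cols].to_numpy(), axis=1),
        index=frame.index,  # reuse the original index so the mask aligns
    )
    # duplicated() flags every repeat after the first occurrence.
    return frame[~sorted_pairs.duplicated()]


result = drop_unordered_pair_duplicates(df, ["antecedent", "descendant"])
print(result)
```

Passing index=frame.index matters only when the DataFrame does not use the default 0..n-1 index, for example after an earlier filter. Without it, the boolean mask would carry a fresh index and might not line up with the rows.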

How do I remove paired duplicates in pandas?
