
Question

I want to drop duplicates across the `head_x` and `head_y` columns, and also across the `cost_x` and `cost_y` columns.

Here is my code:

df=df.astype(str)

df.drop_duplicates(subset={'head_x','head_y'}, keep=False, inplace=True)

df.drop_duplicates(subset={'cost_x','cost_y'}, keep=False, inplace=True)

print(df)

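This is the output dataframe. As you can see, the first row is a duplicate on both subsets, so why is that row still there?

I don't want to remove only the first row; I want to remove all duplicates. This is another output, where index/node 6 is also a duplicate.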

Answer 1

df = df.astype(str)

# Assign the returned frame; with inplace=True the method returns None
df = df.drop_duplicates(subset=['head_x', 'head_y'], keep=False)

df = df.drop_duplicates(subset=['cost_x', 'cost_y'], keep=False)

I think `cost_x` should be replaced with `head_y`; otherwise there are no duplicates.
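A side note on `inplace=True`: in that mode `drop_duplicates` modifies the frame itself and returns `None`, so the result should not be assigned back to `df`. A minimal sketch with made-up data (the values below are assumptions, not the asker's frame):

```python
import pandas as pd

# Made-up frame: rows 0 and 1 share the same (head_x, head_y) pair.
df = pd.DataFrame({
    "head_x": ["2", "2", "5"],
    "head_y": ["3", "3", "7"],
})

# With inplace=True the method returns None and changes df itself.
returned = df.drop_duplicates(subset=["head_x", "head_y"], keep=False, inplace=True)
print(returned)            # None
print(df.index.tolist())   # [2]

# Without inplace, a new frame is returned and can be assigned.
df2 = pd.DataFrame({"head_x": ["2", "2", "5"], "head_y": ["3", "3", "7"]})
df2 = df2.drop_duplicates(subset=["head_x", "head_y"], keep=False)
print(df2.index.tolist())  # [2]
```

With `keep=False`, every member of a duplicated group is dropped, which is why only the row at index 2 survives in both variants.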

Answer 2

Look at the first two rows:


Start with `head_x` and `head_y`:

first row: 2 and 2,
second row: 2 and 3,

so the two pairs differ.

Then look at `cost_x` and `cost_y`:

first row: 6 and 3,
second row: 6 and 4,

so these two pairs differ as well.

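Conclusion: because each subset is compared on both of its columns together, these 2 rows are not duplicates in either subset.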
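To make this concrete, here is a minimal sketch using a made-up two-row frame with the values quoted above (the frame itself is an assumption, since the original data was only shown as an image). With a multi-column subset, `duplicated` compares the combination of values in each row, so neither row is flagged. If the intent is instead to drop rows whose value repeats within a single column, one possible approach is to check each column on its own:

```python
import pandas as pd

# Hypothetical data mirroring the two rows discussed above.
df = pd.DataFrame({
    "head_x": ["2", "2"],
    "head_y": ["2", "3"],
    "cost_x": ["6", "6"],
    "cost_y": ["3", "4"],
})

# A multi-column subset compares the (head_x, head_y) pair as a whole.
pair_dupes = df.duplicated(subset=["head_x", "head_y"], keep=False)
print(pair_dupes.tolist())    # [False, False]

# Checking one column at a time flags both rows, because head_x repeats.
single_dupes = df.duplicated(subset=["head_x"], keep=False)
print(single_dupes.tolist())  # [True, True]

# Drop rows that repeat in head_x or in cost_x, each considered separately.
mask = (
    df.duplicated(subset=["head_x"], keep=False)
    | df.duplicated(subset=["cost_x"], keep=False)
)
result = df[~mask]
print(len(result))            # 0
```

Which columns to check individually depends on what counts as a duplicate in the real data; the choice of `head_x` and `cost_x` here is only for illustration.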

Why doesn't Python drop all duplicates?

2 answers:
