Suppose df looks like this:
import pandas as pd

df = pd.DataFrame({"col1": ["banana", "apple", "grapes", "banana"],
                   "col2": ["apple", "banana", "apple", "grapes"]})
col1    col2
banana  apple
apple   banana
grapes  apple
banana  grapes
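How can we remove the reversed duplicates, i.e. treat the banana - apple and apple - banana combinations as the same pair and keep only one of them?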
I tried:
df["col_to_check"] = df.apply(lambda x: set(x), axis=1).astype(str)
idx_to_remove = df[df.duplicated(["col_to_check"])].index
Are there any other suggestions?
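One possible alternative, offered only as a sketch: sort the values within each row so that both orderings of a pair collapse to the same key, then drop the rows whose key has already been seen. This assumes that only col1 and col2 define the pair and that the first occurrence is the one to keep. The names sorted_pairs and result are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["banana", "apple", "grapes", "banana"],
                   "col2": ["apple", "banana", "apple", "grapes"]})

# Sort the two values inside each row, so ("banana", "apple") and
# ("apple", "banana") both become ("apple", "banana").
sorted_pairs = pd.DataFrame(
    np.sort(df[["col1", "col2"]].to_numpy(), axis=1),
    index=df.index,
)

# Keep only the first occurrence of each unordered pair;
# the original column order of the surviving rows is untouched.
result = df[~sorted_pairs.duplicated()]
print(result)
```

Compared with converting each row to a set and then to a string, this avoids relying on how a set happens to be printed, and it does not add a helper column to df.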