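This question is similar to "Pandas: remove duplicates that exist in any order", but I have multiple pairs of duplicates.
For example:
I have data where column A corresponds to column C, and column B corresponds to column D.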
import pandas as pd
# Initial data frame
data = pd.DataFrame({'A': [0, 0, 50, 21, 50, 35, 5, 50],
                     'B': [50, 22, 35, 0, 10, 50, 21, 0],
                     'C': ["a", "a", "y", "x", "y", "w", "z", "y"],
                     'D': ["y", "c", "w", "a", "b", "y", "x", "a"]})
data
#     A   B  C  D
# 0   0  50  a  y
# 1   0  22  a  c
# 2  50  35  y  w
# 3  21   0  x  a
# 4  50  10  y  b
# 5  35  50  w  y
# 6   5  21  z  x
# 7  50   0  y  a
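I want to remove rows whose (A, B) pair duplicates an earlier row in either order, but I need to keep the corresponding letter values in columns C and D. For example, row 5 (35, 50) is the reverse of row 2 (50, 35), and row 7 (50, 0) is the reverse of row 0 (0, 50), so both should be dropped. My real dataset is much larger. The output should look like this: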
#     A   B  C  D
# 0   0  50  a  y
# 1   0  22  a  c
# 2  50  35  y  w
# 3  21   0  x  a
# 4  50  10  y  b
# 5   5  21  z  x
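To make the goal concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes that only columns A and B decide whether a row is a duplicate, that the first occurrence of each unordered pair should be kept, and that the index should be renumbered afterwards. The variable names `key` and `result` are just illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [0, 0, 50, 21, 50, 35, 5, 50],
                     'B': [50, 22, 35, 0, 10, 50, 21, 0],
                     'C': ["a", "a", "y", "x", "y", "w", "z", "y"],
                     'D': ["y", "c", "w", "a", "b", "y", "x", "a"]})

# Sort each (A, B) pair within its row so that (50, 0) and (0, 50)
# produce the same order-independent key.
key = pd.DataFrame(np.sort(data[['A', 'B']].to_numpy(), axis=1),
                   index=data.index)

# Keep the first occurrence of each unordered pair (assumption);
# C and D are carried along unchanged because only rows are filtered.
result = data.loc[~key.duplicated()].reset_index(drop=True)
print(result)
```

Because the key is built separately and only used as a row mask, the original A, B, C and D values are never reordered, which is what keeps each letter attached to its number.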