How to find unique combinations of 2 columns, drop the non-unique combinations, and keep only the first row in pandas

Time: 2019-06-14 11:28:17

Tags: python pandas

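I have a dataset with 2 columns that contain combinations of values. A pair and its reversed form, e.g. ('Customer group$Large', 'DEPARTMENT$Sales') and ('DEPARTMENT$Sales', 'Customer group$Large'), count as the same combination. I want to find the combinations that are not unique, drop the repeats, and keep only the first row of each.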

Here is the dataset:

import pandas as pd

dim_set = [('Customer group$Large', 'DEPARTMENT$Sales'),
           ('Customer group$Medium', 'DEPARTMENT$Sales'),
           ('Customer group$Small', 'DEPARTMENT$Sales'),
           ('DEPARTMENT$Sales', 'Customer group$Large'),
           ('DEPARTMENT$Sales', 'Customer group$Medium'),
           ('DEPARTMENT$Sales', 'Customer group$Small')
           ]
df = pd.DataFrame(dim_set, columns=['dim', 'linked_dim'])
print(df)
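To make the question's notion of a "combination" concrete, here is a minimal sketch (an alternative, not taken from either answer) that builds an order-independent key for each row with `tuple(sorted(...))` and flags the repeats with `Series.duplicated`. The names `pair_key`, `is_repeat` and `result` are hypothetical.

```python
import pandas as pd

# Same data as in the question.
dim_set = [('Customer group$Large', 'DEPARTMENT$Sales'),
           ('Customer group$Medium', 'DEPARTMENT$Sales'),
           ('Customer group$Small', 'DEPARTMENT$Sales'),
           ('DEPARTMENT$Sales', 'Customer group$Large'),
           ('DEPARTMENT$Sales', 'Customer group$Medium'),
           ('DEPARTMENT$Sales', 'Customer group$Small')]
df = pd.DataFrame(dim_set, columns=['dim', 'linked_dim'])

# Order-independent key per row: ('A', 'B') and ('B', 'A') both map to ('A', 'B').
pair_key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df['dim'], df['linked_dim'])],
    index=df.index,
)

# duplicated(keep='first') is True for every occurrence of a key after the first.
is_repeat = pair_key.duplicated(keep='first')
print(df.assign(is_repeat=is_repeat))

# Keep only the first row of each combination.
result = df[~is_repeat]
print(result)
```

Because the mask is applied to the original `df`, the kept rows keep their original values and index.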


The expected output keeps only the first row of each combination, i.e. rows 0, 1 and 2.

2 answers:

Answer 0 (score: 4)

I believe you need to sort the values within each row and then drop the duplicates:

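import numpy as np
import pandas as pd

# Sort the values within each row so that (A, B) and (B, A) become identical,
# then drop the duplicated rows, keeping the first occurrence.
df = (pd.DataFrame(np.sort(df[['dim', 'linked_dim']].to_numpy(), axis=1),
                   columns=['dim', 'linked_dim'])
        .drop_duplicates())
print(df)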
                     dim        linked_dim
0   Customer group$Large  DEPARTMENT$Sales
1  Customer group$Medium  DEPARTMENT$Sales
2   Customer group$Small  DEPARTMENT$Sales
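One side effect of the approach above is that the kept rows are the *sorted* pairs, not the original rows. With the question's data this makes no difference, because each first occurrence is already in sorted order. The sketch below uses hypothetical data where the first occurrence is the reversed pair. It uses the sorted values only as a comparison key and selects rows from the original frame with a boolean mask. The names `keys`, `result` and `sorted_first` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first occurrence of the pair is the "reversed" one.
df = pd.DataFrame([('DEPARTMENT$Sales', 'Customer group$Large'),
                   ('Customer group$Large', 'DEPARTMENT$Sales'),
                   ('Customer group$Small', 'DEPARTMENT$Sales')],
                  columns=['dim', 'linked_dim'])
cols = ['dim', 'linked_dim']

# Sort the values within each row only to build a comparison key...
keys = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index)

# ...then select from the ORIGINAL frame, so kept rows retain their column order.
result = df[~keys.duplicated(keep='first')]
print(result)

# For comparison, sorting first and then dropping duplicates rewrites row 0
# as ('Customer group$Large', 'DEPARTMENT$Sales').
sorted_first = (pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), columns=cols)
                  .drop_duplicates())
print(sorted_first)
```

Which variant fits depends on whether the original orientation of each pair matters downstream.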

Answer 1 (score: 0)

I think this will work for you:

-- Splits a comma-separated string into one row per value (MySQL 8.0+ JSON_TABLE).
SET @delimited = 'a,b,c';

SELECT *
     FROM
       JSON_TABLE(
         CONCAT('["', REPLACE(@delimited, ',', '", "'), '"]'),
         "$[*]"
         COLUMNS(
           Value varchar(50) PATH "$"
         )
       ) data;