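I have a dataset with 2 columns that contains combinations of values. I want to find combinations that are not unique (the same pair appearing again with the columns swapped) and remove them, keeping only the first row of each.
Here is the dataset: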
import pandas as pd

dim_set = [('Customer group$Large', 'DEPARTMENT$Sales'),
           ('Customer group$Medium', 'DEPARTMENT$Sales'),
           ('Customer group$Small', 'DEPARTMENT$Sales'),
           ('DEPARTMENT$Sales', 'Customer group$Large'),
           ('DEPARTMENT$Sales', 'Customer group$Medium'),
           ('DEPARTMENT$Sales', 'Customer group$Small')]
df = pd.DataFrame(dim_set, columns=['dim', 'linked_dim'])
df
The expected output should be:
Answer 0 (score: 4)
I believe you need to sort the values within each row and then drop the duplicates:
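import numpy as np

# Sort the two values in every row so (a, b) and (b, a) become identical,
# then drop the repeated rows (the first occurrence is kept by default).
df = (pd.DataFrame(np.sort(df[['dim', 'linked_dim']], axis=1),
                   columns=['dim', 'linked_dim'])
        .drop_duplicates())
print(df)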
                     dim        linked_dim
0   Customer group$Large  DEPARTMENT$Sales
1  Customer group$Medium  DEPARTMENT$Sales
2   Customer group$Small  DEPARTMENT$Sales
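Note that the approach above rewrites every row in sorted order, so a row that first appears as (b, a) would come out as (a, b). If the original left-to-right order of each kept row matters, one possible variation is to use the sorted pairs only as a key for detecting duplicates and to filter the untouched frame with it. This is a minimal sketch rather than part of the original answer; the names `key` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

dim_set = [('Customer group$Large', 'DEPARTMENT$Sales'),
           ('Customer group$Medium', 'DEPARTMENT$Sales'),
           ('Customer group$Small', 'DEPARTMENT$Sales'),
           ('DEPARTMENT$Sales', 'Customer group$Large'),
           ('DEPARTMENT$Sales', 'Customer group$Medium'),
           ('DEPARTMENT$Sales', 'Customer group$Small')]
df = pd.DataFrame(dim_set, columns=['dim', 'linked_dim'])

# Build an order-insensitive key: sorting within each row makes
# (a, b) and (b, a) compare equal. (`key` is a hypothetical helper name.)
key = pd.DataFrame(np.sort(df[['dim', 'linked_dim']].to_numpy(), axis=1),
                   index=df.index)

# Keep only rows whose key has not appeared before; the values in df
# itself are left exactly as they were.
result = df[~key.duplicated()]
print(result)
```

With the sample data this keeps rows 0 to 2, the same rows as the answer above, because the first occurrence of each pair already has `Customer group$...` in the `dim` column.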
Answer 1 (score: 0)
I think this will work for you:
set @delimited = 'a,b,c';
SELECT *
FROM
JSON_TABLE(
CONCAT('["', REPLACE(@delimited, ',', '", "'), '"]'),
"$[*]"
COLUMNS(
Value varchar(50) PATH "$"
)
) data;