Question

我有一个表，其中字符串1和2几乎是重复的-它们具有相同的值，但顺序相反。如何删除这些重复项？

+--------+-------+
| COL_1  | COL_@ |
+--------+-------+
| a1     | b1    |
| b1     | a1    | <- same as 1stline but in reversed order, needs to be removed
| a2     | b2    |   
| a3     | b3    |
| b3     | a3    |<-- also is duplicate of string above, one of these 2str need                        
+--------+-------+     to be removed

expected result:
+--------+-------+
| COL_1  | COL_@ |
+--------+-------+
| b1     | a1    | 
| a2     | b2    |   
| a3     | b3    |                   
+--------+-------+

or
+--------+-------+
| COL_1  | COL_@ |
+--------+-------+
| a1     | b1    |
| a2     | b2    |   
| a3     | b3    |                    
+--------+-------+

Answer 1

您是这样说的吗？

DELETE(
    select e.COL_@, f.COL_1
    from example as e
    join example as f on e.COL_@ = f.COL_1 and e.COL_@ < f.COL_1 )

Answer 2

如果col_1和col_ @不能为空，则可以使用LEAST和GREATEST将每个组合保留一行：

delete from tbl
where rowid not in
(
  select min(rowid) -- one rowid per combination to keep
  from tbl
  group by least(col_1, col_@), greatest(col_1, col_@)
);

Answer 3

我认为使用删除查询来实现它有点复杂。也许可以先进行这样的选择查询：

select A,B from (
                    select A,
                           B,
                           ROW_NUMBER() over (PARTITION BY ORA_HASH(A) * ORA_HASH(B) ORDER BY A) as RANK
                    FROM <your_table_name>
) where RANK = 1;

您可以使用CREATE TABLE AS SELECT ...将查询结果保存为新表然后您只需删除旧表即可。

Answer 4

这似乎并不复杂：

delete t
   where not (col1 < col2 or
              not exists (select 1
                          from t t2
                          where t2.col1 = t.col2 and
                                t2.col2 = t.col1
                         )
             );

或者：

delete t
   where col1 > col2 and
         exists (select 1
                 from t t2
                 where t2.col1 = t.col2 and
                       t2.col2 = t.col1
             );

Answer 5

一种可能的解决方案是：

DELETE FROM T
  WHERE (COL_1, "COL_@") IN (SELECT COL_1, "COL_@"
                             FROM (SELECT ROWNUM AS RN, t1.COL_1, t1."COL_@"
                                     FROM T t1
                                     INNER JOIN T t2
                                       ON t2."COL_@" = t1.COL_1 AND
                                          t2.COL_1 = t1."COL_@")
                             WHERE RN / 2 <> TRUNC(RN / 2));

请注意，COL_@在Oracle中必须用引号引起，因为@不是未加引号的标识符中的合法字符。

dbfiddle here

删除表中的几乎重复项

5 个答案: