Question

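I am interested in removing duplicate rows from my table. By duplicates I mean two or more rows that contain the same values in all columns. There are several methods online, all of which ultimately involve a join or a group by, and all the columns of interest have to be named explicitly. That is not difficult, but the column list has to be written out for every new table, which can get tedious when hundreds of columns are involved.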
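For reference, the column-listing style mentioned above often looks something like the following sketch. The columns a and b are hypothetical stand-ins for the full column list of old_table; with hundreds of columns, the group by clause is the part that becomes tedious to write.

```sql
-- Sketch only: a and b are assumed to be ALL the columns of old_table.
-- Keeps one row (the one with the lowest rowid) per group of identical rows
-- and deletes the rest in place.
delete from old_table
where rowid not in (
    select min(rowid)
    from old_table
    group by a, b   -- every column has to be listed here
);
```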

Here is a naive question: won't union remove the duplicate rows? A generic query could be:

create table new_table as
(
select * from old_table
union
select * from old_table
)

as shown in the following example:

with tmp as
(
select 1 as a, 2 as b from dual
union all
select 1 as a, 2 as b from dual
)
select * from tmp
union
select * from tmp

A B
- - 
1 2

This seems like a simple solution that needs no configuration, only the table name.

What am I missing? Or is this perfectly valid?

Thanks

Answer 1

Your approach is valid and it works. I think a better way is to use the DISTINCT keyword for this purpose.

select * from test;

A   B
------
1   2
1   2
1   2
1   2
1   2

select distinct * from test;

A   B
------
1   2
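One edge case worth keeping in mind for "same values in all columns": for duplicate elimination, DISTINCT (like UNION) treats two NULLs as equal, so rows that match everywhere and are NULL in the same columns collapse into one. A minimal sketch, using a made-up inline data set rather than the test table above:

```sql
-- Sketch only: tmp is a hypothetical two-row data set with a NULL in column b.
with tmp as
(
    select 1 as a, cast(null as number) as b from dual
    union all
    select 1 as a, cast(null as number) as b from dual
)
-- Returns a single row (a = 1, b = NULL): the two rows count as duplicates.
select distinct * from tmp;
```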

Answer 2

Your approach is correct, but it is not efficient and (IMHO) it is overly complicated.

I made a simple test with a table like the following:

create table old_table(a number, b number);

begin
    for i in 1..5 loop
        insert into old_table select level, -level from dual connect by level <= 1000000;
        commit;
    end loop;
end;
/

This is what you get with the UNION:

create table new_table as
(
    select * from old_table
    UNION
    select * from old_table
);
--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |           |  9934K|   246M|       | 86405  (51)| 00:17:17 |
|   1 |  LOAD AS SELECT        | NEW_TABLE |       |       |       |            |          |
|   2 |   SORT UNIQUE          |           |  9934K|   246M|   343M| 86405  (51)| 00:17:17 |
|   3 |    UNION-ALL           |           |       |       |       |            |          |
|   4 |     TABLE ACCESS FULL  | OLD_TABLE |  4967K|   123M|       |  3039   (1)| 00:00:37 |
|   5 |     TABLE ACCESS FULL  | OLD_TABLE |  4967K|   123M|       |  3039   (1)| 00:00:37 |
--------------------------------------------------------------------------------------------

You can try to simplify the UNION by applying a filter in the second scan of the table:

create table new_table1 as
(
    select * from old_table
    UNION
    select * from old_table where null is not null
);
--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |           |  4967K|   123M|       | 43291   (1)| 00:08:40 |
|   1 |  LOAD AS SELECT        | NEW_TABLE |       |       |       |            |          |
|   2 |   SORT UNIQUE          |           |  4967K|   123M|   171M| 43291   (1)| 00:08:40 |
|   3 |    UNION-ALL           |           |       |       |       |            |          |
|   4 |     TABLE ACCESS FULL  | OLD_TABLE |  4967K|   123M|       |  3039   (1)| 00:00:37 |
|*  5 |     FILTER             |           |       |       |       |            |          |
|   6 |      TABLE ACCESS FULL | OLD_TABLE |  4967K|   123M|       |  3039   (1)| 00:00:37 |
--------------------------------------------------------------------------------------------

This is what DISTINCT does:

create table new_table2 as
(
    select distinct * from old_table
);
---------------------------------------------------------------------------------------------
| Id  | Operation              | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |            |  4967K|   123M|       | 43203   (1)| 00:08:39 |
|   1 |  LOAD AS SELECT        | NEW_TABLE2 |       |       |       |            |          |
|   2 |   HASH UNIQUE          |            |  4967K|   123M|   171M| 39759   (1)| 00:07:58 |
|   3 |    TABLE ACCESS FULL   | OLD_TABLE  |  4967K|   123M|       |  3039   (1)| 00:00:37 |
---------------------------------------------------------------------------------------------

Whatever the approach, the result is the same, and there is no need to write out all the column names explicitly.
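If you want to confirm that the three variants really produce the same rows, a quick sanity check along these lines could be used. This is only a sketch: it assumes new_table, new_table1 and new_table2 were created as shown above, and the column aliases are made up for readability.

```sql
-- Row counts of the three deduplicated copies; all three should match.
select (select count(*) from new_table)  as union_rows,
       (select count(*) from new_table1) as filtered_union_rows,
       (select count(*) from new_table2) as distinct_rows
from dual;

-- Each query should return no rows if the UNION and DISTINCT results are identical.
select * from new_table  minus select * from new_table2;
select * from new_table2 minus select * from new_table;
```

Because each table already holds only distinct rows, an empty MINUS in both directions means the two tables contain exactly the same set of rows.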

Oracle - Removing duplicate rows using union - does this really work?