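I want to remove duplicate rows from my table. By duplicates I mean two or more rows that hold the same values in every column. There are several methods online, all of which ultimately rely on a join or a group by, where every column of interest has to be named explicitly. That is not hard, but the columns have to be written out for each new table, which gets tedious when hundreds of columns are involved.
This is a naive question: does union remove duplicate rows? A generic query could be: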
create table new_table as
(
select * from old_table
union
select * from old_table
)
as shown in the following example:
with tmp as
(
select 1 as a, 2 as b from dual
union all
select 1 as a, 2 as b from dual
)
select * from tmp
union
select * from tmp
A B
- -
1 2
This seems like a simple solution that needs no configuration, only the table name.
What am I missing? Or is this completely valid? Thanks
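Before creating the new table, it can help to see how many rows the union would actually drop. The following is a minimal sketch against the old_table from the question; the column aliases rows_before and rows_after are made up for illustration.

```sql
-- Compare the raw row count with the count after UNION deduplication.
select
  (select count(*) from old_table) as rows_before,
  (select count(*)
     from (select * from old_table
           union
           select * from old_table)) as rows_after
from dual;
```

If the two numbers match, the table has no fully duplicated rows and the rewrite is unnecessary.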
Answer 0 (score: 2)
Your approach works. I think a better way to achieve this is the DISTINCT keyword.
select * from test;
A B
------
1 2
1 2
1 2
1 2
1 2
select distinct * from test
A B
-------
1 2
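One way to turn the DISTINCT query into an actual cleanup is to build a deduplicated copy and then swap the names. This is only a sketch: test is the table shown above, while test_dedup and test_old are hypothetical names chosen for illustration.

```sql
-- Build a deduplicated copy (test_dedup is a hypothetical name).
create table test_dedup as
select distinct * from test;

-- Swap the tables; test_old is also a hypothetical name.
alter table test rename to test_old;
alter table test_dedup rename to test;
```

A table created with CREATE TABLE ... AS SELECT does not carry over indexes, grants, triggers, or most constraints, so those would need to be recreated on the new table.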
Answer 1 (score: 2)
Your approach is correct, but it is inefficient and (IMHO) overly complicated.
I ran a simple test with a table like the following:
create table old_table(a number, b number);
begin
for i in 1..5 loop
insert into old_table select level, -level from dual connect by level <= 1000000;
commit;
end loop;
end;
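To confirm the test data looks as intended, a quick check like the one below can be run after the load. It is a sketch that relies on b always being -a in this setup, so counting distinct values of a is the same as counting distinct rows; the aliases are made up for illustration.

```sql
-- The loop inserts 1,000,000 rows five times, so this should report
-- 5,000,000 total rows and 1,000,000 distinct rows.
select count(*)          as total_rows,
       count(distinct a) as distinct_rows
from old_table;
```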
With UNION, you get:
create table new_table as
(
select * from old_table
UNION
select * from old_table
);
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 9934K| 246M| | 86405 (51)| 00:17:17 |
| 1 | LOAD AS SELECT | NEW_TABLE | | | | | |
| 2 | SORT UNIQUE | | 9934K| 246M| 343M| 86405 (51)| 00:17:17 |
| 3 | UNION-ALL | | | | | | |
| 4 | TABLE ACCESS FULL | OLD_TABLE | 4967K| 123M| | 3039 (1)| 00:00:37 |
| 5 | TABLE ACCESS FULL | OLD_TABLE | 4967K| 123M| | 3039 (1)| 00:00:37 |
--------------------------------------------------------------------------------------------
You can try to simplify the UNION by applying a filter to the second scan of the table:
create table new_table1 as
(
select * from old_table
UNION
select * from old_table where null is not null
);
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 4967K| 123M| | 43291 (1)| 00:08:40 |
| 1 | LOAD AS SELECT | NEW_TABLE | | | | | |
| 2 | SORT UNIQUE | | 4967K| 123M| 171M| 43291 (1)| 00:08:40 |
| 3 | UNION-ALL | | | | | | |
| 4 | TABLE ACCESS FULL | OLD_TABLE | 4967K| 123M| | 3039 (1)| 00:00:37 |
|* 5 | FILTER | | | | | | |
| 6 | TABLE ACCESS FULL | OLD_TABLE | 4967K| 123M| | 3039 (1)| 00:00:37 |
--------------------------------------------------------------------------------------------
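A plan like the one above can be reproduced with EXPLAIN PLAN and DBMS_XPLAN. The sketch below explains the filtered UNION without executing it; new_table1_plan_only is a hypothetical name, assumed not to exist yet, so that the statement cannot clash with an existing table.

```sql
-- Explain the statement without running it.
-- new_table1_plan_only is a hypothetical, not-yet-existing table name.
explain plan for
create table new_table1_plan_only as
(
select * from old_table
union
select * from old_table where null is not null
);

-- Show the most recently explained plan.
select * from table(dbms_xplan.display);
```

Because `null is not null` can never be true, the second branch contributes no rows, and the UNION only has to deduplicate a single copy of the table.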
This is what DISTINCT does. Whichever approach you use, the result is the same, and there is no need to write out all the column names:
create table new_table2 as
(
select distinct * from old_table
);
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 4967K| 123M| | 43203 (1)| 00:08:39 |
| 1 | LOAD AS SELECT | NEW_TABLE2 | | | | | |
| 2 | HASH UNIQUE | | 4967K| 123M| 171M| 39759 (1)| 00:07:58 |
| 3 | TABLE ACCESS FULL | OLD_TABLE | 4967K| 123M| | 3039 (1)| 00:00:37 |
---------------------------------------------------------------------------------------------
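To check that the UNION and DISTINCT versions really hold the same rows, the two result tables can be compared with MINUS. This is a sketch that assumes new_table and new_table2 were both created as shown above; the column aliases are made up for illustration.

```sql
-- Rows in new_table that are missing from new_table2 (expected: none).
select * from new_table
minus
select * from new_table2;

-- Rows in new_table2 that are missing from new_table (expected: none).
select * from new_table2
minus
select * from new_table;

-- The row counts should match as well.
select (select count(*) from new_table)  as union_rows,
       (select count(*) from new_table2) as distinct_rows
from dual;
```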