Suppose I have a table with the following test data:
SOID  SO_Name  SO_Desc        PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123   SO1      SO1 Desc1      111       Y                01-JAN-01    0
123   SO1      SO1 Desc1      111       Y                01-JAN-01    1
123   SO1      SO1 Desc1      111       Y                01-JAN-01    2
123   SO1      SO1 Desc1      111       Y                01-JAN-01    3
987   SO1      SO1 Desc1      111       Y                01-JAN-01    0
987   SO1      SO1 Desc1      111       Y                01-JAN-16    1
987   SO1      SO1 Desc1      111       Y                21-JAN-17    2
987   SO1      SO1 Desc1      111       Y                01-JAN-17    3
121   SO121    SO121 Desc121  111       Y                01-JAN-17    0
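I want to delete the duplicate rows for each soid (duplicates can be identified by 4 columns: so_name, so_desc, priority, ade_prioritized), keeping the row with the latest deploy_date.
I used this query, but it did not delete any rows.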
delete from so_test a
where a.deploy_date < (
select max(b.deploy_date) from so_test b where a.soid = b.soid
);
0 rows deleted
The final result I expect is:
SOID  SO_Name  SO_Desc        PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123   SO1      SO1 Desc1      111       Y                01-JAN-01    0
987   SO1      SO1 Desc1      111       Y                21-JAN-17    2
987   SO1      SO1 Desc1      111       Y                21-JAN-17    2
What could be the problem? Can this be done without a CTE?
Answer 0 (score: 1)
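Using with (a common table expression) and row_number(), you can identify duplicates and handle them easily.
A CTE can be followed by only one statement (although you can chain CTEs or define several of them in the same with clause).
In the code sample below, first check the output with the select; then, if you want to go ahead, comment out the select query and uncomment the delete query.
rextester link: http://rextester.com/UFQQ51693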
with cte as (
select
*
, rn = row_number() over (
partition by soid
order by deploy_date desc
)
from [so_test]
)
/* --------------------------------------------------------------
-- This returns all rows whose soid has duplicates,
-- along with the row number (rn), so you can see which rows
-- would be affected by the following actions
-------------------------------------------------------------- */
/*
select o.*
from cte as o
where exists (
select 1
from cte as i
where o.soid = i.soid
and i.rn>1
);
--*/
/* --------------------------------------------------------------
-- Remove duplicates by deleting all of the duplicates
-- where the row number (rn) is greater than 1
-- without deleting the first row of the duplicates.
-------------------------------------------------------------- */
--/*
delete
from cte
where cte.rn > 1
--*/
rextester results after the delete:
+------+---------+---------------+----------+-----------------+---------------------+-----+
| soid | so_name | so_desc | priority | ade_prioritized | deploy_date | env |
+------+---------+---------------+----------+-----------------+---------------------+-----+
| 123 | SO1 | SO1_Desc1 | 111 | Y | 01.01.2001 00:00:00 | 0 |
| 987 | SO1 | SO1_Desc1 | 111 | Y | 21.01.2017 00:00:00 | 2 |
| 121 | SO121 | SO121_Desc121 | 111 | Y | 01.01.2017 00:00:00 | 0 |
+------+---------+---------------+----------+-----------------+---------------------+-----+
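The query above uses SQL Server syntax (the rn = row_number() alias form, the [so_test] brackets, and a delete that targets the CTE). The question's "0 rows deleted" message and DD-MON-YY dates suggest Oracle instead, although that is an assumption. Since the question also asks whether a CTE can be avoided, here is a minimal sketch of the same idea for Oracle using an inline view and rowid. It assumes deploy_date is a DATE column and uses env as a tie-breaker, which the question does not specify. Note that a correlated deploy_date < max(deploy_date) delete can never remove rows that tie on the latest date, as all four soid 123 rows do.

```sql
-- Sketch only: assumes Oracle and a DATE-typed deploy_date column.
delete from so_test
where rowid in (
    select rid
    from (
        select rowid as rid,
               row_number() over (
                   -- duplicates are rows sharing the soid and the four columns
                   partition by soid, so_name, so_desc, priority, ade_prioritized
                   -- latest deploy_date first; env as tie-breaker is an assumption
                   order by deploy_date desc, env
               ) as rn
        from so_test
    )
    where rn > 1   -- keep only the first row of each group
);
```

Run against the test data in the question, this should leave one row per soid. As with the CTE version, running the inner select on its own first shows which rows would be removed.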
Answer 1 (score: 0)
Here is an example based on keeping the non-duplicate rows in a new table.
create table so_test_nodups
as
with dups as
( select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env,
row_number() over ( partition by so_name, so_desc, priority, ade_prioritized order by deploy_date desc ) rn
from so_test
)
select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env
from dups
where rn=1
Query the so_test_nodups table.
select * from so_test_nodups
SOID SO_NAME SO_DESC PRIORITY A DEPLOY_DA ENV
---------- ---------- -------------------- ---------- - --------- ----------
123 SO1 SO1 Desc1 111 Y 01-JAN-01 0
121 SO121 SO121 Desc121 111 Y 01-JAN-17 0
Adding the results after the modifications provided:
SOID SO_NAME SO_DESC PRIORITY A DEPLOY_DA ENV
---------- ---------- -------------------- ---------- - --------- ----------
987 SO1 SO1 Desc1 111 Y 21-JAN-17 2
121 SO121 SO121 Desc121 111 Y 01-JAN-17 0
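Both outputs contain only two rows because the partition leaves out soid: soid 123 and 987 share the same four column values, so they are treated as duplicates of each other. If the goal is one row per soid, as the question's "for each soid" wording suggests, a variant can add soid to the partition. The sketch below assumes Oracle (as the create table ... as syntax in this answer suggests); the table name so_test_nodups_by_soid and the env tie-breaker are hypothetical choices for illustration.

```sql
-- Sketch only: hypothetical table name; assumes Oracle.
create table so_test_nodups_by_soid
as
with dups as
( select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env,
         row_number() over (
             -- soid added so that each soid keeps its own latest row
             partition by soid, so_name, so_desc, priority, ade_prioritized
             -- env as tie-breaker is an assumption
             order by deploy_date desc, env
         ) rn
  from so_test
)
select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env
from dups
where rn = 1;
```

Querying the new table with select * from so_test_nodups_by_soid should then show one row for each of the three soid values in the test data.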