I have a table like this:
|-DT--------- |-ID------|
|5/30 12:00pm |10 |
|5/30 01:00pm |30 |
|5/30 02:30pm |30 |
|5/30 03:00pm |50 |
|5/30 04:30pm |10 |
|5/30 05:00pm |10 |
|5/30 06:30pm |10 |
|5/30 07:30pm |10 |
|5/30 08:00pm |50 |
|5/30 09:30pm |10 |
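I want to remove duplicate rows only when the previous or next row has the same ID. Of each group of consecutive duplicates, I want to keep the row with the latest datetime. For example, the table above would end up like this: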
|-DT--------- |-ID------|
|5/30 12:00pm |10 |
|5/30 02:30pm |30 |
|5/30 03:00pm |50 |
|5/30 07:30pm |10 |
|5/30 08:00pm |50 |
|5/30 09:30pm |10 |
Can I get any tips on how to do this?
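To pin down the requirement before looking at SQL, here is a minimal Python sketch of the rule described above: sort by DT, treat each stretch of consecutive rows with the same ID as one run, and keep only the latest row of each run. The 24-hour "HH:MM" strings and the function name `keep_last_of_each_run` are assumptions made for illustration.

```python
from itertools import groupby

# Sample rows from the question as (dt, id). 24-hour "HH:MM" strings are an
# assumption so that plain string sorting matches chronological order.
rows = [
    ("12:00", 10), ("13:00", 30), ("14:30", 30), ("15:00", 50),
    ("16:30", 10), ("17:00", 10), ("18:30", 10), ("19:30", 10),
    ("20:00", 50), ("21:30", 10),
]

def keep_last_of_each_run(rows):
    """Collapse consecutive rows with the same id, keeping the latest one."""
    ordered = sorted(rows)  # order by dt
    # groupby only merges *adjacent* equal keys, which is exactly one "run".
    return [list(run)[-1] for _, run in groupby(ordered, key=lambda r: r[1])]

result = keep_last_of_each_run(rows)
for dt, id_ in result:
    print(dt, id_)
```

The six rows printed correspond to the desired table above (12:00, 14:30, 15:00, 19:30, 20:00, 21:30).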
Answer 0: (score: 3)
with C as
(
select ID,
row_number() over(order by DT) as rn
from YourTable
)
delete C1
from C as C1
inner join C as C2
on C1.rn = C2.rn-1 and
C1.ID = C2.ID
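For anyone who wants to try the idea without a SQL Server instance, here is a rough adaptation that runs against an in-memory SQLite database from Python. It is a sketch, not the query above: SQLite has no `DELETE alias FROM ... JOIN`, so the rows to delete are picked by `rowid` instead, and the "HH:MM" strings are assumed sample data. It needs SQLite 3.25 or later for `row_number()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (DT TEXT, ID INTEGER)")
conn.executemany(
    "INSERT INTO YourTable (DT, ID) VALUES (?, ?)",
    [("12:00", 10), ("13:00", 30), ("14:30", 30), ("15:00", 50),
     ("16:30", 10), ("17:00", 10), ("18:30", 10), ("19:30", 10),
     ("20:00", 50), ("21:30", 10)],
)

# Same join condition as the T-SQL answer: C1 is deleted when the row
# right after it (C2) has the same ID, so the last row of a run survives.
conn.execute("""
    WITH C AS (
        SELECT rowid AS rid, ID,
               row_number() OVER (ORDER BY DT) AS rn
        FROM YourTable
    )
    DELETE FROM YourTable
    WHERE rowid IN (
        SELECT C1.rid
        FROM C AS C1
        INNER JOIN C AS C2
            ON C1.rn = C2.rn - 1 AND C1.ID = C2.ID
    )
""")

remaining = conn.execute("SELECT DT, ID FROM YourTable ORDER BY DT").fetchall()
for row in remaining:
    print(row)
```

Because the earlier row of each matching pair is the one removed, what remains is the latest row of every run.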
Answer 1: (score: 2)
Do it in the following 3 steps: http://www.sqlfiddle.com/#!3/b58b9/19
First, number the rows in order:
with a as
(
select dt, id, row_number() over(order by dt) as rn
from tbl
)
select * from a;
Output:
| DT | ID | RN |
----------------------------------------
| May, 30 2012 12:00:00-0700 | 10 | 1 |
| May, 30 2012 13:00:00-0700 | 30 | 2 |
| May, 30 2012 14:30:00-0700 | 30 | 3 |
| May, 30 2012 15:00:00-0700 | 50 | 4 |
| May, 30 2012 16:30:00-0700 | 10 | 5 |
| May, 30 2012 17:00:00-0700 | 10 | 6 |
| May, 30 2012 18:30:00-0700 | 10 | 7 |
| May, 30 2012 19:30:00-0700 | 10 | 8 |
| May, 30 2012 20:00:00-0700 | 50 | 9 |
| May, 30 2012 21:30:00-0700 | 10 | 10 |
Second, using the row numbers, we can find the rows that are at the bottom (and those that are not):
with a as
(
select dt, id, row_number() over(order by dt) as rn
from tbl
)
select below.*,
case when above.id <> below.id or above.id is null then
1
else
0
end as is_at_bottom
from a below
left join a above on above.rn + 1 = below.rn;
Output:
| DT | ID | RN | IS_AT_BOTTOM |
-------------------------------------------------------
| May, 30 2012 12:00:00-0700 | 10 | 1 | 1 |
| May, 30 2012 13:00:00-0700 | 30 | 2 | 1 |
| May, 30 2012 14:30:00-0700 | 30 | 3 | 0 |
| May, 30 2012 15:00:00-0700 | 50 | 4 | 1 |
| May, 30 2012 16:30:00-0700 | 10 | 5 | 1 |
| May, 30 2012 17:00:00-0700 | 10 | 6 | 0 |
| May, 30 2012 18:30:00-0700 | 10 | 7 | 0 |
| May, 30 2012 19:30:00-0700 | 10 | 8 | 0 |
| May, 30 2012 20:00:00-0700 | 50 | 9 | 1 |
| May, 30 2012 21:30:00-0700 | 10 | 10 | 1 |
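The flag logic of this step can be checked outside the database. Below is a small Python sketch that mimics the left join of each row to the row directly above it; the tuple layout and the "HH:MM" strings are assumptions for illustration only.

```python
# Sample rows as (dt, id); "HH:MM" strings are assumed so sorting is chronological.
rows = [
    ("12:00", 10), ("13:00", 30), ("14:30", 30), ("15:00", 50),
    ("16:30", 10), ("17:00", 10), ("18:30", 10), ("19:30", 10),
    ("20:00", 50), ("21:30", 10),
]

ordered = sorted(rows)  # same role as row_number() over(order by dt)
flagged = []
for i, (dt, id_) in enumerate(ordered):
    # "above" is the row with rn = below.rn - 1; the first row has none (NULL).
    above_id = ordered[i - 1][1] if i > 0 else None
    is_at_bottom = 1 if above_id is None or above_id != id_ else 0
    flagged.append((dt, id_, i + 1, is_at_bottom))

for row in flagged:
    print(row)
```

A row gets flag 1 when the row above it has a different id or does not exist, which gives the same 1, 1, 0, 1, 1, 0, 0, 0, 1, 1 pattern as the output table.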
Third, delete all rows that are not at the bottom:
with a as
(
select dt, id, row_number() over(order by dt) as rn
from tbl
)
,b as
(
select below.*,
case when above.id <> below.id or above.id is null then
1
else
0
end as is_at_bottom
from a below
left join a above on above.rn + 1 = below.rn
)
delete a
from a
inner join b on b.rn = a.rn
where b.is_at_bottom = 0;
Verify:
select * from tbl order by dt;
Output:
| DT | ID |
-----------------------------------
| May, 30 2012 12:00:00-0700 | 10 |
| May, 30 2012 13:00:00-0700 | 30 |
| May, 30 2012 15:00:00-0700 | 50 |
| May, 30 2012 16:30:00-0700 | 10 |
| May, 30 2012 20:00:00-0700 | 50 |
| May, 30 2012 21:30:00-0700 | 10 |
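You can also simplify the delete to: http://www.sqlfiddle.com/#!3/b58b9/20
Note that this version deletes the `above` row of each matching pair, so it keeps the last row of each run (which is what the question asks for), whereas the three-step version above keeps the first.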
with a as
(
select dt, id, row_number() over(order by dt, id) as rn
from tbl
)
delete above
from a below
left join a above on above.rn + 1 = below.rn
where case when above.id <> below.id or above.id is null then 1 else 0 end = 0;
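Mikael Eriksson's answer is the best one. If I simplify my simplified query once more, it ends up looking like his answer ツ For that, I gave his answer a +1. I would only make his query a bit more readable by swapping the join order and giving the aliases meaningful names: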
with a as
(
select *, row_number() over(order by dt, id) as rn
from tbl
)
delete above
from a below
join a above on above.rn + 1 = below.rn and above.id = below.id;
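As a side note, the same "compare with the next row" check can be written without a self-join by using the `LEAD()` window function (available in SQL Server 2012 and later). This is a different technique from the query above, shown here only as a sketch against an in-memory SQLite database from Python; the `rowid`-based delete, the alias `numbered`, and the "HH:MM" sample strings are assumptions of this sketch. It needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (dt TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO tbl (dt, id) VALUES (?, ?)",
    [("12:00", 10), ("13:00", 30), ("14:30", 30), ("15:00", 50),
     ("16:30", 10), ("17:00", 10), ("18:30", 10), ("19:30", 10),
     ("20:00", 50), ("21:30", 10)],
)

# lead(id) looks at the id of the following row in (dt, id) order.
# A row is deleted when the row right after it has the same id.
conn.execute("""
    DELETE FROM tbl
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid, id,
                   lead(id) OVER (ORDER BY dt, id) AS next_id
            FROM tbl
        ) AS numbered
        WHERE id = next_id
    )
""")

kept = conn.execute("SELECT dt, id FROM tbl ORDER BY dt").fetchall()
for row in kept:
    print(row)
```

The last row has no following row, so its `next_id` is NULL and it is never deleted; every other run keeps only its latest row.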
Answer 2: (score: 0)
Here you go; just replace [Table] with the name of your table.
SELECT *
FROM [dbo].[Table]
WHERE [Ident] NOT IN
(
SELECT Extent.[Ident]
FROM
(
SELECT TOP 100 PERCENT T1.[DT],
T1.[ID],
T1.[Ident],
(
SELECT TOP 1 Previous.ID
FROM [dbo].[Table] AS Previous
WHERE Previous.[Ident] > T1.Ident -- this is where the identity seed is important
ORDER BY [Ident] ASC
) AS 'PreviousId'
FROM [dbo].[Table] AS T1
ORDER BY T1.[Ident] DESC
) AS Extent
WHERE [Id] = [PreviousId]
)
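Note: you need an identity column on the table. If you cannot change the table's structure, use a CTE instead. Also note that, despite its name, PreviousId holds the ID of the row with the next higher Ident, so the query returns the last row of each run.
Answer 3: (score: 0)
You can try the following query...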
select * from
(
select *,RANK() OVER (ORDER BY dt,id) AS Rank from test
) as a
where 0 = (
select count(id) from (
select id, RANK() OVER (ORDER BY dt,id) AS Rank from test
)as b where b.id = a.id and b.Rank = a.Rank + 1
) order by dt
Thanks, Mahesh