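I'm looking for a query that selects the maximum date (a datetime column) per id and keeps that row's id and row_id. The goal is to DELETE all the other rows from the source table.
Source data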
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected survivors
1 13/11/2009 3
2 1/11/2009 4
What query do I need to get the result I'm looking for?
Answer 0 (score: 2)
Tested on PostgreSQL:
delete from table where (id, date) not in (select id, max(date) from table group by id);
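If you want to try the statement above without a PostgreSQL server, Python's built-in sqlite3 module is one option. This is only a sketch under assumptions: the table name `events` is made up, the dates are stored as ISO-8601 text so that MAX compares them chronologically, and SQLite 3.15 or later is assumed for the row-value `(id, date) NOT IN (...)` syntax.

```python
import sqlite3

# In-memory database; "events" is a hypothetical stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, date TEXT, row_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO events (id, date, row_id) VALUES (?, ?, ?)",
    [
        (1, "2009-11-11", 1),
        (1, "2009-11-12", 2),
        (1, "2009-11-13", 3),
        (2, "2009-11-01", 4),
    ],
)

# Delete every row whose (id, date) pair is not the latest pair for its id.
conn.execute(
    "DELETE FROM events WHERE (id, date) NOT IN "
    "(SELECT id, MAX(date) FROM events GROUP BY id)"
)

survivors = conn.execute(
    "SELECT id, date, row_id FROM events ORDER BY row_id"
).fetchall()
print(survivors)
```

Note that if two rows share both the id and the latest date, this statement keeps both of them.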
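Answer 1 (score: 1)
There are several ways to do this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare every row in the table against the rows to keep
- Delete any row that doesn't match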
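-- Variant: self-join each row to the latest row for its id; rows without a match are deleted
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = [source].date
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL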
-- Variant: LEFT JOIN against a derived table of (id, MAX(date))
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
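-- Variant: correlated TOP 1 subquery on row_id (keeps exactly one row per id, even when dates tie)
DELETE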
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
-- Variant: delete a row unless its date equals the maximum date for its id
DELETE
[source]
FROM
yourTable AS [source]
WHERE
NOT EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) = [source].date)
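The three steps listed at the top of this answer can also be sketched end to end with Python's built-in sqlite3 module. Treat it as an illustration rather than a drop-in script: the helper name `dedupe_keep_latest`, the scratch table `keep_rows`, the ISO-8601 text dates, and the extra tied row (row_id 5) are all assumptions added here, and SQLite's `LIMIT 1` stands in for T-SQL's `TOP 1`.

```python
import sqlite3


def dedupe_keep_latest(conn):
    """Keep one row per id (latest date, highest row_id on ties); return rows deleted."""
    # Step 1: identify the rows to keep, one row_id per id.
    conn.execute(
        """
        CREATE TEMP TABLE keep_rows AS
        SELECT s.row_id
        FROM yourTable AS s
        WHERE s.row_id = (SELECT t.row_id
                          FROM yourTable AS t
                          WHERE t.id = s.id
                          ORDER BY t.date DESC, t.row_id DESC
                          LIMIT 1)
        """
    )
    # Steps 2 and 3: compare every row with the keep list and delete the rest.
    cur = conn.execute(
        "DELETE FROM yourTable WHERE row_id NOT IN (SELECT row_id FROM keep_rows)"
    )
    deleted = cur.rowcount
    conn.execute("DROP TABLE keep_rows")
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, date TEXT, row_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO yourTable (id, date, row_id) VALUES (?, ?, ?)",
    [
        (1, "2009-11-11", 1),
        (1, "2009-11-12", 2),
        (1, "2009-11-13", 3),
        (2, "2009-11-01", 4),
        (2, "2009-11-01", 5),  # hypothetical extra row that ties with row 4 on date
    ],
)

deleted = dedupe_keep_latest(conn)
remaining = conn.execute(
    "SELECT id, date, row_id FROM yourTable ORDER BY row_id"
).fetchall()
print(deleted, remaining)
```

Picking the survivor by row_id rather than by date is what guarantees a single row per id when two rows share the latest date.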
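Answer 2 (score: 0)
Because you are on SQL Server 2000, you can't use the ROW_NUMBER() OVER technique to number the rows and pick out the top row for each unique id.
So the technique you propose is to use the datetime column to take the top 1 row and remove the duplicates. That may work, but you could still be left with duplicates that share the same datetime value. That is easy to check, though.
First, test the assumption that every row is unique on the id and date columns: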
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
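This example returns one row with a count of 2, which means duplicates would still remain after deduplicating on the date column. If the query returns no rows, you have shown that your proposed technique will work.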
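The same uniqueness check can be wrapped in a small helper if you want to run it from a script. This is a sketch using Python's built-in sqlite3 module, not SQL Server: the function name `find_tied_keys` is made up, a regular table named `TestTable` replaces the `#TestTable` temp table, and the dates are rewritten as ISO-8601 text (my reading of the sample values).

```python
import sqlite3


def find_tied_keys(conn):
    """Return (thisid, thisdate, count) for every id/date pair that occurs more than once."""
    return conn.execute(
        """
        SELECT thisid, thisdate, COUNT(*) AS thiscount
        FROM TestTable
        GROUP BY thisid, thisdate
        HAVING COUNT(*) > 1
        ORDER BY thisid, thisdate
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (rowid INTEGER PRIMARY KEY, thisid INTEGER, thisdate TEXT)"
)
conn.executemany(
    "INSERT INTO TestTable (thisid, thisdate) VALUES (?, ?)",
    [
        (1, "2009-11-11"),
        (1, "2009-11-12"),
        (1, "2009-12-12"),
        (2, "2009-11-01"),
        (2, "2009-11-01"),
    ],
)

ties = find_tied_keys(conn)
# An empty list would mean that keeping "the row with the latest date" leaves one row per id.
date_alone_is_enough = not ties
print(ties, date_alone_is_enough)
```

Returning the id and date alongside the count makes it easier to see which keys need a tie-breaker such as rowid.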
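When deduplicating production data, I think you should take some precautions and test both before and after. Create a table to hold the rows you plan to delete, so that you can easily restore them if needed after running the delete statement.
It is also a good idea to know in advance how many rows you plan to delete, so that you can verify the counts before and after and gauge the size of the delete operation. Depending on the number of rows affected, you can plan when to run it.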
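One way to put these precautions into practice is sketched below with Python's built-in sqlite3 module. Everything named here is an assumption for illustration: the source table `yourTable`, the backup table `yourTable_deleted_backup`, the helper `delete_with_backup`, and the rule for choosing the survivor (latest date, highest row_id on ties). The idea is to save the doomed rows first, then delete inside a transaction that is rolled back if the counts don't add up.

```python
import sqlite3

# Rows that are NOT the chosen survivor for their id (hypothetical selection rule).
DOOMED_ROWS = """
    SELECT s.*
    FROM yourTable AS s
    WHERE s.row_id <> (SELECT t.row_id
                       FROM yourTable AS t
                       WHERE t.id = s.id
                       ORDER BY t.date DESC, t.row_id DESC
                       LIMIT 1)
"""


def delete_with_backup(conn):
    """Back up the rows to delete, delete them, and verify the before/after counts."""
    conn.execute("CREATE TABLE yourTable_deleted_backup AS " + DOOMED_ROWS)
    conn.commit()  # make the backup durable before touching the source table
    planned = conn.execute("SELECT COUNT(*) FROM yourTable_deleted_backup").fetchone()[0]
    before = conn.execute("SELECT COUNT(*) FROM yourTable").fetchone()[0]
    with conn:  # commits on success, rolls back if the check below raises
        conn.execute(
            "DELETE FROM yourTable "
            "WHERE row_id IN (SELECT row_id FROM yourTable_deleted_backup)"
        )
        after = conn.execute("SELECT COUNT(*) FROM yourTable").fetchone()[0]
        if before - after != planned:
            raise RuntimeError("deleted row count does not match the plan")
    return planned, before, after


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, date TEXT, row_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO yourTable (id, date, row_id) VALUES (?, ?, ?)",
    [
        (1, "2009-11-11", 1),
        (1, "2009-11-12", 2),
        (1, "2009-11-13", 3),
        (2, "2009-11-01", 4),
        (2, "2009-11-01", 5),
    ],
)
conn.commit()

planned, before, after = delete_with_backup(conn)
print(planned, before, after)
```

If the delete turns out to be wrong, `INSERT INTO yourTable SELECT * FROM yourTable_deleted_backup` puts the rows back, since the backup has the same columns in the same order.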
To test before the deduplication process, find the occurrences of duplicates.
-- Get occurrences of duplicates
SELECT thisid, COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
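This gives you one row for each id that has more than one row. Capture the rows from this query into a temporary table, then run a query with SUM over it to get the total number of rows that are not unique on your key.
To get the number of rows you plan to delete, you need the number of rows that are duplicated on your unique key and the number of distinct key values among them. Subtract the distinct count from the occurrences. All of this is pretty straightforward, so I'll leave it to you.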
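For what it's worth, here is how I read that arithmetic, sketched with Python's built-in sqlite3 module. It is an interpretation, not the answerer's code: the helper name `rows_to_delete` is made up, a regular table `TestTable` stands in for `#TestTable`, the key is taken to be `thisid`, and the dates are written as ISO-8601 text.

```python
import sqlite3


def rows_to_delete(conn):
    """Occurrences in duplicated groups minus the number of duplicated keys."""
    occurrences, distinct_keys = conn.execute(
        """
        SELECT COALESCE(SUM(thiscount), 0), COUNT(*)
        FROM (SELECT thisid, COUNT(*) AS thiscount
              FROM TestTable
              GROUP BY thisid
              HAVING COUNT(*) > 1) AS dup
        """
    ).fetchone()
    # One row survives per duplicated key, so everything above that is deleted.
    return occurrences - distinct_keys


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (rowid INTEGER PRIMARY KEY, thisid INTEGER, thisdate TEXT)"
)
conn.executemany(
    "INSERT INTO TestTable (thisid, thisdate) VALUES (?, ?)",
    [
        (1, "2009-11-11"),
        (1, "2009-11-12"),
        (1, "2009-12-12"),
        (2, "2009-11-01"),
        (2, "2009-11-01"),
    ],
)

planned_deletes = rows_to_delete(conn)
print(planned_deletes)
```

With the sample rows, id 1 occurs three times and id 2 twice, so the occurrences total 5, the distinct keys total 2, and 3 rows are due for deletion.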
Answer 3 (score: 0)
Try this:
declare @t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
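Query:
-- join on id as well as date, so a row is kept only if it holds the latest date for its own id
delete from @t where rowid not in(
select t.rowid from @t t
inner join(
select id, MAX(dt) maxdate
from @t
group by id) X
on t.id = X.id and t.dt = X.maxdate )
select * from @t
Output: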
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
Answer 4 (score: 0)
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer.
Answer 5 (score: 0)
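CREATE TABLE #t (id INT, dt DATETIME, rowid INT IDENTITY(1,1))
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
select * from #t
-- ROW_NUMBER() instead of DENSE_RANK(), so rows tied on dt don't all rank 1 and survive; needs SQL Server 2005 or later
;WITH T AS(
select row_number() over(partition by id order by dt desc, rowid desc) rn,DT,ID,rowid from #t )
DELETE T WHERE rn>1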