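TSQL: Delete duplicates based on max(date)

Time: 2009-11-23 12:41:24

Tags: tsql sql-server-2000

I am looking for a query that selects the max date (a datetime column) for each id and keeps that row's id and row_id. The goal is to DELETE all other rows from the source table.

Source data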

id     date         row_id(unique)
1      11/11/2009    1
1      12/11/2009    2
1      13/11/2009    3
2      1/11/2009     4

Expected survivors

1      13/11/2009    3
2      1/11/2009     4

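What query do I need to achieve the result I'm looking for?

6 Answers:

Answer 0: (score: 2)

Tested on PostgreSQL (SQL Server does not accept the row-value form (id, date) NOT IN (...), so this will not run as T-SQL):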

delete from table where (id, date) not in (select id, max(date) from table group by id);

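Answer 1: (score: 1)

There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in the table to the ones you want to keep
- Delete any that don't match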

-- Self-join: a row finds a [keep] partner only if it holds the latest date for its id
DELETE
   [source]
FROM
   yourTable    AS [source]
LEFT JOIN
   yourTable    AS [keep]
      ON  [keep].row_id = [source].row_id
      AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
   [keep].row_id IS NULL


DELETE
   [yourTable]
FROM
   [yourTable]
LEFT JOIN
(
   SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
   AS [keep]
      ON  [keep].id   = [yourTable].id
      AND [keep].date = [yourTable].date
WHERE
   [keep].id IS NULL


DELETE
   [source]
FROM
   yourTable    AS [source]
WHERE
   [source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)


-- Delete a row when its date is not the MAX(date) of its id group
DELETE
   [source]
FROM
   yourTable    AS [source]
WHERE
   NOT EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) = [source].date)
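
The three steps above (identify the rows to keep, compare, delete the rest) can be checked end to end on the question's sample data. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server so that it runs anywhere. The dates are rewritten as ISO strings so that text comparison orders them correctly, and the correlated-subquery form is just one portable variant of the idea, not necessarily the best choice on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, date TEXT, row_id INTEGER PRIMARY KEY)")

# Sample rows from the question, with dd/mm/yyyy dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO yourTable (id, date, row_id) VALUES (?, ?, ?)",
    [(1, "2009-11-11", 1), (1, "2009-11-12", 2), (1, "2009-11-13", 3), (2, "2009-11-01", 4)],
)

# Delete every row whose date is not the latest date recorded for its id.
conn.execute(
    """
    DELETE FROM yourTable
    WHERE date <> (SELECT MAX(k.date) FROM yourTable AS k WHERE k.id = yourTable.id)
    """
)

survivors = conn.execute(
    "SELECT id, date, row_id FROM yourTable ORDER BY row_id"
).fetchall()
print(survivors)
```

Rows that tie on the latest date for an id all survive with this form, which is the same behaviour as the join on id and MAX(date).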

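Answer 2: (score: 0)

Because you are on SQL Server 2000, you cannot use the ROW_NUMBER() OVER technique to build a sequence and pick the top row for each unique id.

So the technique you propose is to use the datetime column to pick the top 1 row and remove the duplicates. This may work, but you could still be left with duplicates that share the same datetime value. That is easy to check, though.

First, test the assumption that all rows are unique on the id and date columns: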

CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES  (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES  (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES  (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES  (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES  (2, '1/11/2009')

SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1

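This example returns one row with the value 2, which means duplicates would remain even after using the date column to remove them. If the query returns no rows, you have shown that your proposed technique will work.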
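
If the check does report ties, one option is to break them with the unique row id so that exactly one row per id survives. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server (in T-SQL, `LIMIT 1` would be written as `TOP 1`). The table is an ordinary in-memory table named TestTable, its identity column is called row_id here, and the dates are rewritten as ISO strings; all of these are choices made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable ("
    "row_id INTEGER PRIMARY KEY AUTOINCREMENT, thisid INTEGER, thisdate TEXT)"
)
conn.executemany(
    "INSERT INTO TestTable (thisid, thisdate) VALUES (?, ?)",
    [(1, "2009-11-11"), (1, "2009-12-11"), (1, "2009-12-12"),
     (2, "2009-01-11"), (2, "2009-01-11")],
)

# Same check as in the answer: (thisid, thisdate) pairs that occur more than once.
ties = conn.execute(
    """
    SELECT thisid, thisdate, COUNT(*) AS thiscount
    FROM TestTable
    GROUP BY thisid, thisdate
    HAVING COUNT(*) > 1
    """
).fetchall()

# Keep one row per thisid: the latest date, and the highest row_id among ties.
conn.execute(
    """
    DELETE FROM TestTable
    WHERE row_id <> (
        SELECT k.row_id
        FROM TestTable AS k
        WHERE k.thisid = TestTable.thisid
        ORDER BY k.thisdate DESC, k.row_id DESC
        LIMIT 1
    )
    """
)

survivors = conn.execute(
    "SELECT row_id, thisid, thisdate FROM TestTable ORDER BY row_id"
).fetchall()
print(ties, survivors)
```

Which of the tied rows to keep is a business decision; the highest row_id is used here only because it is deterministic.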

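When de-duplicating production data, I think you should take some precautions and test before and after. You should create a table to hold the rows you plan to delete, so that you can easily restore them if needed after running the delete statement.

It also helps to know in advance how many rows you plan to delete, so that you can verify the counts before and after, and gauge the size of the delete operation. Depending on the number of rows affected, you can plan when to run the operation.

To test before the de-duplication process, find the occurrences.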

-- Get occurrences of duplicates
SELECT COUNT(*) AS thiscount
FROM 
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid

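This gives you one count for each id that has more than one row. Capture the rows from this query into a temporary table, then run a query using SUM to get the total number of rows that are not unique according to your key.

To get the number of rows you plan to delete, you need the number of rows that are duplicated according to your unique key, and the number of distinct key values among them. Subtract the distinct count from the occurrences. All of this is fairly simple, so I'll leave it to you.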
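
The counting and backup steps described above can be sketched as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server (in T-SQL, `SELECT ... INTO` would play the role of `CREATE TABLE ... AS SELECT`, and `TOP 1` the role of `LIMIT 1`). DeletedBackup is a hypothetical name for the holding table, and the rule "keep the latest date, highest row_id on ties" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (row_id INTEGER PRIMARY KEY, thisid INTEGER, thisdate TEXT)"
)
conn.executemany(
    "INSERT INTO TestTable (thisid, thisdate) VALUES (?, ?)",
    [(1, "2009-11-11"), (1, "2009-12-11"), (1, "2009-12-12"),
     (2, "2009-01-11"), (2, "2009-01-11")],
)

# occurrences: total rows belonging to ids that appear more than once.
# distinct_ids: how many such ids there are (one row per id will be kept).
occurrences, distinct_ids = conn.execute(
    """
    SELECT COALESCE(SUM(thiscount), 0), COUNT(*)
    FROM (SELECT COUNT(*) AS thiscount
          FROM TestTable
          GROUP BY thisid
          HAVING COUNT(*) > 1)
    """
).fetchone()
rows_to_delete = occurrences - distinct_ids

# Predicate for rows that are NOT the kept row of their id (assumed rule).
doomed = """
    row_id <> (
        SELECT k.row_id
        FROM TestTable AS k
        WHERE k.thisid = TestTable.thisid
        ORDER BY k.thisdate DESC, k.row_id DESC
        LIMIT 1
    )
"""

# Save the rows first (hypothetical backup table), then delete them.
conn.execute(f"CREATE TABLE DeletedBackup AS SELECT * FROM TestTable WHERE {doomed}")
deleted = conn.execute(f"DELETE FROM TestTable WHERE {doomed}").rowcount

backed_up = conn.execute("SELECT COUNT(*) FROM DeletedBackup").fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM TestTable").fetchone()[0]
print(rows_to_delete, deleted, backed_up, remaining)
```

If the predicted count, the deleted count, and the backup count disagree, that is a signal to restore from the backup table and investigate before going further.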

Answer 3: (score: 0)

Try this

declare @t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO @t (id,dt) VALUES  (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES  (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES  (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES  (2, '11/01/2009')

Query:

delete from @t where rowid not in(
select t.rowid from @t t
inner join(
select id, MAX(dt) maxdate
from @t
group by id) X
on t.id = X.id and t.dt = X.maxdate )

select * from @t

Output:

id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4

Answer 4: (score: 0)

delete from temp where row_id not in (
        select t.row_id from temp t
        inner join 
        (select id,MAX(dt) as dt from temp group by id) d
        on t.dt = d.dt and t.id = d.id) 

I have tested this answer.

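Answer 5: (score: 0)

-- Requires SQL Server 2005 or later: CTEs and DENSE_RANK() are not available in SQL Server 2000
-- Table definition is assumed here (it mirrors @t from answer 3)
CREATE TABLE #t (id INT, dt DATETIME, rowid INT IDENTITY(1,1))
INSERT INTO #t (id,dt) VALUES  (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES  (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES  (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES  (2, '11/01/2009')
select * from #t

-- Rank rows within each id by date, newest first, then delete everything below rank 1
;WITH T AS(
select dense_rank() over(partition by id order by dt desc) rnk, dt, id, rowid from #t )

DELETE T WHERE rnk > 1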