Delete the older duplicates and keep the newest one by timestamp

Asked: 2018-09-26 17:00:49

Tags: sql sql-server common-table-expression

I have a query that looks like this:

;WITH Duplicates AS 
    (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ChannelName, SerialNumber, ReadingDate ORDER BY ChannelName) AS Rownumber
        FROM [Staging].[UriData]        
    )       
    DELETE FROM Duplicates WHERE Rownumber > 1
    --AND ROWNUMBER >=< ???
    OPTION (MAXRECURSION 0)

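This works well and finds the duplicates in the table. However, the table is frequently updated with corrected data.

By the time the query runs, there may already have been three or more updates to the same reading.

This means I want to delete every record except the most recent one. The table has a timestamp field that records when the row was last inserted. I assume I should use this field to determine which row is the latest and delete any row that is not, instead of relying on the current row numbering. Is this the right approach?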

TIA

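2 Answers:

Answer 0: (score: 3)

You can certainly use the timestamp column with ROW_NUMBER(): order each partition by timestamp descending so that the newest row gets row number 1. You also don't need the recursion hint, because your CTE is not recursive.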

;WITH Duplicates AS  (
       SELECT *, 
              ROW_NUMBER() OVER (PARTITION BY ChannelName, SerialNumber, ReadingDate ORDER BY timestamp DESC) AS Rownumber
       FROM [Staging].[UriData]        
 ) 

DELETE d
FROM Duplicates d
WHERE Rownumber > 1;
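
Before running the delete, it may be worth previewing which rows it would remove. The following is a minimal sketch that reuses the same CTE with a SELECT in place of the DELETE. The table and column names are the ones from the question, and it assumes that timestamp values are distinct within each group; if two rows tie on timestamp, ROW_NUMBER() picks between them arbitrarily.

```sql
-- Preview only: lists the rows that the DELETE above would remove.
;WITH Duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ChannelName, SerialNumber, ReadingDate
               ORDER BY [timestamp] DESC   -- newest row in each group gets Rownumber = 1
           ) AS Rownumber
    FROM [Staging].[UriData]
)
SELECT *
FROM Duplicates
WHERE Rownumber > 1   -- everything except the newest row per group
ORDER BY ChannelName, SerialNumber, ReadingDate, [timestamp] DESC;
```

If the preview looks right, swapping the final SELECT back to the DELETE removes exactly those rows.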

Answer 1: (score: 1)

DELETE older
FROM Staging.UriData older
WHERE EXISTS(SELECT 1
   FROM Staging.UriData newer
   WHERE newer.ChannelName = older.ChannelName
      and newer.SerialNumber = older.SerialNumber
      and newer.ReadingDate = older.ReadingDate
      and newer.timestamp > older.timestamp
)
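
Either delete can be checked afterwards by counting rows per key. This is a minimal sketch using the table and column names from the question; it should return no rows once only one row per group remains. Note that the EXISTS form only deletes rows that have a strictly newer counterpart, so rows sharing the same latest timestamp are all kept and would still show up here.

```sql
-- Sanity check: any group listed here still contains more than one row.
SELECT ChannelName, SerialNumber, ReadingDate, COUNT(*) AS RowsInGroup
FROM [Staging].[UriData]
GROUP BY ChannelName, SerialNumber, ReadingDate
HAVING COUNT(*) > 1;
```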