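I have a dataset that I need to prune daily. It is populated by a process that periodically writes records to a table.
I currently have a simple query: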
DELETE FROM dataTable WHERE entryDate < dateadd(day, -5, GETDATE())
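The problem is that the process is unreliable; there may be days when no data is written at all.
So what I really need is a query that goes back 5 (possibly non-consecutive) days on which data was written, not 5 calendar days.
For example, if I run the following query: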
SELECT category, cast(entryDate as date) as LogDate
FROM dataTable
group by category, cast(entryDate as date)
order by cast(entryDate as date) desc
I might get results like this:
Category Date
Foo 2015-11-30
Foo 2015-11-29
Foo 2015-11-26
Foo 2015-11-25
Foo 2015-11-21
Foo 2015-11-20 <-- Start Pruning here, not the 25th.
Foo 2015-11-19
Foo 2015-11-18
Bar 2015-11-30
Bar 2015-11-29
Bar 2015-11-28
Bar 2015-11-27
Bar 2015-11-26
Bar 2015-11-25 <-- This one is OK to prune at the 25th.
Bar 2015-11-24
Bar 2015-11-23
For Foo, I need the query to reach all the way back to the 20th before it starts deleting.
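Answer 0 (score: 3)
You can use row_number to find the last 5 days that have entries in the table, then delete based on the generated numbers. This assumes at most one row per category per day.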
with rownums as (SELECT row_number() over(partition by category order by cast(entryDate as date) desc) as rn
,*
FROM dataTable
)
delete from rownums where rn > 5 -- removes everything older than the 5 newest rows per category; rn <= 5 would delete the newest 5 instead
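If there can be multiple entries per day, use dense_rank to number the rows instead, so that all rows from the same day share one number.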
with rownums as (SELECT dense_rank() over(partition by category order by cast(entryDate as date) desc) as rn
,*
FROM dataTable)
delete from rownums where rn > 5;
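Before running the delete, it may be worth previewing where the cutoff falls for each category. The following is only a sketch, assuming the table has the `category` and `entryDate` columns used above; the `ranked` CTE name and the `OldestKeptDate` alias are made up for illustration.

```sql
-- Preview only: nothing is deleted here.
-- For each category, show the oldest of the 5 most recent days that have data.
WITH ranked AS (
    SELECT category,
           CAST(entryDate AS date) AS LogDate,
           DENSE_RANK() OVER (PARTITION BY category
                              ORDER BY CAST(entryDate AS date) DESC) AS rn
    FROM dataTable
)
SELECT category,
       MIN(LogDate) AS OldestKeptDate  -- rows dated before this would be pruned
FROM ranked
WHERE rn <= 5
GROUP BY category;
```

With the sample data from the question, this should report 2015-11-21 for Foo and 2015-11-26 for Bar, which matches pruning Foo from the 20th and Bar from the 25th.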
Answer 1 (score: 1)
Try something like this.
;WITH orderedDates (LogDate, RowNum)
AS
(
SELECT [CACHEDATE] AS LogDate, ROW_NUMBER() OVER (ORDER BY CACHEDATE DESC) AS RowNum
FROM
dataTable
GROUP BY CACHEDATE
)
DELETE dataTable
WHERE CACHEDATE IN
(SELECT LogDate FROM orderedDates
WHERE ROWNUM > 5) --or however many you need to go back
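Note that the query above numbers dates across the whole table and compares the raw column value. A possible adaptation to the question's schema is sketched below, assuming the column is `entryDate` (with a time component) rather than `CACHEDATE`, and that the cutoff should be computed per `category`. The aliases `d` and `o` are illustrative only.

```sql
-- Sketch: number the distinct days per category, then delete rows
-- that fall on any day beyond the 5 most recent ones for that category.
WITH orderedDates AS (
    SELECT category,
           CAST(entryDate AS date) AS LogDate,
           ROW_NUMBER() OVER (PARTITION BY category
                              ORDER BY CAST(entryDate AS date) DESC) AS RowNum
    FROM dataTable
    GROUP BY category, CAST(entryDate AS date)  -- one row per category per day
)
DELETE d
FROM dataTable AS d
JOIN orderedDates AS o
  ON o.category = d.category
 AND o.LogDate  = CAST(d.entryDate AS date)
WHERE o.RowNum > 5;  -- or however many days you need to keep
```

Because the CTE groups by day first, ROW_NUMBER counts days rather than individual rows, so this should behave like the dense_rank version in the other answer.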