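I am using SQL Server 2012 and have a table with about 35 columns and more than 10 million rows. I now want to purge data based on the DateTimeStamp column and various other filters.
The sample data looks like this: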
ID  DateTimeStamp        value1  value2  value3     ....  Value 35
-----------------------------------------------------------------------
2   2016-07-26 15:12:41  0.00    126.20  328051.07
2   2016-07-26 15:18:17  0.00    126.14  328052.32
2   2016-07-26 15:23:17  0.00    126.75  328054.40
2   2016-07-26 15:28:15  0.00    126.95  328060.64
2   2016-07-26 15:34:15  0.00    126.95  328060.64
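I want to purge the data based on a time interval. Suppose I choose a 5-minute interval; the expected result should then be derived as follows: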
WHILE (1 = 1)
BEGIN
    -- StartDate is the start of the purge on the first iteration,
    -- and the previous EndDate on every later iteration
    StartDate   = <purge start date, or previous EndDate>
    EndDate     = StartDate + Interval
    NextEndDate = EndDate + Interval

    -- Latest row at or before EndDate
    SET maxDateTime = SELECT TOP (1) dateTime
                      FROM <TableName>
                      WHERE dateTime BETWEEN StartDate AND EndDate
                      ORDER BY dateTime DESC

    -- Earliest row at or after EndDate
    SET minDateTime = SELECT TOP (1) dateTime
                      FROM <TableName>
                      WHERE dateTime BETWEEN EndDate AND NextEndDate
                      ORDER BY dateTime ASC

    -- Compare Diff(maxDateTime, EndDate) with Diff(minDateTime, EndDate)
    -- and choose the row with the smaller difference
END
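It should work in such a way that, if the required timestamp does not exist, the nearest value is used instead (the previous or the next one, whichever is closer).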
I have already implemented this with the logic above (pseudocode), but it is very slow.
Can anyone suggest an efficient way to implement the above logic?
Answer 0 (score: 3)
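The example below deletes all rows except the first one in each 5-minute interval. It uses a loop with one iteration per interval to improve concurrency and avoid filling the transaction log, although the intervals could instead be computed with a tally table (or CTE) as a single set-based operation if those are not concerns for you.
For performance, it is important to have an index (ideally clustered) with DateTimeStamp as the leftmost key column.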
CREATE TABLE dbo.TableName(
ID int NOT NULL
, DateTimeStamp datetime2(0) NOT NULL
, value1 decimal(18,2) NOT NULL
, value2 decimal(18,2) NOT NULL
, value3 decimal(18,2) NOT NULL
)
GO
INSERT INTO dbo.TableName VALUES
(2, '2016-07-26 15:12:41', 0.00, 126.20, 328051.07)
,(2, '2016-07-26 15:14:41', 0.00, 126.20, 328051.07)
,(2, '2016-07-26 15:18:17', 0.00, 126.14, 328052.32)
,(2, '2016-07-26 15:23:17', 0.00, 126.75, 328054.40)
,(2, '2016-07-26 15:24:34', 0.00, 126.75, 328054.40)
,(2, '2016-07-26 15:25:18', 0.00, 126.75, 328054.40)
,(2, '2016-07-26 15:28:15', 0.00, 126.95, 328060.64)
,(2, '2016-07-26 15:29:15', 0.00, 126.95, 328060.64)
,(2, '2016-07-26 15:30:15', 0.00, 126.95, 328060.64)
,(2, '2016-07-26 15:34:15', 0.00, 126.95, 328060.64);
GO
CREATE CLUSTERED INDEX cdx ON dbo.TableName(DateTimeStamp);
GO
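With the sample table and index in place, the single set-based operation mentioned at the start of this answer could be sketched roughly as follows. This is only an illustration, not the answer's own code: it skips the tally table and instead assigns each row an interval number by integer division, and the names @FirstDateTimeStamp and @BucketSeconds are made up for the example. It runs as one transaction, so the concurrency and transaction log concerns described above apply.

```sql
-- Sketch only: single-statement variant, assuming dbo.TableName as defined above.
-- @FirstDateTimeStamp and @BucketSeconds are hypothetical names for this example.
DECLARE @BucketSeconds int = 300;  -- 5-minute interval
DECLARE @FirstDateTimeStamp datetime2(0) =
    (SELECT MIN(DateTimeStamp) FROM dbo.TableName);

WITH rows_to_delete AS (
    SELECT ROW_NUMBER() OVER (
               -- Integer division maps every row to an interval number (0, 1, 2, ...).
               -- DATEDIFF returns int, so this assumes the data spans
               -- fewer than about 68 years when measured in seconds.
               PARTITION BY DATEDIFF(second, @FirstDateTimeStamp, DateTimeStamp)
                            / @BucketSeconds
               ORDER BY DateTimeStamp
           ) AS row_num
    FROM dbo.TableName
)
DELETE rows_to_delete
WHERE row_num > 1;  -- keep only the first row of each interval
GO
```

On the ten sample rows this leaves the rows stamped 15:12:41, 15:18:17, 15:23:17, 15:28:15 and 15:34:15, the same rows the loop below keeps.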
SET NOCOUNT ON;
DECLARE
      @StartDateTimeStamp datetime2(0)
    , @LastDateTimeStamp datetime2(0)
    , @EndDateTimeStamp datetime2(0)
    , @IntervalSeconds int = 300; -- 5-minute interval
-- Purge range: from the earliest to the latest timestamp in the table
SET @StartDateTimeStamp = (SELECT MIN(DateTimeStamp) FROM dbo.TableName);
SET @LastDateTimeStamp = (SELECT MAX(DateTimeStamp) FROM dbo.TableName);
-- One small delete per interval keeps each transaction short
WHILE @StartDateTimeStamp <= @LastDateTimeStamp
BEGIN
    -- Current half-open interval: [@StartDateTimeStamp, @EndDateTimeStamp)
    SET @EndDateTimeStamp = DATEADD(second, @IntervalSeconds, @StartDateTimeStamp);
    WITH rows_to_delete AS (
        SELECT ROW_NUMBER() OVER(ORDER BY DateTimeStamp) AS row_num
        FROM dbo.TableName
        WHERE
            DateTimeStamp >= @StartDateTimeStamp
            AND DateTimeStamp < @EndDateTimeStamp
    )
    DELETE rows_to_delete
    WHERE row_num > 1; -- keep only the first row of the interval
    -- Advance to the next interval
    SET @StartDateTimeStamp = DATEADD(second, @IntervalSeconds, @StartDateTimeStamp);
END;
GO
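The loop above keeps the first row of each interval, whereas the question asks for the row nearest to each interval boundary (previous or next, whichever is closer). A rough sketch of that variant follows, under several assumptions that are not from the answer: boundaries are taken to lie at the earliest timestamp plus whole multiples of the interval, each row is assigned to the boundary it is closest to, ties go to the earlier row, and @GridStart and @GridSeconds are hypothetical names. Unlike the question's pseudocode, a boundary with no row within half an interval on either side keeps no row at all.

```sql
-- Sketch only: keep the row nearest to each interval boundary.
-- @GridStart and @GridSeconds are hypothetical names for this example.
DECLARE @GridSeconds int = 300;  -- 5-minute interval
DECLARE @GridStart datetime2(0) =
    (SELECT MIN(DateTimeStamp) FROM dbo.TableName);

WITH ranked AS (
    SELECT ROW_NUMBER() OVER (
               -- Index of the nearest boundary: offset rounded to the closest multiple
               PARTITION BY (DATEDIFF(second, @GridStart, DateTimeStamp)
                             + @GridSeconds / 2) / @GridSeconds
               -- Distance in seconds from that boundary, earlier row first on ties
               ORDER BY ABS(DATEDIFF(second, @GridStart, DateTimeStamp)
                            - ((DATEDIFF(second, @GridStart, DateTimeStamp)
                                + @GridSeconds / 2) / @GridSeconds) * @GridSeconds),
                        DateTimeStamp
           ) AS row_num
    FROM dbo.TableName
)
DELETE ranked
WHERE row_num > 1;  -- keep only the closest row per boundary
GO
```

Like any single-statement delete, this runs as one transaction; on a table with 10 million rows it may be worth restricting it to a date range per run, in the same spirit as the loop above.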