SQL data purging based on a datetime column in a table with more than 10 million rows

Time: 2018-03-31 06:25:04

Tags: sql-server-2012

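I am using SQL Server 2012 and have a table with about 35 columns and more than 10 million rows. I now want to purge data based on the datetime stamp and various other filters.

Sample data is shown below.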

ID  DateTimeStamp        value1  value2  value3     ....  value35
-----------------------------------------------------------------
2   2016-07-26 15:12:41  0.00    126.20  328051.07
2   2016-07-26 15:18:17  0.00    126.14  328052.32
2   2016-07-26 15:23:17  0.00    126.75  328054.40
2   2016-07-26 15:28:15  0.00    126.95  328060.64
2   2016-07-26 15:34:15  0.00    126.95  328060.64

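I want to purge data based on a time interval. For example, if I choose a 5-minute interval, the result set should keep one row per 5-minute interval.

If the required datetime stamp does not exist, the closest value should be used instead (the previous or the next row, whichever is nearer).

I have already implemented this with the logic below (pseudocode), but it is very slow:

While(1)
begin
    StartDate = start date of the purge on the first iteration; on every later iteration, the previous EndDate

    EndDate = StartDate + Interval
    NextEndDate = EndDate + Interval

    -- latest row at or before the boundary (EndDate)
    Set maxDateTime = Select top(1) * 
                      from <TableName> 
                      where dateTime between StartDate and EndDate
                      order by dateTime desc

    -- earliest row at or after the boundary (EndDate)
    Set minDateTime = Select top(1) * 
                      from <TableName> 
                      where dateTime between EndDate and NextEndDate
                      order by dateTime asc

    Compare the two differences and keep the row that is closer to EndDate:
        Diff(maxDateTime, EndDate) and Diff(minDateTime, EndDate)
end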

Can anyone suggest an efficient approach for the logic above?

1 Answer:

Answer 0: (score: 3)

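The example below deletes all but the first row within each 5-minute interval. This method runs one loop iteration per interval to improve concurrency and avoid filling the transaction log, although the intervals could instead be calculated with a tally table (or CTE) as a single set-based operation if those are not concerns for you.

For performance, it is important to have an index (ideally clustered) with DateTimeStamp as the leftmost key column.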

CREATE TABLE dbo.TableName(
      ID  int NOT NULL
    , DateTimeStamp datetime2(0) NOT NULL
    , value1 decimal(18,2) NOT NULL
    , value2 decimal(18,2) NOT NULL
    , value3 decimal(18,2) NOT NULL
)
GO

INSERT INTO dbo.TableName VALUES
     (2, '2016-07-26 15:12:41', 0.00, 126.20, 328051.07)
    ,(2, '2016-07-26 15:14:41', 0.00, 126.20, 328051.07)
    ,(2, '2016-07-26 15:18:17', 0.00, 126.14, 328052.32)
    ,(2, '2016-07-26 15:23:17', 0.00, 126.75, 328054.40)
    ,(2, '2016-07-26 15:24:34', 0.00, 126.75, 328054.40)
    ,(2, '2016-07-26 15:25:18', 0.00, 126.75, 328054.40)
    ,(2, '2016-07-26 15:28:15', 0.00, 126.95, 328060.64)
    ,(2, '2016-07-26 15:29:15', 0.00, 126.95, 328060.64)
    ,(2, '2016-07-26 15:30:15', 0.00, 126.95, 328060.64)
    ,(2, '2016-07-26 15:34:15', 0.00, 126.95, 328060.64);
GO
CREATE CLUSTERED INDEX cdx ON dbo.TableName(DateTimeStamp);
GO

SET NOCOUNT ON;
DECLARE
      @StartDateTimeStamp datetime2(0)
    , @LastDateTimeStamp datetime2(0)
    , @EndDateTimeStamp datetime2(0)
    , @IntervalSeconds int = 300; -- 5-minute interval
-- The first interval starts at the earliest row; the loop ends after the latest row.
SET @StartDateTimeStamp = (SELECT MIN(DateTimeStamp) FROM dbo.TableName);
SET @LastDateTimeStamp = (SELECT MAX(DateTimeStamp) FROM dbo.TableName);

WHILE @StartDateTimeStamp <= @LastDateTimeStamp
BEGIN
    -- Current interval is the half-open range [@StartDateTimeStamp, @EndDateTimeStamp).
    SET @EndDateTimeStamp = DATEADD(second, @IntervalSeconds, @StartDateTimeStamp);

    -- Number the rows in this interval by time and delete all but the first.
    WITH rows_to_delete AS (
        SELECT ROW_NUMBER() OVER(ORDER BY DateTimeStamp) AS row_num
        FROM dbo.TableName
        WHERE 
            DateTimeStamp >= @StartDateTimeStamp
            AND DateTimeStamp < @EndDateTimeStamp
        )
    DELETE rows_to_delete
    WHERE row_num > 1;

    -- Advance to the start of the next interval.
    SET @StartDateTimeStamp = DATEADD(second, @IntervalSeconds, @StartDateTimeStamp);
END;
GO
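
The single set-based alternative mentioned at the start of this answer could look roughly like the sketch below. It is only a sketch under stated assumptions: instead of a tally table it assigns each row an interval number by integer division, it anchors the intervals at the earliest DateTimeStamp (as the loop does), and it reuses the dbo.TableName table and variable names from the script above. It should keep the same rows as the loop, namely the first row in each interval.

```sql
SET NOCOUNT ON;
DECLARE @IntervalSeconds int = 300; -- 5-minute interval
DECLARE @StartDateTimeStamp datetime2(0) =
    (SELECT MIN(DateTimeStamp) FROM dbo.TableName);

-- Assumption: the table spans fewer than about 68 years, otherwise
-- DATEDIFF(second, ...) overflows its int result.
WITH rows_to_delete AS (
    SELECT ROW_NUMBER() OVER(
               -- Integer division maps every row to the number of its interval.
               PARTITION BY DATEDIFF(second, @StartDateTimeStamp, DateTimeStamp) / @IntervalSeconds
               ORDER BY DateTimeStamp
           ) AS row_num
    FROM dbo.TableName
)
-- Delete everything except the first row of each interval.
DELETE rows_to_delete
WHERE row_num > 1;
GO
```

Because this runs as one DELETE statement, it gives up the concurrency and transaction-log benefits of the loop, so it is probably best suited to smaller tables or maintenance windows.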