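Grouping rows into sequences using a sliding window on a DateTime column

Date: 2014-07-27 20:04:46

Tags: sql sql-server ssis

I have a table that stores timestamped events. I want to group the events into "sequences" using a 5-minute sliding window on the timestamp column, and write the "sequence ID" (any ID that distinguishes the sequences) and the "order within the sequence" to another table.

Input - events table: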

+----+-------+-----------+
| Id | Name  | Timestamp |
+----+-------+-----------+
|  1 | test  | 00:00:00  |
|  2 | test  | 00:06:00  |
|  3 | test  | 00:10:00  |
|  4 | test  | 00:14:00  |
+----+-------+-----------+

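Desired output - sequence table. Here SeqId is the ID of the starting event, but it does not have to be; anything that uniquely identifies a sequence will do.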

+---------+-------+----------+
| EventId | SeqId | SeqOrder |
+---------+-------+----------+
|       1 |     1 |        1 |
|       2 |     2 |        1 |
|       3 |     2 |        2 |
|       4 |     2 |        3 |
+---------+-------+----------+

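What is the best way to do this? This is MSSQL 2008, and I can use SSAS and SSIS if they make things easier.

2 answers:

Answer 0 (score: 1)

You seem to want to group events together when they are less than five minutes apart. You can assign the groups by getting the previous timestamp and flagging the start of each group. You then need a cumulative sum of those flags to get the group ID: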

with e as (
      select e.*,
             -- flag = 1 marks the start of a new sequence: a gap of five minutes or more,
             -- or no previous event (datediff returns NULL, so the ELSE branch applies)
             (case when datediff(minute, prev_timestamp, timestamp) < 5 then 0 else 1 end) as flag
      from (select e.*,
                   (select top 1 e2.timestamp
                    from events e2
                    where e2.timestamp < e.timestamp
                    order by e2.timestamp desc
                   ) as prev_timestamp
            from events e
           ) e
     )
select e.Id as eventId, e.seqId,
       row_number() over (partition by e.seqId order by e.timestamp) as seqOrder
from (select e.*,
             -- the running total of the start flags is the sequence id
             (select sum(flag) from e e2 where e2.timestamp <= e.timestamp) as seqId
      from e
     ) e;

By the way, this logic is easier to express in SQL Server 2012+, because the window functions are more powerful.
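
For illustration, here is a minimal sketch of how the same flag-and-running-total logic might look with the SQL Server 2012+ window functions LAG and SUM() OVER. It is not taken from the original answer. The temp table #Events and the names PrevTs, IsStart, with_prev and flagged are assumptions made up for this example. The gap is measured in seconds because DATEDIFF(minute, ...) counts minute boundaries crossed rather than full elapsed minutes.

```sql
-- Assumption: SQL Server 2012 or later (LAG and a framed SUM() OVER are not available in 2008).
-- #Events is a hypothetical stand-in for the events table in the question.
CREATE TABLE #Events (Id INT, Name VARCHAR(20), [Timestamp] TIME);
INSERT INTO #Events (Id, Name, [Timestamp])
VALUES (1, 'test', '00:00:00'),
       (2, 'test', '00:06:00'),
       (3, 'test', '00:10:00'),
       (4, 'test', '00:14:00');

WITH with_prev AS (
    -- Fetch the previous event's timestamp (NULL for the first event).
    SELECT Id, [Timestamp],
           LAG([Timestamp]) OVER (ORDER BY [Timestamp]) AS PrevTs
    FROM #Events
),
flagged AS (
    -- IsStart = 1 when there is no previous event (DATEDIFF yields NULL)
    -- or the gap is 5 minutes (300 seconds) or more.
    SELECT Id, [Timestamp],
           CASE WHEN DATEDIFF(second, PrevTs, [Timestamp]) < 300 THEN 0 ELSE 1 END AS IsStart
    FROM with_prev
)
SELECT Id AS EventId,
       SeqId,
       ROW_NUMBER() OVER (PARTITION BY SeqId ORDER BY [Timestamp]) AS SeqOrder
FROM (
    -- The running total of the start flags serves as the sequence id.
    SELECT Id, [Timestamp],
           SUM(IsStart) OVER (ORDER BY [Timestamp] ROWS UNBOUNDED PRECEDING) AS SeqId
    FROM flagged
) AS s
ORDER BY [Timestamp];

DROP TABLE #Events;
```

On the sample data this should give SeqId 1 for event 1 and SeqId 2 for events 2 to 4, with SeqOrder 1, 1, 2, 3, which matches the desired output in the question. The SeqId values here are running counters; they only coincide with the starting event's Id because of how the sample is laid out.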

Answer 1 (score: 1)

CREATE TABLE #Input (Id INT, Name VARCHAR(20), Time_stamp TIME)
INSERT INTO #Input
VALUES
(  1 ,'test','00:00:00'  ),
(  2 ,'test','00:06:00'  ),
(  3 ,'test','00:10:00'  ),
(  4 ,'test','00:14:00'  )

SELECT * FROM #Input;

WITH cte AS -- number the events in timestamp order
(
    SELECT *,
    ROW_NUMBER() OVER(ORDER BY Time_stamp) AS sort
    FROM #Input
), cte2 as -- flag the events that start a sequence: no predecessor, or a gap of 5 minutes or more
(
    SELECT cte.*,
    CASE WHEN DATEDIFF(MI, cte_1.Time_stamp,cte.Time_stamp) < 5 THEN 0 ELSE 1 END as GrpType
    FROM cte
    LEFT OUTER JOIN 
    cte as cte_1 on cte.sort =cte_1.sort +1
), cte3 as -- assign a SeqId
(
    SELECT GrpType, Time_stamp, ROW_NUMBER() OVER(ORDER BY Time_stamp) SeqId
    FROM cte2 
    WHERE GrpType = 1

), cte4 as -- find the Time_Stamp range per SeqId
(
    SELECT cte3.*,cte_2.Time_stamp as TS_to
    FROM cte3
    LEFT OUTER JOIN 
    cte3 as cte_2 on cte3.SeqId =cte_2.SeqId -1
)
-- final query
SELECT 
    t.Id, 
    cte4.SeqId, 
    ROW_NUMBER() OVER(PARTITION BY cte4.SeqId ORDER BY t.Time_stamp) AS SeqOrder
FROM cte4 INNER JOIN #Input t ON t.Time_stamp>=cte4.Time_stamp AND (t.Time_stamp <cte4.TS_to OR  cte4.TS_to IS NULL);

This code is a bit more complex, but it returns the expected output (Gordon Linoff's solution, as originally posted, does not), and it is even faster.