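I have a table that stores timestamped events. I want to group the events into "sequences" using a 5-minute sliding window on the timestamp column. I then want to write a "sequence ID" (any ID that distinguishes the sequences) and the "order within the sequence" to another table.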
Input - Events table:
+----+-------+-----------+
| Id | Name | Timestamp |
+----+-------+-----------+
| 1 | test | 00:00:00 |
| 2 | test | 00:06:00 |
| 3 | test | 00:10:00 |
| 4 | test | 00:14:00 |
+----+-------+-----------+
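Desired output - Sequences table. Here SeqId is the Id of the sequence's first event, but it doesn't have to be. Anything that uniquely identifies the sequence will do.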
+---------+-------+----------+
| EventId | SeqId | SeqOrder |
+---------+-------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 3 |
+---------+-------+----------+
What is the best way to do this? This is MSSQL 2008. I can also use SSAS and SSIS if they make things easier.
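The desired output implies a specific rule. The 6-minute gap between events 1 and 2 starts a new sequence, and the 4-minute gaps after that do not. A minimal reference sketch of that reading follows. It assumes a new sequence starts whenever the gap to the previous event (in time order) is 5 minutes or more, and it uses the first event's Id as SeqId. The function names `to_seconds` and `assign_sequences` are hypothetical and exist only for illustration.

```python
from datetime import time


def to_seconds(t: time) -> int:
    """Seconds since midnight for a time-of-day value."""
    return t.hour * 3600 + t.minute * 60 + t.second


def assign_sequences(events, gap_seconds=300):
    """Return (EventId, SeqId, SeqOrder) rows.

    events: iterable of (id, time) pairs. Assumption: a new sequence starts
    when the gap to the previous event is >= gap_seconds; SeqId is the Id of
    the sequence's first event.
    """
    rows = []
    prev_ts = None
    seq_id = None
    order = 0
    # Walk the events in time order (Id breaks ties)
    for event_id, ts in sorted(events, key=lambda e: (e[1], e[0])):
        if prev_ts is None or to_seconds(ts) - to_seconds(prev_ts) >= gap_seconds:
            seq_id = event_id  # this event opens a new sequence
            order = 0
        order += 1
        rows.append((event_id, seq_id, order))
        prev_ts = ts
    return rows


events = [(1, time(0, 0)), (2, time(0, 6)), (3, time(0, 10)), (4, time(0, 14))]
for row in assign_sequences(events):
    print(row)  # (EventId, SeqId, SeqOrder)
```

On the sample data this reproduces the Sequences table above row for row.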
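Answer 0 (score: 1)
It looks like you want to group events together when they are less than five minutes apart. You can assign the groups by fetching each event's previous timestamp and flagging the events that start a group. You then need a cumulative sum of those flags to get the group ID: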
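with e as (
      select e.*,
             -- flag = 1 marks the first event of a new sequence:
             -- no previous event, or a gap of 5 minutes or more
             (case when prev_timestamp is null
                     or datediff(minute, prev_timestamp, timestamp) >= 5
                   then 1 else 0
              end) as flag
      from (select e.*,
                   -- timestamp of the immediately preceding event
                   (select top 1 e2.timestamp
                    from events e2
                    where e2.timestamp < e.timestamp
                    order by e2.timestamp desc
                   ) as prev_timestamp
            from events e
           ) e
     )
select e.Id as EventId, e.seqId as SeqId,
       row_number() over (partition by e.seqId order by e.timestamp) as SeqOrder
from (select e.*,
             -- running sum of the flags = sequence number
             (select sum(flag) from e e2 where e2.timestamp <= e.timestamp) as seqId
      from e
     ) e;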
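The three steps of this answer can be sketched outside SQL: flag group starts, take a running sum of the flags, then number the rows within each group. The sketch makes two simplifying assumptions. Timestamps are given as whole minutes since midnight, and `sequences_by_running_sum` is a hypothetical name. It uses the start-of-group rule (no previous event, or a gap of at least 5 minutes).

```python
from itertools import accumulate


def sequences_by_running_sum(events, gap_minutes=5):
    """events: (id, minutes_since_midnight) pairs -> (EventId, SeqId, SeqOrder)."""
    ordered = sorted(events, key=lambda e: e[1])
    # Step 1: flag = 1 where an event starts a new group
    flags = [
        1 if i == 0 or minute - ordered[i - 1][1] >= gap_minutes else 0
        for i, (_, minute) in enumerate(ordered)
    ]
    # Step 2: the running sum of the flags is the group (sequence) number
    seq_ids = list(accumulate(flags))
    # Step 3: row number within each group, like ROW_NUMBER() OVER (PARTITION BY ...)
    rows, counters = [], {}
    for (event_id, _), seq_id in zip(ordered, seq_ids):
        counters[seq_id] = counters.get(seq_id, 0) + 1
        rows.append((event_id, seq_id, counters[seq_id]))
    return rows


events = [(1, 0), (2, 6), (3, 10), (4, 14)]  # the question's sample, in minutes
print(sequences_by_running_sum(events))
```

Here the flags come out as 1, 1, 0, 0, so the running sum gives SeqId 1, 2, 2, 2. That is a group counter rather than the starting event's Id, which the question explicitly allows.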
As an aside, this logic is easier to express in SQL Server 2012+, which adds LAG() and running aggregates such as SUM() OVER (ORDER BY ...).
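For illustration, here is roughly what the window-function version looks like. It is run against SQLite (3.25 or later, which supports LAG and framed SUM) through Python's `sqlite3` module rather than against SQL Server, so treat it as a sketch under that assumption. SQLite has no DATEDIFF, so the gap is computed from `strftime('%s', ...)` seconds instead. The `QUERY` and `run_demo` names are hypothetical.

```python
import sqlite3

# LAG() finds the previous timestamp, a running SUM() of the start flags gives
# the sequence number, and ROW_NUMBER() orders events within each sequence.
QUERY = """
WITH prev AS (
    SELECT Id, Timestamp,
           LAG(Timestamp) OVER (ORDER BY Timestamp) AS PrevTimestamp
    FROM events
), flagged AS (
    SELECT Id, Timestamp,
           CASE WHEN PrevTimestamp IS NULL
                  OR CAST(strftime('%s', Timestamp) AS INTEGER)
                     - CAST(strftime('%s', PrevTimestamp) AS INTEGER) >= 300
                THEN 1 ELSE 0
           END AS Flag
    FROM prev
), grouped AS (
    SELECT Id, Timestamp,
           SUM(Flag) OVER (ORDER BY Timestamp
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SeqId
    FROM flagged
)
SELECT Id AS EventId, SeqId,
       ROW_NUMBER() OVER (PARTITION BY SeqId ORDER BY Timestamp) AS SeqOrder
FROM grouped
ORDER BY Id
"""


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (Id INTEGER, Name TEXT, Timestamp TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(1, "test", "00:00:00"), (2, "test", "00:06:00"),
         (3, "test", "00:10:00"), (4, "test", "00:14:00")],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


rows = run_demo()
for row in rows:
    print(row)  # (EventId, SeqId, SeqOrder)
```

In T-SQL the same shape would use DATEDIFF in place of the `strftime` arithmetic.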
Answer 1 (score: 1)
CREATE TABLE #Input (Id INT, Name VARCHAR(20), Time_stamp TIME)
INSERT INTO #Input
VALUES
( 1 ,'test','00:00:00' ),
( 2 ,'test','00:06:00' ),
( 3 ,'test','00:10:00' ),
( 4 ,'test','00:14:00' )
SELECT * FROM #Input;
WITH cte AS -- add a sequential number
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY Time_stamp) AS sort -- number rows in time order (not Id order)
FROM #Input
), cte2 as -- flag rows that start a new sequence (no previous row, or a gap of 5 min or more)
(
SELECT cte.*,
CASE WHEN DATEDIFF(MI, cte_1.Time_stamp,cte.Time_stamp) < 5 THEN 0 ELSE 1 END as GrpType
FROM cte
LEFT OUTER JOIN
cte as cte_1 on cte.sort =cte_1.sort +1
), cte3 as -- assign a SeqId
(
SELECT GrpType, Time_Stamp,ROW_NUMBER() OVER(ORDER BY Time_stamp) SeqId
FROM cte2
WHERE GrpType = 1
), cte4 as -- find the Time_Stamp range per SeqId
(
SELECT cte3.*,cte_2.Time_stamp as TS_to
FROM cte3
LEFT OUTER JOIN
cte3 as cte_2 on cte3.SeqId =cte_2.SeqId -1
)
-- final query
SELECT
t.Id,
cte4.SeqId,
ROW_NUMBER() OVER(PARTITION BY cte4.SeqId ORDER BY t.Time_stamp) AS SeqOrder
FROM cte4 INNER JOIN #Input t ON t.Time_stamp>=cte4.Time_stamp AND (t.Time_stamp <cte4.TS_to OR cte4.TS_to IS NULL);
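This code is a bit more complicated, but it returns the expected output (Gordon Linoff's solution, as originally posted, did not), and in my tests it was even faster.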
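This answer takes a range-join approach. It first collects the timestamps that start a sequence (cte2/cte3), then pairs each start with the next one as an exclusive upper bound (cte4). Finally, it joins every event into the range that contains it. The sketch below mirrors that idea in Python. It assumes timestamps are whole minutes since midnight, and `sequences_by_range_join` is a hypothetical name. `bisect_right` stands in for the range join.

```python
from bisect import bisect_right


def sequences_by_range_join(events, gap_minutes=5):
    """events: (id, minutes_since_midnight) pairs -> (EventId, SeqId, SeqOrder)."""
    ordered = sorted(events, key=lambda e: e[1])
    # Like cte2/cte3: timestamps of the events that start a sequence
    starts = [
        minute for i, (_, minute) in enumerate(ordered)
        if i == 0 or minute - ordered[i - 1][1] >= gap_minutes
    ]
    rows, counters = [], {}
    for event_id, minute in ordered:
        # Like cte4 + the final join: SeqId k covers [starts[k-1], starts[k]),
        # so the 1-based SeqId is the number of starts <= this timestamp
        seq_id = bisect_right(starts, minute)
        counters[seq_id] = counters.get(seq_id, 0) + 1
        rows.append((event_id, seq_id, counters[seq_id]))
    return rows


events = [(1, 0), (2, 6), (3, 10), (4, 14)]  # the question's sample, in minutes
print(sequences_by_range_join(events))
```

For the sample, the starts are minutes 0 and 6. Event 1 falls in [0, 6) and events 2 to 4 fall in the open-ended range from 6, which matches the TS_to IS NULL case in the final query.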