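Grouping records by time interval based on other entries (gaps and islands)

Time: 2018-03-15 15:54:57

Tags: sql sql-server tsql timestamp grouping

Using SQL Server 2012, I have the following activity table, with the data ordered by IP, Timestamp, User:

Sample table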

---------------------------------------------
| Timestamp        | IP            | User   |
|------------------|---------------|--------|
| 2018-03-13 08:30 | 192.168.0.10  | user3  |
| 2018-03-14 01:30 | 192.168.0.10  | user1  |
| 2018-03-14 07:00 | 192.168.0.10  | user1  |
| 2018-03-14 10:10 | 192.168.0.10  | user1  |
| 2018-03-14 11:00 | 192.168.0.10  | user10 |
| 2018-03-14 13:50 | 192.168.0.10  | user10 |
| 2018-03-14 18:00 | 192.168.0.10  | user1  |
| 2018-03-14 01:30 | 192.168.0.150 | user1  |
| 2018-03-15 08:00 | 192.168.0.170 | user1  |
| 2018-03-15 12:20 | 192.168.0.170 | user1  |
| 2018-03-14 10:00 | 192.168.0.20  | user2  |
| 2018-03-14 15:30 | 192.168.0.20  | user2  |
| 2018-03-14 17:30 | 192.168.0.20  | user2  |
---------------------------------------------

I want to know the time intervals during which each user was connected from each recorded IP. The desired output is:

Desired output

----------------------------------------------------------------
| From             | To               | IP            | User   |
|------------------|------------------|---------------|--------|
| 2018-03-13 08:30 | 2018-03-13 08:30 | 192.168.0.10  | user3  |
| 2018-03-14 01:30 | 2018-03-14 10:10 | 192.168.0.10  | user1  |
| 2018-03-14 11:00 | 2018-03-14 13:50 | 192.168.0.10  | user10 |
| 2018-03-14 18:00 | 2018-03-14 18:00 | 192.168.0.10  | user1  |
| 2018-03-14 01:30 | 2018-03-14 01:30 | 192.168.0.150 | user1  |
| 2018-03-15 08:00 | 2018-03-15 12:20 | 192.168.0.170 | user1  |
| 2018-03-14 10:00 | 2018-03-14 17:30 | 192.168.0.20  | user2  |
----------------------------------------------------------------

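Note that in this example user1 has two recorded intervals on IP 192.168.0.10, from 2018-03-14 01:30 to 2018-03-14 10:10 and from 2018-03-14 18:00 to 2018-03-14 18:00, because user10's activity falls in between. The grouping therefore cannot simply take the minimum and maximum timestamps of each IP, User pair.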

The query I have built so far has only the flaw mentioned above: it merges those two intervals into one, from 2018-03-14 01:30 to 2018-03-14 18:00.
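To pin down the expected result independently of SQL, here is a minimal Python sketch. It is not from the question: `ROWS`, `naive` and `runs` are hypothetical names, and the data is the sample table above. `naive` mimics what grouping by the IP, User pair does. `runs` sorts each IP's rows by timestamp and uses `itertools.groupby` to start a new interval whenever the user changes, which is the "islands" definition that the desired output follows.

```python
from itertools import groupby

# Sample data from the question: (Timestamp, IP, User).
ROWS = [
    ("2018-03-13 08:30", "192.168.0.10", "user3"),
    ("2018-03-14 01:30", "192.168.0.10", "user1"),
    ("2018-03-14 07:00", "192.168.0.10", "user1"),
    ("2018-03-14 10:10", "192.168.0.10", "user1"),
    ("2018-03-14 11:00", "192.168.0.10", "user10"),
    ("2018-03-14 13:50", "192.168.0.10", "user10"),
    ("2018-03-14 18:00", "192.168.0.10", "user1"),
    ("2018-03-14 01:30", "192.168.0.150", "user1"),
    ("2018-03-15 08:00", "192.168.0.170", "user1"),
    ("2018-03-15 12:20", "192.168.0.170", "user1"),
    ("2018-03-14 10:00", "192.168.0.20", "user2"),
    ("2018-03-14 15:30", "192.168.0.20", "user2"),
    ("2018-03-14 17:30", "192.168.0.20", "user2"),
]


def naive(rows):
    """What GROUP BY IP, User does: one MIN/MAX per pair, merging separate runs."""
    spans = {}
    for ts, ip, user in rows:
        lo, hi = spans.get((ip, user), (ts, ts))
        spans[(ip, user)] = (min(lo, ts), max(hi, ts))
    return spans


def runs(rows):
    """Desired grouping: consecutive rows of the same user, per IP, in time order."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by IP, then timestamp
    result = []
    # groupby only merges *adjacent* rows with an equal (IP, User) key,
    # so another user's entry in between starts a new interval.
    for (ip, user), grp in groupby(ordered, key=lambda r: (r[1], r[2])):
        stamps = [ts for ts, _, _ in grp]
        result.append((stamps[0], stamps[-1], ip, user))  # (From, To, IP, User)
    return result


print(naive(ROWS)[("192.168.0.10", "user1")])
for row in runs(ROWS):
    print(row)
```

The first print shows the merged interval that the current query produces for user1; the loop prints the seven intervals of the desired output, in the same order.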

Current query

SELECT
    MIN([Timestamp]) AS [From],
    MAX([Timestamp]) AS [To],
    ClientIp AS IP,
    UsernameHash AS [User]
FROM #mtt
GROUP BY ClientIp, UsernameHash
ORDER BY IP, [From], [To] DESC, [User];

I have also tried window functions, which looked like they might help, but so far the output is the same:

SELECT DISTINCT
    MIN([Timestamp]) OVER (PARTITION BY ClientIp, UsernameHash ORDER BY ClientIp, [Timestamp]) AS [From],
    MAX([Timestamp]) OVER (PARTITION BY ClientIp, UsernameHash ORDER BY ClientIp, [Timestamp] DESC) AS [To],
    [ClientIp],
    [UsernameHash]
FROM #mtt
GROUP BY ClientIp, UsernameHash, [Timestamp]
ORDER BY ClientIp, [From], [To] DESC, UsernameHash;

Actual output

----------------------------------------------------------------
| From             | To               | IP            | User   |
|------------------|------------------|---------------|--------|
| 2018-03-13 08:30 | 2018-03-13 08:30 | 192.168.0.10  | user3  |
| 2018-03-14 01:30 | 2018-03-14 18:00 | 192.168.0.10  | user1  |
| 2018-03-14 11:00 | 2018-03-14 13:50 | 192.168.0.10  | user10 |
| 2018-03-14 01:30 | 2018-03-14 01:30 | 192.168.0.150 | user1  |
| 2018-03-15 08:00 | 2018-03-15 12:20 | 192.168.0.170 | user1  |
| 2018-03-14 10:00 | 2018-03-14 17:30 | 192.168.0.20  | user2  |
----------------------------------------------------------------

The temp table creation is included as well:

Table creation query

IF OBJECT_ID('tempdb..#mtt') IS NULL
BEGIN
    CREATE TABLE #mtt (
        [Timestamp] datetime,
        ClientIp varchar(45),
        UsernameHash varchar(255)
    );
END

DELETE FROM #mtt;

INSERT INTO #mtt([Timestamp], ClientIp, UsernameHash)
SELECT '2018-03-14 01:30', '192.168.0.10', 'user1' UNION ALL
SELECT '2018-03-14 07:00', '192.168.0.10', 'user1' UNION ALL
SELECT '2018-03-14 10:10', '192.168.0.10', 'user1' UNION ALL
SELECT '2018-03-14 11:00', '192.168.0.10', 'user10' UNION ALL
SELECT '2018-03-14 10:00', '192.168.0.20', 'user2' UNION ALL
SELECT '2018-03-14 01:30', '192.168.0.150', 'user1' UNION ALL
SELECT '2018-03-13 08:30', '192.168.0.10', 'user3' UNION ALL
SELECT '2018-03-14 13:50', '192.168.0.10', 'user10' UNION ALL
SELECT '2018-03-14 15:30', '192.168.0.20', 'user2' UNION ALL
SELECT '2018-03-14 17:30', '192.168.0.20', 'user2' UNION ALL
SELECT '2018-03-14 18:00', '192.168.0.10', 'user1' UNION ALL
SELECT '2018-03-15 08:00', '192.168.0.170', 'user1' UNION ALL
SELECT '2018-03-15 12:20', '192.168.0.170', 'user1';

Lastly, outputting records whose From and To timestamps are identical is not mandatory, but slightly preferred.

Any ideas on how to achieve this grouping are much appreciated, thanks!
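Why the windowed attempt returns the same rows as GROUP BY: in T-SQL, when an aggregate window function has an ORDER BY but no ROWS/RANGE clause, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ascending order, the earliest timestamp is inside every row's frame, so the running MIN is always the partition minimum. With descending order, the latest timestamp is inside every frame, so the running MAX is always the partition maximum. Every row of a partition therefore carries the same (From, To) pair, and SELECT DISTINCT collapses them to exactly what GROUP BY produced. The Python sketch below only illustrates that frame behaviour for user1's rows on 192.168.0.10. It is not T-SQL, `windowed_min_max` is a hypothetical name, and it assumes distinct timestamps within the partition.

```python
from itertools import accumulate

# user1's timestamps on 192.168.0.10, taken from the sample table.
STAMPS = ["2018-03-14 01:30", "2018-03-14 07:00", "2018-03-14 10:10", "2018-03-14 18:00"]


def windowed_min_max(stamps):
    """Per-row MIN(ts) OVER (ORDER BY ts) and MAX(ts) OVER (ORDER BY ts DESC),
    using the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    Assumes the timestamps are distinct (they are used as dict keys)."""
    asc = sorted(stamps)
    desc = sorted(stamps, reverse=True)
    running_min = dict(zip(asc, accumulate(asc, min)))    # frame: earliest .. current
    running_max = dict(zip(desc, accumulate(desc, max)))  # frame: latest .. current
    return [(running_min[ts], running_max[ts]) for ts in stamps]


# SELECT DISTINCT then reduces the per-row pairs to a single (From, To).
print(set(windowed_min_max(STAMPS)))
```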

1 Answer:

Answer 0 (score: 1)

This is a "gaps-and-islands" problem. A simple solution uses row_number() and aggregation:

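select UsernameHash as [User], ClientIp as IP,
       min([Timestamp]) as [From], max([Timestamp]) as [To]
from (select mtt.*,
             -- position of the row among all rows for its IP, in time order
             row_number() over (partition by ClientIp order by [Timestamp]) as seqnum_t,
             -- position of the row among the rows for its (IP, user) pair, in time order
             row_number() over (partition by ClientIp, UsernameHash order by [Timestamp]) as seqnum_ut
      from #mtt mtt
     ) mtt
-- rows in the same run of consecutive entries share the same difference
group by ClientIp, UsernameHash, (seqnum_t - seqnum_ut)
order by IP, [From];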

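Why this works is a bit tricky to explain. However, if you run the subquery and look closely at the results, you will see that the difference between the two sequence numbers identifies groups of adjacent records.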
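One way to see it: for any row, seqnum_t - seqnum_ut equals the number of rows by *other* users that precede it on the same IP. Within a run of consecutive rows by one user, both counters increase by 1 per row, so the difference stays constant. When another user's rows intervene, seqnum_t keeps counting while this user's seqnum_ut does not, so the next run of the same user gets a larger difference and lands in a new group. The Python sketch below emulates the subquery and the outer GROUP BY on the sample data so the sequence numbers can be inspected row by row. It is an illustration, not the answer's code: `with_seqnums` and `islands` are hypothetical names, and it assumes no two rows on the same IP share a timestamp (true for the sample, where ROW_NUMBER would otherwise break ties arbitrarily).

```python
from collections import defaultdict

# Sample data from the question: (Timestamp, IP, User).
ROWS = [
    ("2018-03-13 08:30", "192.168.0.10", "user3"),
    ("2018-03-14 01:30", "192.168.0.10", "user1"),
    ("2018-03-14 07:00", "192.168.0.10", "user1"),
    ("2018-03-14 10:10", "192.168.0.10", "user1"),
    ("2018-03-14 11:00", "192.168.0.10", "user10"),
    ("2018-03-14 13:50", "192.168.0.10", "user10"),
    ("2018-03-14 18:00", "192.168.0.10", "user1"),
    ("2018-03-14 01:30", "192.168.0.150", "user1"),
    ("2018-03-15 08:00", "192.168.0.170", "user1"),
    ("2018-03-15 12:20", "192.168.0.170", "user1"),
    ("2018-03-14 10:00", "192.168.0.20", "user2"),
    ("2018-03-14 15:30", "192.168.0.20", "user2"),
    ("2018-03-14 17:30", "192.168.0.20", "user2"),
]


def with_seqnums(rows):
    """Emulate the subquery: attach seqnum_t and seqnum_ut to every row."""
    seq_t = defaultdict(int)   # ROW_NUMBER() OVER (PARTITION BY ip ORDER BY ts)
    seq_ut = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY ip, user ORDER BY ts)
    out = []
    # Visiting rows per IP in timestamp order makes these running counters
    # equal to the two ROW_NUMBER() values.
    for ts, ip, user in sorted(rows, key=lambda r: (r[1], r[0])):
        seq_t[ip] += 1
        seq_ut[(ip, user)] += 1
        out.append((ts, ip, user, seq_t[ip], seq_ut[(ip, user)]))
    return out


def islands(rows):
    """Emulate the outer query: GROUP BY ip, user, (seqnum_t - seqnum_ut)."""
    groups = {}
    for ts, ip, user, s_t, s_ut in with_seqnums(rows):
        key = (ip, user, s_t - s_ut)
        lo, hi = groups.get(key, (ts, ts))
        groups[key] = (min(lo, ts), max(hi, ts))
    # Return (From, To, IP, User), ordered by IP and then From.
    result = [(lo, hi, ip, user) for (ip, user, _), (lo, hi) in groups.items()]
    return sorted(result, key=lambda r: (r[2], r[0]))


for row in islands(ROWS):
    print(row)
```

Printing `with_seqnums(ROWS)` shows, for example, user1's last row on 192.168.0.10 with seqnum_t = 7 and seqnum_ut = 4 (difference 3), while user1's first run on that IP has difference 1, so the two runs end up in separate groups.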