Using SQL Server 2012, I have the following activity table, with data ordered by IP, Timestamp, User:
---------------------------------------------
| Timestamp        | IP            | User   |
|------------------|---------------|--------|
| 2018-03-13 08:30 | 192.168.0.10  | user3  |
| 2018-03-14 01:30 | 192.168.0.10  | user1  |
| 2018-03-14 07:00 | 192.168.0.10  | user1  |
| 2018-03-14 10:10 | 192.168.0.10  | user1  |
| 2018-03-14 11:00 | 192.168.0.10  | user10 |
| 2018-03-14 13:50 | 192.168.0.10  | user10 |
| 2018-03-14 18:00 | 192.168.0.10  | user1  |
| 2018-03-14 01:30 | 192.168.0.150 | user1  |
| 2018-03-15 08:00 | 192.168.0.170 | user1  |
| 2018-03-15 12:20 | 192.168.0.170 | user1  |
| 2018-03-14 10:00 | 192.168.0.20  | user2  |
| 2018-03-14 15:30 | 192.168.0.20  | user2  |
| 2018-03-14 17:30 | 192.168.0.20  | user2  |
---------------------------------------------
I want to know the time ranges during which each user was connected from each recorded IP. The desired output is:
----------------------------------------------------------------
| From             | To               | IP            | User   |
|------------------|------------------|---------------|--------|
| 2018-03-13 08:30 | 2018-03-13 08:30 | 192.168.0.10  | user3  |
| 2018-03-14 01:30 | 2018-03-14 10:10 | 192.168.0.10  | user1  |
| 2018-03-14 11:00 | 2018-03-14 13:50 | 192.168.0.10  | user10 |
| 2018-03-14 18:00 | 2018-03-14 18:00 | 192.168.0.10  | user1  |
| 2018-03-14 01:30 | 2018-03-14 01:30 | 192.168.0.150 | user1  |
| 2018-03-15 08:00 | 2018-03-15 12:20 | 192.168.0.170 | user1  |
| 2018-03-14 10:00 | 2018-03-14 17:30 | 192.168.0.20  | user2  |
----------------------------------------------------------------
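Notably, in this example user1 has two recorded time ranges on IP 192.168.0.10: from 2018-03-14 01:30 to 2018-03-14 10:10, and from 2018-03-14 18:00 to 2018-03-14 18:00 (user10 connected from the same IP in between). The grouping therefore cannot simply take the minimum and maximum timestamps of each IP, User pair.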
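To make the difference concrete, here is a minimal Python sketch (not T-SQL; it just replays the sample rows in memory). `naive_ranges` mimics a plain `GROUP BY IP, User`, while `island_ranges` treats each run of consecutive rows with the same IP and User as its own range. Both helper names are hypothetical, and the sketch assumes the rows are already ordered by IP, then Timestamp, as in the table above.

```python
from itertools import groupby

# Sample rows from the question as (Timestamp, IP, User), ordered by IP, then Timestamp.
# "YYYY-MM-DD HH:MM" strings compare in chronological order, so min/max work on them directly.
ROWS = [
    ("2018-03-13 08:30", "192.168.0.10", "user3"),
    ("2018-03-14 01:30", "192.168.0.10", "user1"),
    ("2018-03-14 07:00", "192.168.0.10", "user1"),
    ("2018-03-14 10:10", "192.168.0.10", "user1"),
    ("2018-03-14 11:00", "192.168.0.10", "user10"),
    ("2018-03-14 13:50", "192.168.0.10", "user10"),
    ("2018-03-14 18:00", "192.168.0.10", "user1"),
    ("2018-03-14 01:30", "192.168.0.150", "user1"),
    ("2018-03-15 08:00", "192.168.0.170", "user1"),
    ("2018-03-15 12:20", "192.168.0.170", "user1"),
    ("2018-03-14 10:00", "192.168.0.20", "user2"),
    ("2018-03-14 15:30", "192.168.0.20", "user2"),
    ("2018-03-14 17:30", "192.168.0.20", "user2"),
]


def naive_ranges(rows):
    """Min/max timestamp per (IP, User) pair, i.e. what GROUP BY IP, User yields."""
    ranges = {}
    for ts, ip, user in rows:
        lo, hi = ranges.get((ip, user), (ts, ts))
        ranges[(ip, user)] = (min(lo, ts), max(hi, ts))
    return ranges


def island_ranges(rows):
    """One (From, To, IP, User) tuple per run of consecutive rows sharing IP and User."""
    result = []
    for (ip, user), run in groupby(rows, key=lambda r: (r[1], r[2])):
        stamps = [ts for ts, _, _ in run]  # already in time order within the run
        result.append((stamps[0], stamps[-1], ip, user))
    return result


print(naive_ranges(ROWS)[("192.168.0.10", "user1")])
# ('2018-03-14 01:30', '2018-03-14 18:00')  <- both user1 sessions merged
for row in island_ranges(ROWS):
    print(row)  # seven rows, matching the desired output table
```

The run-based version is what the desired output describes; the remaining question is how to express "consecutive run" in set-based SQL.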
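The queries I have built so far have only the flaw mentioned above: they merge the two user1 entries into a single range, from 2018-03-14 01:30 to 2018-03-14 18:00. I also tried window functions, which looked like they might help, but so far the output is the same. Here are both queries: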
SELECT
    MIN([Timestamp]) AS [From],
    MAX([Timestamp]) AS [To],
    ClientIp,
    UsernameHash
FROM #mtt
GROUP BY ClientIp, UsernameHash
ORDER BY ClientIp, [From], [To] DESC, UsernameHash;

SELECT DISTINCT
    MIN([Timestamp]) OVER (PARTITION BY ClientIp, UsernameHash ORDER BY ClientIp, [Timestamp]) AS [From],
    MAX([Timestamp]) OVER (PARTITION BY ClientIp, UsernameHash ORDER BY ClientIp, [Timestamp] DESC) AS [To],
    [ClientIp],
    [UsernameHash]
FROM #mtt
GROUP BY ClientIp, UsernameHash, [Timestamp]
ORDER BY ClientIp, [From], [To] DESC, UsernameHash;
Both return the following, with user1's two ranges on 192.168.0.10 merged into one row:
----------------------------------------------------------------
| From             | To               | IP            | User   |
|------------------|------------------|---------------|--------|
| 2018-03-13 08:30 | 2018-03-13 08:30 | 192.168.0.10  | user3  |
| 2018-03-14 01:30 | 2018-03-14 18:00 | 192.168.0.10  | user1  |
| 2018-03-14 11:00 | 2018-03-14 13:50 | 192.168.0.10  | user10 |
| 2018-03-14 01:30 | 2018-03-14 01:30 | 192.168.0.150 | user1  |
| 2018-03-15 08:00 | 2018-03-15 12:20 | 192.168.0.170 | user1  |
| 2018-03-14 10:00 | 2018-03-14 17:30 | 192.168.0.20  | user2  |
----------------------------------------------------------------
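Why the window-function attempt collapses to the same result: with an ORDER BY and no explicit frame, SQL Server aggregates from the first row of the partition up to the current row. A running MIN over ascending timestamps always sees the partition's earliest row first, and a running MAX over descending timestamps always sees its latest row first, so every row of a (ClientIp, UsernameHash) partition gets the same From and To, and DISTINCT reduces them to one row. The Python sketch below simulates this for user1 on 192.168.0.10; the helper names are hypothetical, `ClientIp` in the window's ORDER BY is ignored because it is constant inside the partition, and timestamps are assumed unique within the partition (true here, since the query also groups by [Timestamp]).

```python
# Rows of one partition of the windowed query (ClientIp '192.168.0.10', UsernameHash 'user1').
PARTITION = ["2018-03-14 01:30", "2018-03-14 07:00", "2018-03-14 10:10", "2018-03-14 18:00"]


def running(agg, ordered):
    """Aggregate over the frame from the first row up to the current row."""
    return [agg(ordered[: i + 1]) for i in range(len(ordered))]


def windowed_from_to(stamps):
    asc = sorted(stamps)
    desc = sorted(stamps, reverse=True)
    # MIN(...) OVER (... ORDER BY [Timestamp]): every frame starts at the overall minimum.
    from_col = dict(zip(asc, running(min, asc)))
    # MAX(...) OVER (... ORDER BY [Timestamp] DESC): every frame starts at the overall maximum.
    to_col = dict(zip(desc, running(max, desc)))
    # SELECT DISTINCT keeps only the unique (From, To) pairs.
    return sorted({(from_col[ts], to_col[ts]) for ts in stamps})


print(windowed_from_to(PARTITION))
# [('2018-03-14 01:30', '2018-03-14 18:00')]
```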
The temporary table can be created and populated with:
IF OBJECT_ID('tempdb..#mtt') IS NULL
BEGIN
CREATE TABLE #mtt (
[Timestamp] datetime,
ClientIp varchar(45),
UsernameHash varchar(255)
);
END
DELETE FROM #mtt;
INSERT INTO #mtt([Timestamp], ClientIp, UsernameHash)
SELECT '2018-03-14 01:30', '192.168.0.10', 'user1'
UNION ALL
SELECT '2018-03-14 07:00', '192.168.0.10', 'user1'
UNION ALL
SELECT '2018-03-14 10:10', '192.168.0.10', 'user1'
UNION ALL
SELECT '2018-03-14 11:00', '192.168.0.10', 'user10'
UNION ALL
SELECT '2018-03-14 10:00', '192.168.0.20', 'user2'
UNION ALL
SELECT '2018-03-14 01:30', '192.168.0.150', 'user1'
UNION ALL
SELECT '2018-03-13 08:30', '192.168.0.10', 'user3'
UNION ALL
SELECT '2018-03-14 13:50', '192.168.0.10', 'user10'
UNION ALL
SELECT '2018-03-14 15:30', '192.168.0.20', 'user2'
UNION ALL
SELECT '2018-03-14 17:30', '192.168.0.20', 'user2'
UNION ALL
SELECT '2018-03-14 18:00', '192.168.0.10', 'user1'
UNION ALL
SELECT '2018-03-15 08:00', '192.168.0.170', 'user1'
UNION ALL
SELECT '2018-03-15 12:20', '192.168.0.170', 'user1';
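Finally: including output records whose From and To timestamps are identical (e.g. 2018-03-14 18:00 to 2018-03-14 18:00) is not mandatory, but slightly preferred.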
Any ideas on how to achieve this grouping are much appreciated, thanks!
Answer 0 (score: 1)
This is a "gaps-and-islands" problem. A simple solution uses row_number() and aggregation:
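select min([Timestamp]) as [From], max([Timestamp]) as [To], ClientIp, UsernameHash
from (select mtt.*,
             -- position of the row among all rows for this IP
             row_number() over (partition by ClientIp order by [Timestamp]) as seqnum_t,
             -- position of the row among this user's rows for this IP
             row_number() over (partition by ClientIp, UsernameHash order by [Timestamp]) as seqnum_ut
      from #mtt mtt
     ) mtt
group by ClientIp, UsernameHash, (seqnum_t - seqnum_ut)
order by ClientIp, [From];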
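Why this works is a little tricky to explain. But if you run the subquery on its own and look at the results, you will see that the difference between the two sequence numbers identifies groups of adjacent records: within an uninterrupted run of rows for the same user on the same IP, both numbers increase by 1 per row, so their difference stays constant; when another user's rows intervene, seqnum_t keeps advancing while seqnum_ut does not, so the difference changes.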
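To see the two sequence numbers without a database, here is a Python sketch that replays the subquery on the sample rows (in the temp table's insert order) and then groups on (IP, User, seqnum_t - seqnum_ut). It is an in-memory simulation, not T-SQL; `seqnum_rows` and `islands` are hypothetical names. One assumption to note: if two rows on the same IP shared a Timestamp, `row_number()` would number them in an unspecified order, whereas Python's stable sort just keeps input order.

```python
from collections import defaultdict

# Sample rows from the question as (Timestamp, IP, User), in the temp table's insert order.
ROWS = [
    ("2018-03-14 01:30", "192.168.0.10", "user1"),
    ("2018-03-14 07:00", "192.168.0.10", "user1"),
    ("2018-03-14 10:10", "192.168.0.10", "user1"),
    ("2018-03-14 11:00", "192.168.0.10", "user10"),
    ("2018-03-14 10:00", "192.168.0.20", "user2"),
    ("2018-03-14 01:30", "192.168.0.150", "user1"),
    ("2018-03-13 08:30", "192.168.0.10", "user3"),
    ("2018-03-14 13:50", "192.168.0.10", "user10"),
    ("2018-03-14 15:30", "192.168.0.20", "user2"),
    ("2018-03-14 17:30", "192.168.0.20", "user2"),
    ("2018-03-14 18:00", "192.168.0.10", "user1"),
    ("2018-03-15 08:00", "192.168.0.170", "user1"),
    ("2018-03-15 12:20", "192.168.0.170", "user1"),
]


def seqnum_rows(rows):
    """Attach seqnum_t (per IP) and seqnum_ut (per IP, User), both ordered by Timestamp."""
    seq_t, seq_ut = defaultdict(int), defaultdict(int)
    out = []
    for ts, ip, user in sorted(rows, key=lambda r: (r[1], r[0])):
        seq_t[ip] += 1
        seq_ut[(ip, user)] += 1
        out.append((ts, ip, user, seq_t[ip], seq_ut[(ip, user)]))
    return out


def islands(rows):
    """GROUP BY IP, User, (seqnum_t - seqnum_ut), keeping min/max Timestamp per group."""
    groups = {}
    for ts, ip, user, t, ut in seqnum_rows(rows):
        key = (ip, user, t - ut)
        lo, hi = groups.get(key, (ts, ts))
        groups[key] = (min(lo, ts), max(hi, ts))
    return groups


for ts, ip, user, t, ut in seqnum_rows(ROWS):
    if ip == "192.168.0.10":
        print(ts, user, t, ut, t - ut)
# 2018-03-13 08:30 user3 1 1 0
# 2018-03-14 01:30 user1 2 1 1
# 2018-03-14 07:00 user1 3 2 1
# 2018-03-14 10:10 user1 4 3 1
# 2018-03-14 11:00 user10 5 1 4
# 2018-03-14 13:50 user10 6 2 4
# 2018-03-14 18:00 user1 7 4 3
```

user1's first run has difference 1 and its later row has difference 3, so they land in different groups. The difference value itself carries no meaning; it only has to be distinct between runs of the same (IP, User).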