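I have a table containing a series of (IP varchar(15), DateTime datetime2) values. Each row corresponds to an HTTP request made by a user. I want to assign session numbers to these rows. Different IP addresses have different session numbers. The same IP should be assigned a new session number if its previous request was more than 30 minutes ago. Here is a sample output: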
IP, DateTime, SessionNumber, RequestNumber
1.1.1.1, 2012-01-01 00:01, 1, 1
1.1.1.1, 2012-01-01 00:02, 1, 2
1.1.1.1, 2012-01-01 00:03, 1, 3
1.1.1.2, 2012-01-01 00:04, 2, 1 --different IP => new session number
1.1.1.2, 2012-01-01 00:05, 2, 2
1.1.1.2, 2012-01-01 00:40, 3, 1 --same IP, but last request 35min ago (> 30min)
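Columns 1 and 2 are the input, 3 and 4 are the desired output. The table shows two users.
As the underlying table is really large, how can this be solved efficiently? I'd prefer a small number of passes over the data (one or two).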
Answer 0 (score: 8)
Here are a couple of attempts.
;WITH CTE1 AS
(
    -- Flag the rows that start a new session: the first request of an IP
    -- (LAG returns NULL, so the "< 30" test is not true) or a gap of 30+ minutes.
    SELECT *,
           IIF(DATEDIFF(MINUTE,
                        LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
                        DateTime) < 30, 0, 1) AS SessionFlag
    FROM   Sessions
), CTE2 AS
(
    -- A running total of the flags numbers the sessions within each IP.
    SELECT *,
           SUM(SessionFlag) OVER (PARTITION BY IP
                                  ORDER BY DateTime) AS IPSessionNumber
    FROM   CTE1
)
SELECT IP,
       DateTime,
       DENSE_RANK() OVER (ORDER BY IP, IPSessionNumber) AS SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY IP, IPSessionNumber
                          ORDER BY DateTime) AS RequestNumber
FROM   CTE2
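This has two sort operations (by IP, DateTime and then by IP, IPSessionNumber), but it assumes that SessionNumber can be assigned arbitrarily, as long as each new session (per IP address / 30-minute rule) gets a different, unique session number.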
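The first of those sorts (by IP, DateTime) can potentially be avoided if the table already has an index in that order, since the window functions can then read the rows pre-sorted. A minimal sketch, assuming the table is named Sessions as in the query above; the index name is made up for illustration, and whether the sort actually disappears should be checked in the execution plan:

```sql
-- Hypothetical index name. The key order matches
-- PARTITION BY IP ORDER BY DateTime in the window functions.
CREATE INDEX IX_Sessions_IP_DateTime
    ON Sessions (IP, DateTime);
```

The second sort (by IP, IPSessionNumber) is on a computed column, so an index on the base table would not be expected to help with that one.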
To assign the SessionNumbers sequentially in chronological order, I used the following.
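;WITH CTE1 AS
(
    -- Same session-start flag as in the first query.
    SELECT *,
           IIF(DATEDIFF(MINUTE,
                        LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
                        DateTime) < 30, 0, 1) AS SessionFlag
    FROM   Sessions
), CTE2 AS
(
    -- Running total over ALL rows in time order. IP is a tie-breaker and the
    -- frame is ROWS, so two sessions that start at the same DateTime on
    -- different IPs get different numbers (the default RANGE frame would
    -- give tied rows the same total).
    SELECT *,
           SUM(SessionFlag) OVER (ORDER BY DateTime, IP
                                  ROWS UNBOUNDED PRECEDING) AS GlobalSessionNo
    FROM   CTE1
), CTE3 AS
(
    -- Carry the number of the most recent session start forward within each IP.
    SELECT *,
           MAX(CASE WHEN SessionFlag = 1 THEN GlobalSessionNo END)
               OVER (PARTITION BY IP ORDER BY DateTime) AS SessionNumber
    FROM   CTE2
)
SELECT IP,
       DateTime,
       SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY SessionNumber
                          ORDER BY DateTime) AS RequestNumber
FROM   CTE3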
This increases the number of sort operations to 4, however.
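To see how the pieces of this second query fit together, it can help to look at the intermediate columns for the sample rows from the question. The sketch below is only an illustration: it inlines the six sample rows under the made-up name SampleRequests instead of reading from Sessions, and its running total uses IP as a tie-breaker with a ROWS frame so that sessions starting at the same instant on different IPs cannot share a number.

```sql
;WITH SampleRequests AS
(
    -- Hypothetical inline copy of the question's sample data.
    SELECT IP, CAST(DateTime AS datetime2) AS DateTime
    FROM (VALUES ('1.1.1.1', '2012-01-01 00:01'),
                 ('1.1.1.1', '2012-01-01 00:02'),
                 ('1.1.1.1', '2012-01-01 00:03'),
                 ('1.1.1.2', '2012-01-01 00:04'),
                 ('1.1.1.2', '2012-01-01 00:05'),
                 ('1.1.1.2', '2012-01-01 00:40')) AS V (IP, DateTime)
), CTE1 AS
(
    -- 1 marks the first row of a session, 0 a continuation.
    SELECT *,
           IIF(DATEDIFF(MINUTE,
                        LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
                        DateTime) < 30, 0, 1) AS SessionFlag
    FROM   SampleRequests
), CTE2 AS
(
    -- Running count of session starts across all IPs, in time order.
    SELECT *,
           SUM(SessionFlag) OVER (ORDER BY DateTime, IP
                                  ROWS UNBOUNDED PRECEDING) AS GlobalSessionNo
    FROM   CTE1
)
SELECT IP,
       DateTime,
       SessionFlag,
       GlobalSessionNo,
       MAX(CASE WHEN SessionFlag = 1 THEN GlobalSessionNo END)
           OVER (PARTITION BY IP ORDER BY DateTime) AS SessionNumber
FROM   CTE2
ORDER BY DateTime;

-- IP       time   SessionFlag  GlobalSessionNo  SessionNumber
-- 1.1.1.1  00:01  1            1                1
-- 1.1.1.1  00:02  0            1                1
-- 1.1.1.1  00:03  0            1                1
-- 1.1.1.2  00:04  1            2                2
-- 1.1.1.2  00:05  0            2                2
-- 1.1.1.2  00:40  1            3                3
```

In this sample the two IPs do not interleave, so GlobalSessionNo and SessionNumber coincide. They differ once requests from several IPs are mixed in time, because a continuation row then picks up the count of session starts from other IPs as well; the MAX over the flagged rows of its own IP is what brings it back to the right session.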
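Answer 1 (score: 2)
Here is a version that uses a temp table and row_number to create an ID that can be used in a recursive CTE. It might be worth comparing its performance against the cursor version and the single-query version (provided by Martin).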
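-- Copy the requests into a temp table, numbering them per IP in time order
-- so that consecutive requests of an IP have consecutive IDs.
CREATE TABLE #T
(
    IP       varchar(15),
    DateTime datetime2,   -- datetime2 to match the column type in the question
    ID       int,
    primary key (IP, ID)
)

insert into #T(IP, DateTime, ID)
select IP, DateTime, row_number() over(partition by IP order by DateTime)
from #sessionRequests

;with C as
(
    -- Anchor: the first request of each IP starts that IP's session 1.
    select IP,
           ID,
           DateTime,
           1 as Session
    from #T
    where ID = 1
    union all
    -- Recursive step: move to the next request of the same IP and bump the
    -- session counter when the gap to the previous request is 30+ minutes.
    select T.IP,
           T.ID,
           T.DateTime,
           C.Session + case when datediff(minute, C.DateTime, T.DateTime) >= 30 then 1 else 0 end
    from #T as T
      inner join C
        on T.IP = C.IP and
           T.ID = C.ID + 1
)
select IP,
       DateTime,
       dense_rank() over(order by IP, Session) as SessionNumber,
       row_number() over(partition by IP, Session order by DateTime) as RequestNumber
from C
order by IP, DateTime, SessionNumber, RequestNumber
option (maxrecursion 0)  -- lift the default limit of 100 recursion levels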
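The script above reads from a temp table #sessionRequests that is not created anywhere in this answer. For a quick test, something like the following could be run first in the same session; the column definitions follow the question, and the rows are simply the question's sample data:

```sql
-- Assumed source table for the script above (not part of the original answer).
CREATE TABLE #sessionRequests
(
    IP       varchar(15),
    DateTime datetime2
);

INSERT INTO #sessionRequests (IP, DateTime)
VALUES ('1.1.1.1', '2012-01-01 00:01'),
       ('1.1.1.1', '2012-01-01 00:02'),
       ('1.1.1.1', '2012-01-01 00:03'),
       ('1.1.1.2', '2012-01-01 00:04'),
       ('1.1.1.2', '2012-01-01 00:05'),
       ('1.1.1.2', '2012-01-01 00:40');
```

With these rows the final select should reproduce the SessionNumber and RequestNumber columns shown in the question.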