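Let me preface this by saying I'm not sure how to even phrase this question, and that has been a major obstacle in searching for an answer. As a result, I may be using completely wrong terminology.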
I'd like to get the number of distinct users over a period of time, using a sliding window.
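My data table has the following columns: Id, User, RequestedOn, Query. The system captures requests into it over time. For example, over the course of 8 hours, 78 distinct users might make 370 distinct queries against the system.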
I can solve this with Brute Force & Ignorance (BF&I), but like many BF&I approaches, it doesn't scale worth a hill of beans.
In the examples below, the counting window is 8 hours wide: each row gives the number of distinct users in one 8-hour period.
Select '5/28/17 15:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 15:00' And [RequestedOn] <= '5/28/17 23:00' Union
Select '5/28/17 14:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 14:00' And [RequestedOn] <= '5/28/17 22:00' Union
Select '5/28/17 13:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 13:00' And [RequestedOn] <= '5/28/17 21:00' Union
Select '5/28/17 12:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 12:00' And [RequestedOn] <= '5/28/17 20:00' Union
Select '5/28/17 11:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 11:00' And [RequestedOn] <= '5/28/17 19:00' Union
Select '5/28/17 10:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 10:00' And [RequestedOn] <= '5/28/17 18:00' Union
Select '5/28/17 09:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 09:00' And [RequestedOn] <= '5/28/17 17:00' Union
Select '5/28/17 08:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 08:00' And [RequestedOn] <= '5/28/17 16:00'
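Each branch of the UNION above computes one value of the following quantity, where t is that branch's StartingFrom time. Both endpoints are inclusive, matching the >= and <= in the query, and the set notation takes care of the "distinct" part:

```latex
\mathrm{Users}(t) = \bigl|\{\, r.\mathrm{UserName} \;:\; r \in \mathrm{vwRequests},\ t \le r.\mathrm{RequestedOn} \le t + 8\,\mathrm{h} \,\}\bigr|,
\qquad t \in \{08{:}00,\ 09{:}00,\ \dots,\ 15{:}00\} \text{ on 5/28/17}
```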
There has to be a better way, but I don't know where to start looking.
Any pointers would be awesome!
Answer 0 (score: 1)
If I understand correctly, you need a recursive CTE, something like this:
DECLARE @StartTime datetime = '2017-05-28 00:00:00'
DECLARE @EndTime datetime = '2017-05-29 00:00:00'
;WITH cte AS
(
SELECT @StartTime AS StartPeriod, dateadd(hour,8,@StartTime) AS EndPeriod
UNION ALL
SELECT dateadd(hour,1,StartPeriod), dateadd(hour,1,EndPeriod) AS EndPeriod
FROM cte
WHERE cte.StartPeriod < @EndTime
)
-- cte returns
--StartPeriod EndPeriod
--2017-05-28 00:00:00.000 2017-05-28 08:00:00.000
--2017-05-28 01:00:00.000 2017-05-28 09:00:00.000
--2017-05-28 02:00:00.000 2017-05-28 10:00:00.000
--2017-05-28 03:00:00.000 2017-05-28 11:00:00.000
--2017-05-28 04:00:00.000 2017-05-28 12:00:00.000
--2017-05-28 05:00:00.000 2017-05-28 13:00:00.000
--.................
SELECT c.StartPeriod, c.EndPeriod, Users FROM cte c
OUTER APPLY (
SELECT Count(Distinct [UserName]) AS Users -- I think you should use Count(Distinct UserId) instead of UserName
From [vwRequests] Where [RequestedOn] BETWEEN c.StartPeriod AND c.EndPeriod
) ca
OPTION (MAXRECURSION 0)
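Recursion isn't strictly required to generate the window boundaries; a small numbers table built from VALUES lists can produce them too, with no OPTION (MAXRECURSION 0). The sketch below is an alternative, not the answer's method. It assumes a temp table #Requests with made-up sample rows standing in for vwRequests, and it uses half-open windows [start, start + 8h), so a request stamped exactly on a window's end is not counted in that window. Change `<` to `<=` to reproduce the question's inclusive upper bound.

```sql
-- Hypothetical sample rows standing in for dbo.vwRequests.
CREATE TABLE #Requests (UserName varchar(50) NOT NULL, RequestedOn datetime NOT NULL);
INSERT INTO #Requests (UserName, RequestedOn) VALUES
    ('alice', '2017-05-28T08:15:00'),
    ('bob',   '2017-05-28T09:30:00'),
    ('alice', '2017-05-28T12:00:00'),
    ('carol', '2017-05-28T16:00:00'),
    ('dave',  '2017-05-28T23:45:00');

DECLARE @StartTime datetime = '2017-05-28T08:00:00';  -- first window start
DECLARE @Windows   int      = 8;                      -- number of hourly window starts
DECLARE @WindowHrs int      = 8;                      -- width of each window in hours

-- Numbers 0..99 from two cross-joined digit lists; no recursion involved.
WITH Digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v (d)
),
Numbers AS (
    SELECT tens.d * 10 + ones.d AS n
    FROM Digits AS tens CROSS JOIN Digits AS ones
),
Windows AS (
    SELECT DATEADD(HOUR, n, @StartTime)              AS StartPeriod,
           DATEADD(HOUR, n + @WindowHrs, @StartTime) AS EndPeriod
    FROM Numbers
    WHERE n < @Windows
)
SELECT w.StartPeriod, w.EndPeriod, ca.Users
INTO #Result
FROM Windows AS w
CROSS APPLY (
    -- An aggregate without GROUP BY always returns one row (0 for an empty window).
    SELECT COUNT(DISTINCT r.UserName) AS Users
    FROM #Requests AS r
    WHERE r.RequestedOn >= w.StartPeriod
      AND r.RequestedOn <  w.EndPeriod   -- half-open: [start, end)
) AS ca;

SELECT StartPeriod, EndPeriod, Users FROM #Result ORDER BY StartPeriod;
```

With these sample rows, the 08:00 window counts alice and bob (carol's 16:00 request falls on its excluded end), and the 09:00 window counts all three of bob, alice, and carol.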
Answer 1 (score: 1)
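If you want to speed up your existing query without changing it much, replace UNION with UNION ALL and add some indexes on the Username and RequestedOn columns. UNION ALL skips the duplicate-removal step, which is unnecessary here because every branch returns a different StartingFrom value.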
If vwRequests is a table (not a view), try these and see which one works best for you:
CREATE INDEX IX1 ON dbo.vwRequests (RequestedOn, Username)
CREATE INDEX IX2 ON dbo.vwRequests (Username, RequestedOn)
If vwRequests is a view, you can try adding indexes to its base table, or turning the view into an indexed view.
If you want to rewrite your query, you could start with something like this:
SELECT x1.StartingFrom, x2.Users
FROM (VALUES (8),(9),(10),(11),(12),(13),(14),(15)) h (h)
CROSS APPLY (
SELECT DATEADD(HOUR,h,'20170528') AS [StartingFrom]
) x1
CROSS APPLY (
SELECT COUNT(DISTINCT vr.Username) AS Users
FROM dbo.vwRequests vr
WHERE vr.RequestedOn BETWEEN x1.StartingFrom AND DATEADD(HOUR,8,x1.StartingFrom)
) x2
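If the request table is large, one further idea (not from either answer) is to collapse the raw rows into distinct (user, hour) pairs in a single pass, so that each of the 8 windows reads that usually much smaller set instead of the raw rows. This only works because every window starts on a whole hour, and it uses half-open windows [start, start + 8h) rather than BETWEEN. The temp tables #Requests, #UserHours and #Result and their sample rows are hypothetical.

```sql
-- Hypothetical sample rows standing in for dbo.vwRequests.
CREATE TABLE #Requests (UserName varchar(50) NOT NULL, RequestedOn datetime NOT NULL);
INSERT INTO #Requests (UserName, RequestedOn) VALUES
    ('alice', '2017-05-28T08:15:00'),
    ('bob',   '2017-05-28T09:30:00'),
    ('alice', '2017-05-28T12:00:00'),
    ('carol', '2017-05-28T16:00:00'),
    ('dave',  '2017-05-28T23:45:00');

-- Step 1: scan the raw rows once, keeping one row per (user, clock hour).
-- DATEADD(HOUR, DATEDIFF(HOUR, 0, x), 0) truncates x to the start of its hour.
SELECT DISTINCT
       r.UserName,
       DATEADD(HOUR, DATEDIFF(HOUR, 0, r.RequestedOn), 0) AS HourBucket
INTO #UserHours
FROM #Requests AS r
WHERE r.RequestedOn >= '2017-05-28T08:00:00'   -- first window start
  AND r.RequestedOn <  '2017-05-28T23:00:00';  -- last window end (15:00 + 8h)

-- Step 2: a half-open window [s, s + 8h) covers exactly the buckets s .. s + 7h.
SELECT x1.StartingFrom, x2.Users
INTO #Result
FROM (VALUES (8),(9),(10),(11),(12),(13),(14),(15)) AS h (h)
CROSS APPLY (
    SELECT DATEADD(HOUR, h.h, CAST('20170528' AS datetime)) AS StartingFrom
) AS x1
CROSS APPLY (
    SELECT COUNT(DISTINCT uh.UserName) AS Users
    FROM #UserHours AS uh
    WHERE uh.HourBucket >= x1.StartingFrom
      AND uh.HourBucket <  DATEADD(HOUR, 8, x1.StartingFrom)
) AS x2;

SELECT StartingFrom, Users FROM #Result ORDER BY StartingFrom;
```

Whether this beats applying COUNT(DISTINCT) directly over the view depends on how many requests each user makes per hour and on the indexes available, so compare the actual execution plans on real data.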