Window function to count events in the last 10 minutes

Time: 2017-02-11 12:20:59

Tags: sql-server tsql window-functions

I can count the events in the last ten minutes using the traditional subquery approach. For example, this:

drop table if exists [dbo].[readings]
go

create table [dbo].[readings](
    [server] [int] NOT NULL,
    [sampled] [datetime] NOT NULL
)
go

insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go

-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server, sampled,
       (select count(*)
        from readings r2
        where r2.server = r1.server
          and r2.sampled <= r1.sampled
          and r2.sampled > dateadd(minute, -10, r1.sampled)) as countinlast10minutes
from readings r1
order by server,sampled
go

How can I get the same result using window functions? I tried this:

select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled

But the result is just a running count. Is there a system variable that refers to the current row, something like currentrow.sampled?

3 answers:

Answer 0 (score: 2)

This is not a particularly pleasant answer, but one possibility is to first create a helper table containing every minute:

CREATE TABLE #DateTimes(datetime datetime primary key);

WITH E1(N) AS 
(
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
                            (1),(1),(1),(1),(1)) V(N)
)                                       -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b)   -- 1*10^8 or 100,000,000 rows
 ,R(StartRange, EndRange)
 AS (SELECT MIN(sampled),
            MAX(sampled)
     FROM   readings)
 ,N(N)
 AS (SELECT ROW_NUMBER()
              OVER (
                ORDER BY (SELECT NULL)) AS N
     FROM   E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM   N,
       R;

Then use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW, so that each frame covers the current minute plus the 9 minutes before it:

WITH T1 AS
( SELECT  Server,
                  MIN(sampled) AS StartRange,
                  MAX(sampled) AS EndRange
         FROM     readings
         GROUP BY Server )
SELECT      Server,
            sampled,
            Cnt
FROM        T1
CROSS APPLY
            ( SELECT   r.sampled,
                                COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
                      FROM      #DateTimes N
                      LEFT JOIN readings r
                      ON        r.sampled = N.datetime
                                AND r.server = T1.server
                      WHERE     N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE       CA.sampled IS NOT NULL
ORDER BY    Server, sampled

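The above assumes there is at most one sample per minute and that all times are exact minutes. If that is not true, you would need another table expression that pre-aggregates by the datetime rounded to the minute.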
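A minimal sketch of that pre-aggregation step, assuming minute-level accuracy is acceptable: each sample is truncated to the start of its minute with the dateadd/datediff idiom, samples are counted per server and minute, and a recursive CTE (used here instead of the #DateTimes helper table so the example stands alone) fills in the empty minutes so that ROWS BETWEEN 9 PRECEDING spans exactly ten minute buckets. The tables #readings_s and #minute_counts and their rows are hypothetical. Note that this produces one count per minute bucket, not per sample, so with second-level timestamps it only approximates the original subquery (a reading at 08:40:30 would also count a later reading at 08:40:59).

```sql
-- Hypothetical sample data with seconds and two samples in the same minute.
drop table if exists #readings_s, #minute_counts;

create table #readings_s (server int not null, sampled datetime not null);
insert into #readings_s values
(1, '20170101 08:00:10'),
(1, '20170101 08:00:50'),
(1, '20170101 08:05:30'),
(1, '20170101 08:12:00');

with per_minute as (
    -- truncate each sample to its minute and count samples per server/minute
    select server,
           dateadd(minute, datediff(minute, 0, sampled), 0) as minute_start,
           count(*) as cnt
    from #readings_s
    group by server, dateadd(minute, datediff(minute, 0, sampled), 0)
), bounds as (
    select server, min(minute_start) as first_minute, max(minute_start) as last_minute
    from per_minute
    group by server
), calendar as (
    -- one row per minute per server, so ROWS frames correspond to minutes
    select server, first_minute as minute_start, last_minute
    from bounds
    union all
    select server, dateadd(minute, 1, minute_start), last_minute
    from calendar
    where minute_start < last_minute
)
select c.server,
       c.minute_start,
       p.cnt as samples_this_minute,
       -- current minute bucket plus the 9 buckets before it
       sum(isnull(p.cnt, 0)) over (partition by c.server order by c.minute_start
                                   rows between 9 preceding and current row) as countinlast10minutes
into #minute_counts
from calendar c
left join per_minute p
  on p.server = c.server and p.minute_start = c.minute_start
option (maxrecursion 0);

-- keep only minutes that actually had samples
select * from #minute_counts where samples_this_minute is not null order by server, minute_start;
```

On these four rows the minutes 08:00, 08:05 and 08:12 should get 2, 3 and 2.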

Answer 1 (score: 1)

As far as I know, a window function cannot simply replace the subquery.

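Window functions operate on a set of rows and let you work with them according to partition and order. What you are trying to do is not the kind of partitioning we can express in a window function: each row needs its own 10-minute window relative to its own sampled value, whereas a partition groups rows by a fixed value. Generating partitions that would let us use a window function here would just lead to overly complicated code.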
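For comparison, one way to stay with window functions is a different technique from the ones in these answers: turn each reading into two events, +1 at the time it is sampled and -1 ten minutes later when it leaves the window, and take a running SUM of the deltas. Since SQL Server's RANGE frames only accept UNBOUNDED and CURRENT ROW bounds (no "10 minutes preceding"), the running total has to do the work. The count at a reading's time is then the number of readings in (sampled - 10 minutes, sampled], the same bounds as the original subquery. This is a sketch; #readings and #counts are hypothetical temp-table copies of the question's data and output.

```sql
-- Hypothetical temp copy of the question's sample data.
drop table if exists #readings, #counts;

create table #readings (server int not null, sampled datetime not null);
insert into #readings values
(1,'20170101 08:00'),(1,'20170101 08:02'),(1,'20170101 08:05'),
(1,'20170101 08:30'),(1,'20170101 08:31'),(1,'20170101 08:37'),
(1,'20170101 08:40'),(1,'20170101 08:41'),(1,'20170101 09:07'),
(1,'20170101 09:08'),(1,'20170101 09:09'),(1,'20170101 09:11');

with events as (
    -- the reading enters the window at its own time...
    select server, sampled as event_time, 1 as delta, 1 as is_reading
    from #readings
    union all
    -- ...and leaves it exactly 10 minutes later
    select server, dateadd(minute, 10, sampled), -1, 0
    from #readings
), running as (
    -- RANGE (not ROWS) sums all events sharing a timestamp together:
    -- a -1 at exactly t removes the reading from t - 10 minutes (strict >),
    -- and duplicate readings at t all get the same count
    select server, event_time, is_reading,
           sum(delta) over (partition by server order by event_time
                            range between unbounded preceding and current row) as countinlast10minutes
    from events
)
select server, event_time as sampled, countinlast10minutes
into #counts
from running
where is_reading = 1;

select * from #counts order by server, sampled;
```

On the question's sample data this should give 1, 2, 3, 1, 2, 3, 3, 3, 1, 2, 3, 4, matching the OriginalSubquery column in the results table further down.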

I would suggest cross apply() as an alternative to your subquery.

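I'm not sure whether you intended to limit the window to 9 minutes, but that is what sampled > dateadd(...) does in the original subquery: a sample taken exactly 10 minutes earlier is excluded.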
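A small sketch of the difference at the boundary, evaluating both predicates for the 08:40 row over a hypothetical inline VALUES list of the four readings from 08:30 onward. The strict > drops the 08:30 sample while >= keeps it, which is the 3 versus 4 seen in the 08:40 row of the results.

```sql
drop table if exists #boundary;

declare @t datetime = '20170101 08:40';

select
    -- strict lower bound, as in the original subquery: excludes 08:30
    count(case when sampled >  dateadd(minute, -10, @t) then 1 end) as strict_lower,
    -- inclusive lower bound, as in the cross apply version: includes 08:30
    count(case when sampled >= dateadd(minute, -10, @t) then 1 end) as inclusive_lower
into #boundary
from (values (cast('20170101 08:30' as datetime)),
             (cast('20170101 08:31' as datetime)),
             (cast('20170101 08:37' as datetime)),
             (cast('20170101 08:40' as datetime))) as r(sampled)
where sampled <= @t;

select * from #boundary;
```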

Below is what a window function based on partitioning the samples into 10-minute slices would look like, alongside the cross apply() version.

select 
    r.server
  , r.sampled
  , CrossApply       = x.CountRecent
  , OriginalSubquery = (
      select count(*) 
      from readings s
      where s.server=r.server
        and s.sampled <= r.sampled
        /* doesn't include 10 minutes ago */
        and s.sampled > dateadd(minute,-10,r.sampled)
        )
  , Slices           = count(*) over(
      /* partition by server, 10 minute slices, not the same thing*/
      partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
      order by sampled
      )
from readings r
  cross apply (
    select CountRecent=count(*) 
    from readings i
    where i.server=r.server
      /* changed to >= */
      and i.sampled >= dateadd(minute,-10,r.sampled) 
      and i.sampled <= r.sampled 
     ) as x
order by server,sampled

Results: http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server |       sampled       | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+

Answer 2 (score: 0)

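Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that could be used in windowed aggregates. I think it would lead to simpler, more natural SQL: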

select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)

We can already use an expression like this:

select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

It would be great if we could reference columns of the current row. Food for thought.