Question

I can count the events in the last ten minutes using a traditional correlated subquery. For example:

drop table if exists [dbo].[readings]
go

create table [dbo].[readings](
    [server] [int] NOT NULL,
    [sampled] [datetime] NOT NULL
)
go

insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go

-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server,
       sampled,
       (select count(*)
        from readings r2
        where r2.server = r1.server
          and r2.sampled <= r1.sampled
          and r2.sampled > dateadd(minute,-10,r1.sampled)) as countinlast10minutes
from readings r1
order by server,sampled
go

How can I get the same result using a window function? I tried this:

select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled

But the result is just a running count. Is there a system variable that refers to the current row pointer, something like currentrow.sampled?

Answer 1

This is not a very pleasant answer, but one possibility is to first create a helper table containing every minute in the range:

CREATE TABLE #DateTimes(datetime datetime primary key);

WITH E1(N) AS 
(
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
                            (1),(1),(1),(1),(1)) V(N)
)                                       -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b)   -- 1*10^8 or 100,000,000 rows
 ,R(StartRange, EndRange)
 AS (SELECT MIN(sampled),
            MAX(sampled)
     FROM   readings)
 ,N(N)
 AS (SELECT ROW_NUMBER()
              OVER (
                ORDER BY (SELECT NULL)) AS N
     FROM   E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM   N,
       R;

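Then use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW. Because the helper table has exactly one row per minute, that frame covers the current minute and the nine minutes before it: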

WITH T1 AS
( SELECT  Server,
                  MIN(sampled) AS StartRange,
                  MAX(sampled) AS EndRange
         FROM     readings
         GROUP BY Server )
SELECT      Server,
            sampled,
            Cnt
FROM        T1
CROSS APPLY
            ( SELECT   r.sampled,
                                COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
                      FROM      #DateTimes N
                      LEFT JOIN readings r
                      ON        r.sampled = N.datetime
                                AND r.server = T1.server
                      WHERE     N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE       CA.sampled IS NOT NULL
ORDER BY    sampled

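The above assumes that there is at most one sample per minute and that all times fall on exact minutes. If that is not true, you would need another table expression that pre-aggregates the samples by datetime rounded to the minute.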
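As a rough sketch of that pre-aggregation step (not part of the tested answer above), something along these lines could collapse the readings to one row per server per minute. The names PerMinute, sampled_minute and samples_in_minute are made up for illustration, and the expression truncates to the minute rather than rounding, which is an assumption about what is wanted.

```sql
-- Sketch only: one row per server per minute, with the number of samples in it.
-- DATEADD/DATEDIFF against 0 truncates the datetime to the start of its minute.
WITH PerMinute AS
(
    SELECT   server,
             DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0) AS sampled_minute,
             COUNT(*)                                         AS samples_in_minute
    FROM     readings
    GROUP BY server,
             DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0)
)
SELECT   server,
         sampled_minute,
         samples_in_minute
FROM     PerMinute
ORDER BY server,
         sampled_minute;
```

With this in place, the windowed query would presumably need to SUM samples_in_minute over the frame instead of counting rows, and the helper table would also have to start on a whole minute so that the join on the minute value still matches.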

Answer 2

As far as I know, there is no simple, direct replacement for your subquery using window functions.

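Window functions operate on a set of rows and let you work with them based on partitions and ordering. What you are trying to do is not the kind of partitioning that window functions can express. Generating the partitions we would need in order to use window functions here would only result in overly complicated code.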
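To see why the attempt in the question produces only a running count, it may help to evaluate its CASE expression on its own. This is a small sketch against the readings table from the question (the alias always_one is made up for illustration): inside the windowed aggregate, the unqualified sampled and r1.sampled both resolve to the same row, so the condition compares each row with itself.

```sql
-- Both "sampled" and "r1.sampled" refer to the row currently being read,
-- so the test is effectively: sampled <= sampled and sampled > sampled - 10 minutes.
select server,
       sampled,
       case when sampled <= r1.sampled
             and sampled > dateadd(minute,-10,r1.sampled)
            then 1 else null end as always_one
from readings r1
order by server, sampled
```

Every row yields 1, so counting those values over rows between unbounded preceding and current row simply counts the rows seen so far in the partition.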

I would suggest cross apply() as an alternative to your subquery.

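I am not sure whether you meant to restrict the results to the previous 9 minutes, but with sampled > dateadd(...) that is what happens in your original subquery: a row sampled exactly 10 minutes earlier is not counted.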

Here is what a window function based on partitioning the samples into 10 minute slices would look like, alongside the cross apply() version.

select 
    r.server
  , r.sampled
  , CrossApply       = x.CountRecent
  , OriginalSubquery = (
      select count(*) 
      from readings s
      where s.server=r.server
        and s.sampled <= r.sampled
        /* doesn't include 10 minutes ago */
        and s.sampled > dateadd(minute,-10,r.sampled)
        )
  , Slices           = count(*) over(
      /* partition by server, 10 minute slices, not the same thing*/
      partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
      order by sampled
      )
from readings r
  cross apply (
    select CountRecent=count(*) 
    from readings i
    where i.server=r.server
      /* changed to >= */
      and i.sampled >= dateadd(minute,-10,r.sampled) 
      and i.sampled <= r.sampled 
     ) as x
order by server,sampled

Results: http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server |       sampled       | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+

Answer 3

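Thanks, Martin and SqlZim, for your answers. I am going to raise a Connect enhancement request for something like %%currentrow that could be used in windowed aggregates. I think it would lead to much simpler and more natural SQL:

select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)

We can already use expressions like this:

select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

So I think it would be great if we could reference columns in the current row.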
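For comparison only, and outside SQL Server: the SQL standard describes RANGE frames with an offset, which express a window relative to the current row's ordering value. PostgreSQL 11 and later accept an interval offset when ordering by a timestamp, whereas the RANGE frame in SQL Server accepts only UNBOUNDED and CURRENT ROW bounds, so the following sketch will not run there. It assumes the same readings table with sampled stored as a timestamp.

```sql
-- PostgreSQL 11+ syntax, not T-SQL.
-- The frame includes every row whose sampled value lies between
-- (current sampled - 10 minutes) and the current sampled, both ends inclusive.
select server,
       sampled,
       count(*) over (
           partition by server
           order by sampled
           range between interval '10 minutes' preceding and current row
       ) as countinlast10minutes
from readings
order by server, sampled;
```

Because the lower bound is inclusive, this corresponds to the >= variant (the CrossApply column in Answer 2) rather than to the original subquery, which uses a strict >.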

Window functions to count events in the last 10 minutes