How do I select the row with the highest timestamp in each 30-minute window?

Time: 2016-09-30 16:00:08

Tags: sql sql-server

I have a SQL Server table named table1 with a timestamp column column_ts and a few other columns, say column1, column2, column3.

So the table looks like this:

column_ts                   column1     column2     column3
2016-09-30 00:04:00.000     number1     string1     integer1
2016-09-30 00:24:00.000     number2     string2     integer2
2016-09-30 00:29:00.000     number3     string3     integer3
2016-09-30 00:44:00.000     number4     string4     integer4
2016-09-30 00:48:00.000     number5     string5     integer5
2016-09-30 01:04:00.000     number6     string6     integer6
2016-09-30 01:24:00.000     number7     string7     integer7
2016-09-30 01:54:00.000     number8     string8     integer8
2016-09-30 01:59:00.000     number9     string9     integer9

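First, I select the records where column_ts >= '2016-09-30 00:00:00.000'. Then, from each 30-minute window of column_ts, I want to select only the single row with the highest timestamp.

So, for the given data, the query should select only the following rows: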

column_ts                   column1     column2     column3
2016-09-30 00:29:00.000     number3     string3     integer3
2016-09-30 00:48:00.000     number5     string5     integer5
2016-09-30 01:24:00.000     number7     string7     integer7
2016-09-30 01:59:00.000     number9     string9     integer9

In other words, I want to split column_ts into 30-minute windows such as

1)2016-09-30 00:00:00.000 - 2016-09-30 00:30:00.000
2)2016-09-30 00:30:00.000 - 2016-09-30 01:00:00.000
3)2016-09-30 01:00:00.000 - 2016-09-30 01:30:00.000
4)2016-09-30 01:30:00.000 - 2016-09-30 02:00:00.000

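Finally, from each of these 30-minute windows I want to select the one row whose column_ts is highest.

I can't figure out how to generate the 30-minute windows from which I can select MAX(column_ts). Please suggest how to do this.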

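7 answers:

Answer 0: (score: 3)

You can take the difference in minutes from an epoch and integer-divide it by 30 to group the rows into 30-minute intervals.

This query returns each 30-minute slot together with the maximum column_ts in that slot: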

select dateadd(minute, datediff(minute, '1970-1-1',column_ts)/30*30,'1970-1-1') as timegroup,
       MAX(column_ts) as max_time
from table1 where column_ts >= '2016-09-30 00:00:00.000'
group by datediff(minute, '1970-1-1', column_ts) / 30

The above produces:

timegroup                   max_time
2016-09-30 00:00:00.000     2016-09-30 00:29:00.000
2016-09-30 00:30:00.000     2016-09-30 00:48:00.000
2016-09-30 01:00:00.000     2016-09-30 01:24:00.000
2016-09-30 01:30:00.000     2016-09-30 01:59:00.000
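
To see how the bucketing arithmetic works on a single value, here is a small sketch. The variable name @ts is only for illustration, and the declare-with-initializer syntax assumes SQL Server 2008 or later.

```sql
-- Worked example of the bucketing arithmetic for one timestamp.
-- @ts is a hypothetical variable, not part of the original query.
declare @ts datetime = '2016-09-30 00:44:00.000';

select
    -- whole minutes elapsed since the chosen epoch
    datediff(minute, '1970-01-01', @ts) as minutes_since_epoch,
    -- integer division: every 30-minute slot gets its own number
    datediff(minute, '1970-01-01', @ts) / 30 as bucket_number,
    -- multiply back by 30 and add to the epoch to get the slot start
    dateadd(minute, datediff(minute, '1970-01-01', @ts) / 30 * 30, '1970-01-01') as bucket_start;
```

For 00:44 the bucket_start column comes back as 2016-09-30 00:30:00.000, because the integer division drops the 14-minute remainder before the value is multiplied back by 30.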

Once you have that, you can use it in a subquery to get the result you are after:

select groups.timegroup, t.column_ts, t.column1, t.column2, t.column3 
from (
    select dateadd(minute, datediff(minute, '1970-1-1',column_ts)/30*30,'1970-1-1') as timegroup,MAX(column_ts) as max_time
    from table1 where column_ts >= '2016-09-30 00:00:00.000'
    group by datediff(minute, '1970-1-1', column_ts) / 30
) as groups
inner join table1 t on t.column_ts = groups.max_time

Which produces

timegroup                   column_ts                   column1   column2   column3
2016-09-30 00:00:00.000     2016-09-30 00:29:00.000     number3   string3   integer3
2016-09-30 00:30:00.000     2016-09-30 00:48:00.000     number5   string5   integer5
2016-09-30 01:00:00.000     2016-09-30 01:24:00.000     number7   string7   integer7
2016-09-30 01:30:00.000     2016-09-30 01:59:00.000     number9   string9   integer9

Answer 1: (score: 2)

Assuming you are using SQL Server 2005+, here is the script

use tempdb
--drop table dbo.t
create table dbo.t (column_ts datetime, column1 varchar(30), column2 varchar(30), column3 varchar(30));
go
-- populate the table
insert into dbo.t (column_ts, column1, column2, column3)
select '2016-09-30 00:04:00.000','number1','string1','integer1'
union all select '2016-09-30 00:24:00.000','number2','string2','integer2'
union all select '2016-09-30 00:29:00.000','number3','string3','integer3'
union all select '2016-09-30 00:44:00.000','number4','string4','integer4'
union all select '2016-09-30 00:48:00.000','number5','string5','integer5'
union all select '2016-09-30 01:04:00.000','number6','string6','integer6'
union all select '2016-09-30 01:24:00.000','number7','string7','integer7'
union all select '2016-09-30 01:54:00.000','number8','string8','integer8'
union all select '2016-09-30 01:59:00.000','number9','string9','integer9';
go

-- the query
; with c as (
select section=datediff(minute, '2016-09-30', column_ts)/30, * from dbo.t
)
, c2 as (select rnk=rank() over (partition by section order by column_ts desc), * from c)
select column_ts, column1, column2, column3
from c2 
where rnk = 1;

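I did something similar before, when I had collected a performance trace and needed to find the most expensive query in each 30-minute window.

Answer 2: (score: 1)

I would generate a table of intervals and join it to your data. Then add a row_number() for each interval, ordered by column_ts descending, and return only the top row (RN = 1).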

DECLARE @Test TABLE (column_ts datetime, column1 varchar(50), column2 varchar(50), column3 varchar(50))
INSERT INTO @Test
VALUES ('2016-09-30 00:04:00.000','number1','string1','integer1'),
       ('2016-09-30 00:24:00.000','number2','string2','integer2'),
       ('2016-09-30 00:29:00.000','number3','string3','integer3'),
       ('2016-09-30 00:44:00.000','number4','string4','integer4'),
       ('2016-09-30 00:48:00.000','number5','string5','integer5'),
       ('2016-09-30 01:04:00.000','number6','string6','integer6'),
       ('2016-09-30 01:24:00.000','number7','string7','integer7'),
       ('2016-09-30 01:54:00.000','number8','string8','integer8'),
       ('2016-09-30 01:59:00.000','number9','string9','integer9')

DECLARE @TimeGrid TABLE (IntervalStart TIME, IntervalEnd TIME)

DECLARE @MyTime TIME, @true BIT=1

WHILE @true=1
BEGIN
    IF @MyTime IS NULL SET @MyTime = CONVERT(TIME,'00:00:00')

    INSERT INTO @TimeGrid (IntervalStart,IntervalEnd)
    SELECT @MyTime, DATEADD(NS,-100,DATEADD(MI,30,@MyTime))

    SET @MyTime=DATEADD(MI,30,@MyTime)
    IF @MyTime= CONVERT(TIME,'00:00:00')
        SET @true=0
END

;WITH X AS
(
    SELECT * 
    FROM @Test T
    JOIN @TimeGrid TG ON CONVERT(TIME,T.column_ts) BETWEEN TG.IntervalStart AND TG.IntervalEnd
), Y AS
    (
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY IntervalStart ORDER BY column_ts DESC) AS RN
        FROM X
    )

SELECT column_ts, column1, column2, column3--, IntervalStart, IntervalEnd, RN
FROM Y
WHERE RN=1

Answer 3: (score: 1)

;WITH cte AS (
    SELECT
       *
       ,ROW_NUMBER() OVER (PARTITION BY
                CASE
                    WHEN DATEPART(MINUTE,column_ts) >= 30 THEN DATEADD(MINUTE,30 - DATEPART(MINUTE,column_ts),column_ts)
                    ELSE DATEADD(MINUTE,- DATEPART(MINUTE,column_ts),column_ts)
                END
             ORDER BY column_ts DESC) as RowNumber
    FROM
       @Table1
)

SELECT *
FROM
    cte
WHERE
    RowNumber = 1

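You could generate a table of 30-minute intervals as others have shown, but all you really need is to round down to the hour mark when the minute is below 30, and round down to the 30-minute mark when it is 30 or above. That creates the grouping, so there is no need for a recursive CTE.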

CASE
     WHEN DATEPART(MINUTE,column_ts) >= 30 THEN DATEADD(MINUTE,30 - DATEPART(MINUTE,column_ts),column_ts)
     ELSE DATEADD(MINUTE,- DATEPART(MINUTE,column_ts),column_ts)
END as HalfHourGroup
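
As a quick check of that rounding rule, the sketch below applies the same CASE expression to three hand-picked timestamps. The inline VALUES list and the alias v are made up for this illustration, and the table value constructor assumes SQL Server 2008 or later.

```sql
-- Apply the half-hour rounding to a few sample timestamps.
-- The VALUES list and alias v are hypothetical test data.
select v.column_ts,
       case
           -- minute 30..59: step back to the :30 mark
           when datepart(minute, v.column_ts) >= 30
               then dateadd(minute, 30 - datepart(minute, v.column_ts), v.column_ts)
           -- minute 0..29: step back to the hour mark
           else dateadd(minute, -datepart(minute, v.column_ts), v.column_ts)
       end as HalfHourGroup
from (values (cast('2016-09-30 00:29:00.000' as datetime)),
             (cast('2016-09-30 00:30:00.000' as datetime)),
             (cast('2016-09-30 00:44:00.000' as datetime))) as v(column_ts);
```

The 00:29 row lands in the 00:00 group, while 00:30 and 00:44 both land in the 00:30 group. Note that the expression only shifts the minutes, so timestamps with non-zero seconds or milliseconds would end up in separate groups unless those parts are truncated first.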

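Answer 4: (score: 1)

@petelids's answer looks right to me, but I'll offer an alternative that does not use a literal date in the calculation. I think you might even find it a little cleaner-looking. Based on your sample data I assume you are not storing seconds. You could also just ignore the seconds in the output through some formatting options. Either way, the seconds would not matter.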

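Edit: After re-reading your question I realize you want the whole row as the result. You can still use this approach, although the group by technique is probably more common now and is likely to be very fast.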

select
    dateadd(
        minute,
        -datepart(minute, min(column_ts)) % 30,
        min(column_ts)
    ) as timegroup,
    max(column_ts) as max_time_in_window
from T
group by
    cast(column_ts as date),
    datepart(hour, column_ts),
    datepart(minute, column_ts) / 30;

Or, to return the full rows, filter on the per-window maximum:

select * from T
where column_ts in (
    select max(column_ts) as max_time_in_window
    from T
    group by
        cast(column_ts as date),
        datepart(hour, column_ts),
        datepart(minute, column_ts) / 30
);
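
For comparison, a row_number() version over the same three grouping keys might look like the sketch below. It is not the answer author's code; the aliases ranked and rn are made up here, and T is the same table name used in the queries above.

```sql
-- Sketch: number the rows inside each half-hour window, newest first,
-- and keep only the first row of every window.
select column_ts, column1, column2, column3
from (
    select *,
           row_number() over (
               partition by cast(column_ts as date),
                            datepart(hour, column_ts),
                            datepart(minute, column_ts) / 30
               order by column_ts desc) as rn
    from T
) as ranked
where rn = 1;
```

Unlike the IN filter, this returns exactly one row per window even when two rows share the same maximum column_ts, although which of the tied rows is kept is arbitrary unless a tie-breaking column is added to the order by.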

Answer 5: (score: 0)

It can be done without window functions:

select max(column_ts) column_ts, column1, column2, column3
from mytable
where column_ts >= '2016-09-30 00:00:00.000'
group by column1, column2, column3

To get results across multiple time periods, group by the time bracket as well:

select max(column_ts) column_ts, column1, column2, column3
from mytable
group by column1, column2, column3, <expression to calculate a unique value for each column_ts bracket>

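Answer 6: (score: 0)

I do this by generating the "intervals" table separately as a CTE. If you do this a lot, you may want to persist the intervals in a table so that you can join to them. You should also think about what you want to happen when two events have the same timestamp...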

{{1}}

(Warning: the script may not work tomorrow...)
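
The script for this answer is not shown above. As a rough sketch of the idea it describes (not the author's code), a recursive CTE can generate the intervals and row_number() can pick one row per interval. The variables @start and @end, the CTE names, and the use of column1 as a tie-breaker are all assumptions made for illustration; the declare-with-initializer syntax assumes SQL Server 2008 or later.

```sql
-- Sketch only: window bounds, CTE names and the tie-breaker are assumptions.
declare @start datetime = '2016-09-30 00:00:00.000';
declare @end   datetime = '2016-09-30 02:00:00.000';

;with intervals as (
    -- anchor member: the first 30-minute window
    select @start as interval_start,
           dateadd(minute, 30, @start) as interval_end
    union all
    -- recursive member: the next window, until @end is reached
    select interval_end,
           dateadd(minute, 30, interval_end)
    from intervals
    where interval_end < @end
),
ranked as (
    select i.interval_start, t.column_ts, t.column1, t.column2, t.column3,
           -- column1 is an arbitrary tie-breaker for equal timestamps
           row_number() over (partition by i.interval_start
                              order by t.column_ts desc, t.column1 desc) as rn
    from intervals i
    join table1 t
      on t.column_ts >= i.interval_start   -- half-open window:
     and t.column_ts <  i.interval_end     -- start inclusive, end exclusive
)
select interval_start, column_ts, column1, column2, column3
from ranked
where rn = 1
option (maxrecursion 0);  -- lift the default limit of 100 recursion levels
```

Using half-open windows means a row at exactly 00:30:00.000 belongs to the 00:30 window only, and the explicit tie-breaker makes the choice between two rows with the same timestamp deterministic.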