Get a list of start and end values from a table of datetimes

Time: 2011-05-16 13:51:33

Tags: sql sql-server-2008 gaps-and-islands

Currently I have a table built up like this:

DeviceID      Timestamp            Value
----------------------------------------
Device1       1.1.2011 10:00:00    3
Device1       1.1.2011 10:00:01    4
Device1       1.1.2011 10:00:02    4
Device1       1.1.2011 10:00:04    3
Device1       1.1.2011 10:00:05    4
Device1       1.1.2011 14:23:14    8
Device1       1.1.2011 14:23:15    7
Device1       1.1.2011 14:23:17    4
Device1       1.1.2011 14:23:18    2

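As you can see, these are values coming from a device, each with a timestamp (the column type is datetime).

The problem is that the device can be started and stopped at any moment, and the data contains no direct information that a start or stop occurred. From the list of timestamps, however, it is easy to tell when starts and stops happened: whenever the timestamps of two rows are within 5 seconds of each other, they belong to the same measurement.

Now I would like to get a list like this out of the data: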

DeviceID      Started              Ended
Device1       1.1.2011 10:00:00    1.1.2011 10:00:05
Device1       1.1.2011 14:23:14    1.1.2011 14:23:18

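So, any ideas on how to do this quickly? All I can think of is to use some kind of cursor and compare each pair of datetimes manually. But I think this will get really slow, because every value in every row has to be inspected.

So is there a better SQL solution that does not rely on cursors?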

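Update

So far I have tested all the given answers. From reading them, they all look good and take some interesting approaches. Unfortunately, all of them (so far) fail on the real data. The biggest problem seems to be the sheer amount of data (currently about 3.5 million rows in the table). Running a given query on a small subset produces the expected results, but running it against the whole table leads to very bad performance.

I have to test further and check whether I can chunk the data and pass only part of it to one of these algorithms to get this thing rolling. But maybe one of you has another smart idea for getting the results a bit faster.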

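Update (more information about the structure)

OK, this information may also help: currently there are about 3.5 million entries in the table. Here are the column types and indexes:

  • _ID
    • int
    • primary key
    • clustered index
    • not mentioned in my example, because this query does not need it
  • DeviceID
    • int
    • not null
    • index
  • Timestamp
    • datetime
    • not null
    • index
  • Several columns of different types (int, real, tinyint) without an index
    • all nullable

Maybe this helps to improve the existing (or new) solutions to the problem.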
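
With this structure, one thing that may be worth trying is a single composite index over both columns, so that each device's rows can be read in timestamp order instead of combining two separate single-column indexes. This is only a sketch: the table name data is borrowed from the answers below, the index name is made up, and whether it actually helps on 3.5 million rows has not been measured here.

```sql
-- Hypothetical stand-in for the real table; only the columns
-- described in the question are shown.
CREATE TABLE data (
    _ID         int      NOT NULL PRIMARY KEY CLUSTERED,
    DeviceID    int      NOT NULL,
    [Timestamp] datetime NOT NULL,
    Value       int      NULL
);

-- Composite index: entries are ordered by device first,
-- then by time within each device.
CREATE NONCLUSTERED INDEX IX_data_DeviceID_Timestamp
    ON data (DeviceID, [Timestamp]);
```

Queries that partition by DeviceID and order by Timestamp, as several of the answers do, are the ones most likely to benefit from this ordering.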

7 answers:

Answer 0 (score: 2)

-- Table var to store the gaps
declare @T table
(
  DeviceID varchar(10),
  PrevPeriodEnd datetime,
  NextPeriodStart datetime
)

-- Get the gaps
;with cte as 
(
  select *,
    row_number() over(partition by DeviceID order by Timestamp) as rn
  from data
)
insert into @T
select
  C1.DeviceID,
  C1.Timestamp as PrevPeriodEnd,
  C2.Timestamp as NextPeriodStart
from cte as C1
  inner join cte as C2
    on C1.rn = C2.rn-1 and
       C1.DeviceID = C2.DeviceID and
       datediff(s, C1.Timestamp, C2.Timestamp) > 5

-- Build islands from gaps in @T
;with cte1 as
(
  -- Add first and last timestamp to gaps
  select DeviceID, PrevPeriodEnd, NextPeriodStart
  from @T
  union all
  select DeviceID, max(TimeStamp) as PrevPeriodEnd, null as NextPeriodStart
  from data
  group by DeviceID
  union all
  select DeviceID, null as PrevPeriodEnd, min(TimeStamp) as NextPeriodStart
  from data
  group by DeviceID
),
cte2 as
(
  select *,
    row_number() over(partition by DeviceID order by PrevPeriodEnd) as rn
  from cte1
)
select
  C1.DeviceID,
  C1.NextPeriodStart as PeriodStart,
  C2.PrevPeriodEnd as PeriodEnd
from cte2 as C1
  inner join cte2 as C2
    on C1.DeviceID = C2.DeviceID and
       C1.rn = C2.rn-1
order by C1.DeviceID, C1.NextPeriodStart       

Answer 1 (score: 0)

Try this:

select DeviceID,MIN(Timestamp),MAX(Timestamp) 
          from @table group by DATEPART(hh,Timestamp),DeviceID

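Answer 2 (score: 0)

I have played around with some of the data types and names (just because I can, and because timestamp is a reserved word), and I can get the result you asked for from your sample data.

Sample data: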

create table Measures (
    DeviceID int not null,
    Occurred datetime not null,
    Value int not null,
    constraint PK_Measures PRIMARY KEY (DeviceID,Occurred)
)
go
insert into Measures (DeviceID,Occurred,Value)
select 1,'2011-01-01T10:00:00',3 union all
select 1,'2011-01-01T10:00:01',4 union all
select 1,'2011-01-01T10:00:02',4 union all
select 1,'2011-01-01T10:00:04',3 union all
select 1,'2011-01-01T10:00:05',4 union all
select 1,'2011-01-01T14:23:14',8 union all
select 1,'2011-01-01T14:23:15',7 union all
select 1,'2011-01-01T14:23:17',4 union all
select 1,'2011-01-01T14:23:18',2

Now the query:

;with StartPeriods as (
    select m1.DeviceID,m1.Occurred as Started
    from Measures m1 left join Measures m2 on m1.DeviceID = m2.DeviceID and m2.Occurred < m1.Occurred and DATEDIFF(second,m2.Occurred,m1.Occurred) < 6
    where m2.DeviceID is null
), ExtendPeriods as (
    select DeviceID,Started,Started as Ended from StartPeriods
    union all
    select
        ep.DeviceID,ep.Started,m2.Occurred
    from
        ExtendPeriods ep
            inner join
        Measures m2
            on
                ep.DeviceID = m2.DeviceID and
                ep.Ended < m2.Occurred and
                DATEDIFF(SECOND,ep.Ended,m2.Occurred) < 6
)
select DeviceID,Started,MAX(Ended) from ExtendPeriods group by DeviceID,Started

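The StartPeriods common table expression (CTE) finds those rows in Measures that have no earlier row within 5 seconds. The ExtendPeriods CTE then recursively extends these periods by looking in Measures for new rows that occur at most 5 seconds after the current end of the period found so far.

Finally, for each start we keep the row whose end is as far from the start as possible.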
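
One point to keep in mind on long measurement runs: SQL Server stops a recursive CTE after 100 recursion levels by default and raises an error, and ExtendPeriods needs roughly one level per row in a period. A minimal sketch of the same query with the limit lifted follows. It assumes a table variable @Measures in place of the Measures table so that it runs on its own, and it uses OPTION (MAXRECURSION 0), which removes the limit entirely.

```sql
-- Sample rows from the question, held in a table variable (an assumption
-- made only so the sketch is self-contained).
DECLARE @Measures TABLE (
    DeviceID int      NOT NULL,
    Occurred datetime NOT NULL,
    PRIMARY KEY (DeviceID, Occurred)
);

INSERT INTO @Measures (DeviceID, Occurred)
select 1,'2011-01-01T10:00:00' union all
select 1,'2011-01-01T10:00:01' union all
select 1,'2011-01-01T10:00:02' union all
select 1,'2011-01-01T10:00:04' union all
select 1,'2011-01-01T10:00:05' union all
select 1,'2011-01-01T14:23:14' union all
select 1,'2011-01-01T14:23:15' union all
select 1,'2011-01-01T14:23:17' union all
select 1,'2011-01-01T14:23:18';

;with StartPeriods as (
    -- Rows with no earlier row of the same device within 5 seconds
    select m1.DeviceID, m1.Occurred as Started
    from @Measures m1
        left join @Measures m2
            on m1.DeviceID = m2.DeviceID
           and m2.Occurred < m1.Occurred
           and DATEDIFF(second, m2.Occurred, m1.Occurred) < 6
    where m2.DeviceID is null
), ExtendPeriods as (
    select DeviceID, Started, Started as Ended from StartPeriods
    union all
    -- Push the end forward while a later row lies within 5 seconds
    select ep.DeviceID, ep.Started, m2.Occurred
    from ExtendPeriods ep
        inner join @Measures m2
            on ep.DeviceID = m2.DeviceID
           and ep.Ended < m2.Occurred
           and DATEDIFF(second, ep.Ended, m2.Occurred) < 6
)
select DeviceID, Started, MAX(Ended) as Ended
from ExtendPeriods
group by DeviceID, Started
option (maxrecursion 0);  -- 0 = no recursion limit
```

On the sample rows this should return two periods for device 1, 10:00:00 to 10:00:05 and 14:23:14 to 14:23:18.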

Answer 3 (score: 0)

DECLARE @t TABLE
(DeviceID      VARCHAR(10),
 [Timestamp]    DATETIME,
 VALUE          INT
)

INSERT @t
SELECT 'Device1','20110101 10:00:00',    3
UNION SELECT 'Device1','20110101 10:00:01',    4
UNION SELECT 'Device1','20110101 10:00:02',    4
UNION SELECT 'Device1','20110101 10:00:04',   3
UNION SELECT 'Device1','20110101 10:00:05',    4
UNION SELECT 'Device1','20110101 14:23:14',    8
UNION SELECT 'Device1','20110101 14:23:15',    7
UNION SELECT 'Device1','20110101 14:23:17',    4
UNION SELECT 'Device1','20110101 14:23:18',    2


;WITH myCTE
AS
(
    SELECT DeviceID, [Timestamp],
           ROW_NUMBER() OVER (PARTITION BY DeviceID
                              ORDER BY [TIMESTAMP]
                             ) AS rn
    FROM @t
)
, recCTE
AS
(
    SELECT DeviceID, [Timestamp],  0 as groupID, rn FROM myCTE
    WHERE rn = 1

    UNION ALL

    SELECT r.DeviceID, g.[Timestamp],  CASE WHEN DATEDIFF(ss,r.[Timestamp], g.[Timestamp]) <= 5 THEN r.groupID ELSE r.groupID + 1 END, g.rn 
    FROM recCTE AS r
    JOIN myCTE AS g
    ON g.rn = r.rn + 1
       AND g.DeviceID = r.DeviceID  -- stay within one device; rn restarts at 1 per device
)
SELECT DeviceID, MIN([Timestamp]) AS [started], MAX([Timestamp]) AS ended
FROM recCTE
GROUP BY DeviceId, groupId
OPTION (MAXRECURSION 0);

Answer 4 (score: 0)

You should be able to use window functions (the query below assumes that a gap of 15 minutes defines a new session):

SELECT DeviceId,
       Timestamp,
       COALESCE((Timestamp - lag(Timestamp) OVER w) > interval '15 min', TRUE)
       as session_begins,
       COALESCE((lead(Timestamp) OVER w - Timestamp) > interval '15 min', TRUE)
       as session_ends
FROM YourTable
WINDOW w AS (PARTITION BY DeviceId ORDER BY Timestamp);

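Depending on your where clause, you may want to drop the coalesce / true part, since the first and last rows fetched may not be real session boundaries.

If you only need the boundaries, you can use the above in a subquery and group by DeviceId, session_begins, session_ends having session_begins or session_ends. Also, if you do this, do not forget to put the where clause in the subquery rather than in the main query; otherwise, because of the window aggregates, you will end up with a seq scan of the whole table.

Answer 5 (score: 0)

The basic idea of the following solution is borrowed from this answer.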

WITH data (DeviceID, Timestamp, Value) AS (
  SELECT 'Device1', CAST('1.1.2011 10:00:00' AS datetime), 3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:01',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:02',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:04',              3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:05',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:14',              8 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:15',              7 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:17',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:18',              2
),
ranked AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY DeviceID ORDER BY Timestamp)
  FROM data
),
starts AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn + 1
      AND r1.Timestamp <= DATEADD(second, 5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
),
ends AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn - 1
      AND r1.Timestamp >= DATEADD(second, -5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
)
SELECT
  s.DeviceID,
  Started = s.Timestamp,
  Ended = e.Timestamp
FROM starts s
  INNER JOIN ends e ON s.DeviceID = e.DeviceID AND s.rank = e.rank

Answer 6 (score: 0)

Try this, although I am not sure how well it will perform with a large amount of data:

SELECT a.TS AS [StartTime], (SELECT TOP 1 c.TS FROM TestTime c WHERE c.TS >= a.TS AND
    NOT EXISTS(SELECT * FROM TestTime d WHERE d.TS > c.TS AND DATEDIFF(SECOND, c.TS, d.TS) <= 5) ORDER BY c.TS) AS [StopTime]
FROM TestTime a WHERE NOT EXISTS (SELECT * FROM TestTime b WHERE a.TS > b.TS AND DATEDIFF(SECOND, b.TS, a.TS) <= 5)

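My table is called TestTime and the column is called TS, so adjust it for your table. I use NOT EXISTS to check for a timestamp < the current record's and within 5 seconds of it. If none is found, the record is shown as a start time (this also covers the first record in the table). The subquery then looks for the first timestamp >= the start that was found (>= in case it is a single entry, and therefore both a start and a stop), again using NOT EXISTS to check for a record greater than it and within 5 seconds; the record is returned only if no such record exists, and only the first match is taken. You can tweak and improve it, but it may be a good basis.

Note that if the device is still running, the last timestamp is listed as the stop time for the last start event.

For simplicity I have not included the device name here, so you will need to add it to the StopTime subquery and to the WHERE clauses.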
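
As a rough sketch of what adding the device could look like: the version below assumes a table variable @data with DeviceID and [Timestamp] columns (names taken from the question, not from my TestTime table) and adds a second, made-up Device2 to show that the devices stay separate.

```sql
-- Sample rows: Device1 from the question, Device2 invented for illustration.
DECLARE @data TABLE (DeviceID varchar(10) NOT NULL, [Timestamp] datetime NOT NULL);

INSERT INTO @data (DeviceID, [Timestamp]) VALUES
    ('Device1', '20110101 10:00:00'),
    ('Device1', '20110101 10:00:01'),
    ('Device1', '20110101 10:00:02'),
    ('Device1', '20110101 10:00:04'),
    ('Device1', '20110101 10:00:05'),
    ('Device1', '20110101 14:23:14'),
    ('Device1', '20110101 14:23:15'),
    ('Device1', '20110101 14:23:17'),
    ('Device1', '20110101 14:23:18'),
    ('Device2', '20110101 10:00:03'),
    ('Device2', '20110101 10:00:04');

SELECT a.DeviceID,
       a.[Timestamp] AS StartTime,
       -- First row at or after the start that has no successor
       -- of the same device within 5 seconds
       (SELECT TOP 1 c.[Timestamp]
        FROM @data c
        WHERE c.DeviceID = a.DeviceID
          AND c.[Timestamp] >= a.[Timestamp]
          AND NOT EXISTS (SELECT * FROM @data d
                          WHERE d.DeviceID = c.DeviceID
                            AND d.[Timestamp] > c.[Timestamp]
                            AND DATEDIFF(SECOND, c.[Timestamp], d.[Timestamp]) <= 5)
        ORDER BY c.[Timestamp]) AS StopTime
FROM @data a
-- A start is a row with no predecessor of the same device within 5 seconds
WHERE NOT EXISTS (SELECT * FROM @data b
                  WHERE b.DeviceID = a.DeviceID
                    AND b.[Timestamp] < a.[Timestamp]
                    AND DATEDIFF(SECOND, b.[Timestamp], a.[Timestamp]) <= 5)
ORDER BY a.DeviceID, a.[Timestamp];
```

Every subquery carries the same DeviceID condition; without it, the Device2 rows at 10:00:03 and 10:00:04 would be treated as part of Device1's first run.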