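Group a time series by time interval (e.g. days) and sum the durations

Time: 2018-11-09 11:47:17

Tags: sql sql-server datetime group-by

I have a table containing a time series with the following information. Each record represents a "change mode" event.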

 Timestamp        | Mode 
------------------+------
 2018-01-01 12:00 |  1   
 2018-01-01 18:00 |  2   
 2018-01-02 01:00 |  1   
 2018-01-02 02:00 |  2   
 2018-01-04 04:00 |  1   

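Using the LEAD function, I can write a query that returns the following result. Each record now says when a mode became active and how long it stayed active.

Please look at the second and fourth records. They "belong" to more than one day.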

 StartDT          | EndDT            | Mode | Duration
------------------+------------------+------+----------
 2018-01-01 12:00 | 2018-01-01 18:00 |  1   |   6:00
 2018-01-01 18:00 | 2018-01-02 01:00 |  2   |   7:00
 2018-01-02 01:00 | 2018-01-02 02:00 |  1   |   1:00
 2018-01-02 02:00 | 2018-01-04 04:00 |  2   |  50:00
 2018-01-04 04:00 | (NULL)           |  1   | (NULL)
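
The LEAD query itself is not shown in the question. A minimal sketch of what it might look like follows; the table variable @log, its column names, and the DurationMinutes column are assumptions modelled on the sample data above, and the duration is returned in minutes rather than in the h:mm format of the table.

```sql
-- Sketch only: @log and its columns are assumed names, not the asker's real table.
DECLARE @log TABLE ([Timestamp] datetime, Mode int);
INSERT INTO @log ([Timestamp], Mode) VALUES
('2018-01-01T12:00:00', 1),
('2018-01-01T18:00:00', 2),
('2018-01-02T01:00:00', 1),
('2018-01-02T02:00:00', 2),
('2018-01-04T04:00:00', 1);

WITH Periods AS (
    -- The start of the next event is the end of the current mode.
    -- The last event has no successor, so its EndDT is NULL.
    SELECT [Timestamp] AS StartDT,
           LEAD([Timestamp]) OVER (ORDER BY [Timestamp]) AS EndDT,
           Mode
    FROM @log
)
SELECT StartDT,
       EndDT,
       Mode,
       DATEDIFF(MINUTE, StartDT, EndDT) AS DurationMinutes
FROM Periods
ORDER BY StartDT;
```

With the five sample rows this should return 360, 420, 60 and 3000 minutes, followed by NULL for the still-open last period.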

Now I would like a query that groups the data by date and mode and sums the durations.

This is the result table I need:

 Date       | Mode | Total
------------+------+-------
 2018-01-01 |  1   |  6:00
 2018-01-01 |  2   |  6:00
 2018-01-02 |  1   |  1:00
 2018-01-02 |  2   | 23:00
 2018-01-03 |  2   | 24:00
 2018-01-04 |  2   |  4:00

I don't know how to handle the records that "belong" to more than one day. Any ideas?

4 answers:

Answer 0: (score: 2)

create table ChangeMode ( ModeStart datetime2(7), Mode int )

insert into ChangeMode ( ModeStart, Mode ) values
( '2018-11-15T21:00:00.0000000', 1 ),
( '2018-11-16T17:18:19.1231234', 2 ),
( '2018-11-16T18:00:00.5555555', 1 ),
( '2018-11-16T18:00:01.1234567', 2 ),
( '2018-11-16T19:02:22.8888888', 1 ),
( '2018-11-16T20:00:00.9876543', 2 ),
( '2018-11-17T09:00:00.0000000', 1 ),
( '2018-11-17T23:23:23.0230450', 2 ),
( '2018-11-19T17:00:00.0172839', 1 ),
( '2018-11-20T03:07:00.7033077', 2 )

;
with 
-- Determine the earliest and latest dates.
-- Cast to date to remove the time portion.
-- Cast results back to datetime because we're going to add hours later.
MinMaxDates 
as 
(select cast(min(cast(ModeStart as date))as datetime) as MinDate, 
        cast(max(cast(ModeStart as date))as datetime) as MaxDate from ChangeMode),

-- How many days have passed during that period
Dur
as
(select datediff(day,MinDate,MaxDate) as Duration from MinMaxDates),

-- Create a list of numbers.
-- These will be added to MinDate to get a list of dates.
NumList
as
( select 0 as Num
    union all
    select Num+1 from NumList,Dur where Num<Duration ),

-- Create a list of dates by adding those numbers to MinDate
DayList 
as
( select dateadd(day,Num,MinDate)as ModeDate from NumList, MinMaxDates  ),

-- Create a list of day periods
PeriodList
as
( select ModeDate as StartTime,
            dateadd(day,1,ModeDate) as EndTime
            from DayList                        ),

-- Use LEAD to get periods for each record
-- Final record would return NULL for ModeEnd
-- We replace that with end of last day
ModePeriodList
as
( select ModeStart, 
            coalesce( lead(ModeStart)over(order by ModeStart),
                    dateadd(day,1,MaxDate) ) as ModeEnd, 
            Mode from ChangeMode, MinMaxDates               ),

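-- Pair every mode period with every day it overlaps.
-- Strict comparisons skip pairs that merely touch at midnight.
ModeDayList
as
( select * from ModePeriodList, PeriodList 
where ModeStart<EndTime and ModeEnd>StartTime
),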

-- Keep the later   of the mode start time, and the day start time
-- Keep the earlier of the mode   end time, and the day   end time
ModeDayPeriod
as
( select case when ModeStart>=StartTime then ModeStart  else StartTime end as StartTime,
            case when ModeEnd<=EndTime  then ModeEnd else EndTime   end as EndTime,
            Mode from ModeDayList ),

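-- Convert each clamped piece to hours (3600000000000 nanoseconds per hour).
-- DateDiff_Big requires SQL Server 2016 or later.
SumDurations
as
( select cast(StartTime as date) as ModeDate, 
        Mode, 
        DateDiff_Big(nanosecond,StartTime,EndTime)
        /3600000000000 
            as DurationHours from ModeDayPeriod   )                        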

-- List the results in order
-- Use MaxRecursion option in case there are more than 100 days 
select ModeDate as [Date], Mode, sum(DurationHours) as [Total Duration Hours]
     from SumDurations 
group by ModeDate, Mode
order by ModeDate, Mode
option (maxrecursion 0)

The result is:

Date       Mode        Total Duration Hours
---------- ----------- ---------------------------------------
2018-11-15 1           3.00000000000000
2018-11-16 1           18.26605271947221
2018-11-16 2           5.73394728052777
2018-11-17 1           14.38972862361111
2018-11-17 2           9.61027137638888
2018-11-18 2           24.00000000000000
2018-11-19 1           6.99999519891666
2018-11-19 2           17.00000480108333
2018-11-20 1           3.11686202991666
2018-11-20 2           20.88313797008333

Answer 1: (score: 1)

You can use a CTE to build a table of days and then join the time periods to that table.

DECLARE @MAX as datetime2 = (SELECT MAX(CAST(Timestamp as date)) MX FROM process);
WITH StartEnd AS (select p1.Timestamp StartDT, 
                         P2.Timestamp  EndDT ,
                         p1.mode
                            from process p1
                            outer apply 
                            (SELECT TOP 1 pOP.*  FROM 
                                                    process pOP 
                                                    where pOP.Timestamp > p1.Timestamp 
                                                    order by pOP.Timestamp asc) P2
                 ),
    CAL AS (SELECT (SELECT MIN(cast(StartDT as date)) MN FROM StartEnd) DT
            UNION ALL
            SELECT DATEADD(day,1,DT) DT FROM CAL WHERE CAL.DT < @MAX
            ),
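    TMS AS 
    -- Clamp each period to the day it is joined to; keep the day and the mode
    -- so the rows can later be grouped by date and mode.
    (SELECT C.DT AS DT,
           S.mode,
           CASE WHEN S.StartDT > C.DT THEN S.StartDT ELSE C.DT END AS STP,
           CASE WHEN S.EndDT < DATEADD(day,1,C.DT) THEN S.EndDT ELSE DATEADD(day,1,C.DT) END AS STE
     FROM StartEnd S JOIN CAL C ON NOT(S.EndDT <= C.DT OR S.StartDT >= DATEADD(day,1,C.DT))
    )
    -- x is the number of minutes of the period that fall on day DT
    SELECT *,datediff(MI ,TMS.STP, TMS.STE) as x from TMS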

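Answer 2: (score: 1)

The following uses a recursive CTE to build a list of dates (a calendar or numbers table works just as well). It then intersects the dates with the datetime ranges, so that days without an event of their own are filled from the range that covers them. The important part is that, for each row, a start datetime that belongs to an earlier day is clamped to 00:00 of the current day, and an end datetime that belongs to a later day is clamped to 00:00 of the next day.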

DECLARE @t TABLE (timestamp DATETIME, mode INT);
INSERT INTO @t VALUES
('2018-01-01 12:00', 1),
('2018-01-01 18:00', 2),
('2018-01-02 01:00', 1),
('2018-01-02 02:00', 2),
('2018-01-04 04:00', 1);

WITH cte1 AS (
    -- the min and max dates in your data
    SELECT
        CAST(MIN(timestamp) AS DATE) AS mindate,
        CAST(MAX(timestamp) AS DATE) AS maxdate
    FROM @t
), cte2 AS (
    -- build all dates between min and max dates using recursive cte
    SELECT mindate AS day_start, DATEADD(DAY, 1, mindate) AS day_end, maxdate
    FROM cte1
    UNION ALL
    SELECT DATEADD(DAY, 1, day_start), DATEADD(DAY, 2, day_start), maxdate
    FROM cte2
    WHERE day_start < maxdate
), cte3 AS (
    -- pull end datetime from next row into current
    SELECT
        timestamp AS dt_start,
        LEAD(timestamp) OVER (ORDER BY timestamp) AS dt_end,
        mode
    FROM @t
), cte4 AS (
    -- join datetime with date using date overlap query
    -- then clamp start datetime to 00:00 of the date
    -- and clamp end datetime to 00:00 of next date
    SELECT 
        IIF(dt_start < day_start, day_start, dt_start) AS dt_start_fix, 
        IIF(dt_end > day_end, day_end, dt_end) AS dt_end_fix,
        mode
    FROM cte2
    INNER JOIN cte3 ON day_end > dt_start AND dt_end > day_start
)
SELECT dt_start_fix, dt_end_fix, mode, datediff(minute, dt_start_fix, dt_end_fix) / 60.0 AS total
FROM cte4
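
The query above returns one row per clamped piece. To get the per-day totals the question asks for, the pieces can be summed by date and mode. The following is a sketch under assumptions, not part of the original answer: the CTE names (bounds, days, periods, pieces) and the column total_hours are made up for illustration, and the day boundaries are cast to DATETIME so that both halves of the recursive CTE have identical column types.

```sql
-- Sketch only: same sample data as the answer; CTE and column names are illustrative.
DECLARE @t TABLE ([timestamp] DATETIME, mode INT);
INSERT INTO @t VALUES
('2018-01-01T12:00:00', 1),
('2018-01-01T18:00:00', 2),
('2018-01-02T01:00:00', 1),
('2018-01-02T02:00:00', 2),
('2018-01-04T04:00:00', 1);

WITH bounds AS (
    -- first and last calendar day in the data, as midnight datetimes
    SELECT CAST(CAST(MIN([timestamp]) AS DATE) AS DATETIME) AS mindate,
           CAST(CAST(MAX([timestamp]) AS DATE) AS DATETIME) AS maxdate
    FROM @t
), days AS (
    -- one row per day: [day_start, day_end)
    SELECT mindate AS day_start, DATEADD(DAY, 1, mindate) AS day_end, maxdate
    FROM bounds
    UNION ALL
    SELECT day_end, DATEADD(DAY, 1, day_end), maxdate
    FROM days
    WHERE day_start < maxdate
), periods AS (
    -- each event lasts until the next event; the last one stays open (NULL)
    SELECT [timestamp] AS dt_start,
           LEAD([timestamp]) OVER (ORDER BY [timestamp]) AS dt_end,
           mode
    FROM @t
), pieces AS (
    -- split every period at midnight by clamping it to each overlapping day
    SELECT CAST(d.day_start AS DATE) AS [date],
           p.mode,
           IIF(p.dt_start < d.day_start, d.day_start, p.dt_start) AS piece_start,
           IIF(p.dt_end > d.day_end, d.day_end, p.dt_end) AS piece_end
    FROM days AS d
    INNER JOIN periods AS p
        ON d.day_end > p.dt_start AND p.dt_end > d.day_start
)
SELECT [date],
       mode,
       SUM(DATEDIFF(MINUTE, piece_start, piece_end)) / 60.0 AS total_hours
FROM pieces
GROUP BY [date], mode
ORDER BY [date], mode
OPTION (MAXRECURSION 0);
```

For the sample data this should give 6 and 6 hours on 2018-01-01, 1 and 23 hours on 2018-01-02, 24 hours on 2018-01-03 and 4 hours on 2018-01-04, which matches the table in the question. The open-ended last period has a NULL end, fails the join condition, and is therefore left out.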


Answer 3: (score: 0)

Thanks everyone!

The answer from Cato put me on the right track. Here is my final solution:

DECLARE @Start AS datetime;
DECLARE @End AS datetime;
DECLARE @Interval AS int;


SET @Start = '2018-01-01';
SET @End = '2018-01-05';
SET @Interval = 24 * 60 * 60;



WITH 

cteDurations AS 
    (SELECT [Timestamp] AS StartDT,
            LEAD ([Timestamp]) OVER (ORDER BY [Timestamp]) AS EndDT,
            Mode
     FROM tblLog
     WHERE [Timestamp] BETWEEN @Start AND @End
    ),

cteTimeslots AS
    (SELECT @Start AS StartDT,
            DATEADD(SECOND, @Interval, @Start) AS EndDT
     UNION ALL
     SELECT EndDT,
            DATEADD(SECOND, @Interval, EndDT)
     FROM cteTimeslots WHERE EndDT < @End
    ),

cteDurationsPerTimesplot AS 
    (SELECT CASE WHEN S.StartDT > C.StartDT THEN S.StartDT ELSE C.StartDT END AS StartDT,
            CASE WHEN S.EndDT < C.EndDT THEN S.EndDT ELSE C.EndDT END AS EndDT,
            C.StartDT AS Slot,
            S.Mode
     FROM cteDurations S 
        JOIN cteTimeslots C ON NOT(S.EndDT <= C.StartDT OR S.StartDT >= C.EndDT)
    )


SELECT  Slot,
        Mode,
        SUM(DATEDIFF(SECOND, StartDT, EndDT)) AS Duration

FROM cteDurationsPerTimesplot
GROUP BY Slot, Mode
ORDER BY Slot, Mode;

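The variable @Interval defines the size of a time slot in seconds.

The CTE cteDurations uses the T-SQL function LEAD (available in SQL Server 2012 and later) to build an intermediate result with the start and end of every entry. This should be a lot faster than OUTER APPLY.

The CTE cteTimeslots generates the list of time slots, each with a start time and an end time.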

The CTE cteDurationsPerTimesplot is an intermediate result that joins cteDurations with cteTimeslots. This is the magic JOIN statement from Cato!
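
The join condition NOT(S.EndDT <= C.StartDT OR S.StartDT >= C.EndDT) is an interval overlap test; by De Morgan's law it is the same as S.StartDT < C.EndDT AND S.EndDT > C.StartDT. A small sketch that compares both forms on a few made-up intervals (the table variable @pairs and all values in it are assumptions chosen for illustration):

```sql
-- Sketch only: @pairs holds hypothetical test intervals.
-- "a" plays the role of a mode period, "b" the role of a time slot.
DECLARE @pairs TABLE (
    label   varchar(40),
    a_start datetime, a_end datetime,
    b_start datetime, b_end datetime
);
INSERT INTO @pairs VALUES
('inside the slot',            '2018-01-02T01:00:00', '2018-01-02T02:00:00', '2018-01-02T00:00:00', '2018-01-03T00:00:00'),
('spans midnight',             '2018-01-01T18:00:00', '2018-01-02T01:00:00', '2018-01-02T00:00:00', '2018-01-03T00:00:00'),
('ends exactly at slot start', '2018-01-01T12:00:00', '2018-01-02T00:00:00', '2018-01-02T00:00:00', '2018-01-03T00:00:00'),
('starts exactly at slot end', '2018-01-03T00:00:00', '2018-01-03T05:00:00', '2018-01-02T00:00:00', '2018-01-03T00:00:00'),
('open-ended (NULL end)',      '2018-01-02T04:00:00', NULL,                  '2018-01-02T00:00:00', '2018-01-03T00:00:00');

SELECT label,
       -- the form used in the JOIN above
       CASE WHEN NOT (a_end <= b_start OR a_start >= b_end)
            THEN 'overlap' ELSE 'no overlap' END AS negated_form,
       -- the equivalent positive form
       CASE WHEN a_start < b_end AND a_end > b_start
            THEN 'overlap' ELSE 'no overlap' END AS direct_form
FROM @pairs;
```

Both columns should agree on every row: the first two intervals overlap the slot, the two that only touch a slot boundary do not, and the open-ended row evaluates to unknown and so counts as no overlap. That last case is why the final, still-running mode does not show up in the totals.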

Finally, the SELECT statement groups by slot and mode and sums the durations.
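
The Duration column is a number of seconds, while the table in the question shows h:mm. A possible way to format it is sketched below; the table variable @totals and its three rows are hypothetical stand-ins for the output of the final SELECT. Plain integer arithmetic is used instead of a conversion to the time type because a time value cannot represent 24:00.

```sql
-- Sketch only: @totals imitates the shape of the final result (Duration in seconds).
DECLARE @totals TABLE (Slot datetime, Mode int, Duration int);
INSERT INTO @totals VALUES
('2018-01-02T00:00:00', 1,  3600),
('2018-01-02T00:00:00', 2, 82800),
('2018-01-03T00:00:00', 2, 86400);

SELECT CAST(Slot AS date) AS [Date],
       Mode,
       -- whole hours, then minutes padded to two digits
       CAST(Duration / 3600 AS varchar(10)) + ':' +
       RIGHT('0' + CAST((Duration % 3600) / 60 AS varchar(2)), 2) AS Total
FROM @totals
ORDER BY Slot, Mode;
```

For the three sample rows this should produce 1:00, 23:00 and 24:00.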

Again: many thanks to everyone! Especially to Cato! You saved my weekend!

Regards, Oliver