Question

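I have an application that, while it is "online", creates a new row every 30 seconds (in the first table below). If the gap between two rows is greater than 30 seconds, the application was "offline".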

This is the table:

----Date---- 
2018-07-05 15:02:41.903
2018-07-05 15:04:05.907
2018-07-05 15:06:10.433
2018-07-05 15:06:40.433
2018-07-05 15:07:40.430
2018-07-05 15:07:10.430

I want to create a table that shows UpTime (status = 1) and DownTime (status = 0), along with the start and end date of each period.

I have managed to get this far:

----Date---------------------Difference-------------Status- 
2018-07-05 15:02:41.903           30                  1
2018-07-05 15:04:05.907           84                  0
2018-07-05 15:06:10.433          125                  0
2018-07-05 15:06:40.433           30                  1
2018-07-05 15:07:10.430           30                  1
2018-07-05 15:07:40.430           30                  1

Using the following code:

  WITH    rows AS
    (
    SELECT  *, ROW_NUMBER() OVER (ORDER BY [LAST_UPDATE]) AS rn
    FROM    [dbo].[X] 
    )


SELECT  [LAST_UPDATE],
    DATEDIFF(second, pDataDate, [LAST_UPDATE]) as Difference,
    case when (DATEDIFF(second, pDataDate, [LAST_UPDATE])-30) > 1 then 0 else 1 end as DownOrUp 
FROM    (
    SELECT  *,
            LAG([LAST_UPDATE]) OVER (ORDER BY [LAST_UPDATE]) pDataDate
    FROM    rows
    ) q
WHERE   pDataDate IS NOT NULL

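The table below is what I want (durations are approximate). A DownTime period runs from the previous timestamp with status = 1 to the last timestamp with status = 0.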

------Status-----------StartDate-------------------EndDate---------Duration--
       Down    2018-07-05 15:02:41.903  2018-07-05 15:06:10.433      270
         UP    2018-07-05 15:06:40.433  2018-07-05 15:07:40.430       60

Any suggestions?

Answer 1

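I think I have something that works for you. I'm not sure how efficient it is on large data sets, and I'm sure it could be condensed into fewer steps, but hopefully it at least explains the idea. First, let's set up the sample data, pair each timestamp with the one before it, and work out which intervals represent uptime and which represent downtime.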

declare @x table (last_update datetime);
insert @x values
    ('2018-07-05 15:02:41.903'),
    ('2018-07-05 15:04:05.907'),
    ('2018-07-05 15:06:10.433'),
    ('2018-07-05 15:06:40.433'),
    ('2018-07-05 15:07:40.430'),
    ('2018-07-05 15:07:10.430');

select
    start_time = lag(X.last_update) over (order by X.last_update),
    end_time = X.last_update,
    up = case when coalesce(datediff(second, lag(X.last_update) over (order by X.last_update), X.last_update), 0) <= 30 then 1 else 0 end,
    number = row_number() over (order by X.last_update)
from
    @x X;

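I probably don't need to explain much here, since it's very close to the result you already have. All I did was add a number column whose value increases as the timestamps increase; my reason for doing that will become clear shortly. The next question we need to answer is: which of these intervals is a continuation of the previous interval, and which one starts a new period? From your question I gather that you consider consecutive gaps of more than 30 seconds to be one continuous period of downtime, and consecutive gaps of 30 seconds or less to be one continuous period of uptime. So what I'll do is put the query we already have into a CTE and then use the lag function again to determine whether each interval has the same status as its predecessor. This produces a column called continues, which is 1 for an interval that continues the previous period and 0 for an interval that starts a new period.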

with Intervals as
(
    select
        start_time = lag(X.last_update) over (order by X.last_update),
        end_time = X.last_update,
        up = case when coalesce(datediff(second, lag(X.last_update) over (order by X.last_update), X.last_update), 0) <= 30 then 1 else 0 end,
        number = row_number() over (order by X.last_update)
    from
        @x X
)
select
    I.*,
    continues = case when lag(I.up) over (order by I.start_time) = I.up then 1 else 0 end
from
    Intervals I;

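What I need next is a value I can put in a group by clause that will correctly aggregate consecutive intervals. Each record in the previous result set where continues = 0 marks the start of a new period and should be grouped with all the continues = 1 records that immediately follow it. In other words, I want every record to carry an identifier that increases at each record where continues = 0 and stays the same where continues = 1. I can get that by taking the number column I added to your initial query and subtracting the count of continues rows encountered so far in the result set: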

with Intervals as
(
    select
        start_time = lag(X.last_update) over (order by X.last_update),
        end_time = X.last_update,
        up = case when coalesce(datediff(second, lag(X.last_update) over (order by X.last_update), X.last_update), 0) <= 30 then 1 else 0 end,
        number = row_number() over (order by X.last_update)
    from
        @x X
),
IntervalExtents as
(
    select
        I.*,
        continues = case when lag(I.up) over (order by I.start_time) = I.up then 1 else 0 end
    from
        Intervals I
)
select
    X.*,
    group_number = number - sum(continues) over (order by number)
from
    IntervalExtents X;
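
To make the arithmetic concrete, here is a small self-contained sketch. It hard-codes the number, up and continues values that the previous step should produce for the sample data. I worked those literals out by hand, so treat them as an assumption rather than captured output. The expression applied to them is the same group_number expression as above.

```sql
-- Hand-computed (number, up, continues) rows for the six sample timestamps.
-- These literals are an assumption standing in for the IntervalExtents CTE.
select
    V.number,
    V.up,
    V.continues,
    -- number grows by 1 on every row; the running sum grows by 1 only on
    -- continuation rows, so the difference is constant within a run and
    -- steps up by 1 whenever a new run starts
    group_number = V.number - sum(V.continues) over (order by V.number)
from
    (values
        (1, 1, 0),  -- no previous timestamp; coalesce(..., 0) marks it "up"
        (2, 0, 0),  -- 84 s gap: down, starts a new run
        (3, 0, 1),  -- 125 s gap: down, continues the run
        (4, 1, 0),  -- 30 s gap: up, starts a new run
        (5, 1, 1),  -- 30 s gap: up, continues the run
        (6, 1, 1)   -- 30 s gap: up, continues the run
    ) as V (number, up, continues)
order by
    V.number;
-- group_number comes out as 1, 2, 2, 3, 3, 3
```

The absolute values of group_number don't matter. All that matters is that rows in the same run share a value and that different runs get different values.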

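Now all that's left is deciding exactly how to aggregate the groups identified by this query. Here your expected result looks a bit odd to me. Your sample data set has timestamps at 15:06:10, 15:06:40, 15:07:10 and 15:07:40. Surely that means the system was online from 15:06:10 to 15:07:40, a span of 90 seconds, since there is an entry every 30 seconds throughout that period. But your sample result shows uptime from 15:06:40 to 15:07:40, a span of 60 seconds. I'm guessing that was a mistake; if not, you can still get your expected result by modifying the query I'm about to show and adjusting how the extent of each uptime and downtime period is determined. Here is what I think the final query should look like: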

with Intervals as
(
    select
        start_time = lag(X.last_update) over (order by X.last_update),
        end_time = X.last_update,
        up = case when coalesce(datediff(second, lag(X.last_update) over (order by X.last_update), X.last_update), 0) <= 30 then 1 else 0 end,
        number = row_number() over (order by X.last_update)
    from
        @x X
),
IntervalExtents as
(
    select
        I.*,
        continues = case when lag(I.up) over (order by I.start_time) = I.up then 1 else 0 end
    from
        Intervals I
),
IntervalGroups as
(
    select
        X.*,
        group_number = number - sum(continues) over (order by number)
    from
        IntervalExtents X
)
select
    [status] = case when min(G.up) = 1 then 'Up' else 'Down' end,
    start_time = min(G.start_time),
    end_time = max(G.end_time),
    duration = datediff(second, min(G.start_time), max(G.end_time))
from
    IntervalGroups G
group by
    G.group_number
having
    min(G.start_time) is not null
order by
    G.group_number;

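Note the HAVING clause I've used here. It requires that each aggregated group of intervals contain at least one record with a non-null start_time. This prevents the initial 15:02:41 timestamp from showing up as a zero-second period of uptime. The reason I left this check until the last step is that if I've misunderstood any of your requirements and you need to adjust the query, you still have the raw data to work with. If the query works for you as is, you could instead change the IntervalExtents CTE to include only records where I.start_time is not null, which would eliminate the need for the HAVING clause on the final SELECT.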
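
As a sketch of that simplification, assuming the same @x sample table as before, the filter can move into IntervalExtents as a WHERE clause and the HAVING clause can be dropped. The WHERE is applied before lag is evaluated in that CTE, so the first remaining row simply has no predecessor and starts a new period.

```sql
declare @x table (last_update datetime);
insert @x values
    ('2018-07-05 15:02:41.903'),
    ('2018-07-05 15:04:05.907'),
    ('2018-07-05 15:06:10.433'),
    ('2018-07-05 15:06:40.433'),
    ('2018-07-05 15:07:40.430'),
    ('2018-07-05 15:07:10.430');

with Intervals as
(
    select
        start_time = lag(X.last_update) over (order by X.last_update),
        end_time = X.last_update,
        up = case when coalesce(datediff(second, lag(X.last_update) over (order by X.last_update), X.last_update), 0) <= 30 then 1 else 0 end,
        number = row_number() over (order by X.last_update)
    from
        @x X
),
IntervalExtents as
(
    select
        I.*,
        continues = case when lag(I.up) over (order by I.start_time) = I.up then 1 else 0 end
    from
        Intervals I
    where
        I.start_time is not null  -- drop the row that has no previous timestamp
),
IntervalGroups as
(
    select
        X.*,
        group_number = number - sum(continues) over (order by number)
    from
        IntervalExtents X
)
select
    [status] = case when min(G.up) = 1 then 'Up' else 'Down' end,
    start_time = min(G.start_time),
    end_time = max(G.end_time),
    duration = datediff(second, min(G.start_time), max(G.end_time))
from
    IntervalGroups G
group by
    G.group_number
order by
    G.group_number;
-- Down  2018-07-05 15:02:41.903  2018-07-05 15:06:10.433  209
-- Up    2018-07-05 15:06:10.433  2018-07-05 15:07:40.430   90
```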

That should do it. Hopefully, if this doesn't fully answer your question, it at least gets you started.
