计算记录每天在特定状态下花费的时间

时间:2016-02-09 23:05:49

标签: mysql sql

有人可以帮我完成下一个任务吗?

这是一个问题:我们有一个历史表(进程的状态更改),我们需要计算进程处于特定状态的时间(以小时/小时为单位)。这是历史表的结构:

ID| ProcessId| CreatedDate         | Status 
------------------------------------------- 
1 | Process1 | 2016-01-09 06:30:00 | UP
2 | Process1 | 2016-01-09 12:30:00 | UP
3 | Process1 | 2016-01-09 18:30:00 | DOWN
4 | Process1 | 2016-01-10 00:30:00 | UP 
5 | Process2 | 2016-01-08 18:30:00 | UP
6 | Process2 | 2016-01-09 00:30:00 | DOWN
7 | Process2 | 2016-01-09 06:30:00 | DOWN
8 | Process2 | 2016-01-09 12:30:00 | DOWN
9 | Process2 | 2016-01-09 18:30:00 | DOWN
10| Process2 | 2016-01-10 00:30:00 | UP
11| Process2 | 2016-01-10 06:30:00 | UP
12| Process2 | 2016-01-10 12:30:00 | DOWN
13| Process2 | 2016-01-10 18:30:00 | DOWN
14| Process2 | 2016-01-11 00:30:00 | DOWN
15| Process2 | 2016-01-11 06:30:00 | DOWN

因此,我们需要创建一个视图/表格,如:

ProcessId | Status | Date        | TimeSpentInStatusInDays
----------------------------------------------------------
Process1  | UP     | 2016-01-09  | 12h 00m
Process1  | DOWN   | 2016-01-09  | 05h 30m
Process1  | UP     | 2016-01-10  | 00h 00m
Process1  | DOWN   | 2016-01-10  | 00h 30m
Process2  | UP     | 2016-01-08  | 05h 30m
Process2  | DOWN   | 2016-01-08  | 00h 00m
Process2  | UP     | 2016-01-09  | 24h 00m
Process2  | DOWN   | 2016-01-09  | 00h 00m
Process2  | UP     | 2016-01-10  | 12h 00m
Process2  | DOWN   | 2016-01-10  | 12h 00m
Process2  | UP     | 2016-01-11  | 00h 00m
Process2  | DOWN   | 2016-01-11  | 06h 30m

例如,值(它们未连接到实际数据集)。

代码需要在mySQL中。任何帮助都会非常感激。谢谢。

3 个答案:

答案 0 :(得分:1)

我不承诺这是在MySQL中执行此操作的好方法,或者它很快。

我会记录您的历史记录表并在每天结束时附加行(每个进程的最后一天除外)。添加的行包含每天每个进程的最后一行的状态。如果这样的行已经存在,这确实可能导致午夜的瞬时状态变化。 (我后来尝试处理这个场景。)

由于MySQL没有超前/滞后功能,我将上述两个相同副本的每一行匹配,以便按顺序查找下一次(可能是为结束时添加的逻辑状态行)那天。)之后只是分组问题。

由于我不熟悉MySQL日期函数,因此我只使用time_to_sec,因为跨度永远不会超过一天。唯一的复杂因素是必须特别对待午夜。我将让您处理将秒值转换为适当的输出格式。

http://sqlfiddle.com/#!9/b0f3279/44

select
    ProcessId,
    date(CreatedDate) as `Date`,
    Status,
    sum(
        case
            when time_to_sec(NextDate) = 0 then 86400
            else time_to_sec(NextDate)
        end - time_to_sec(CreatedDate)
    ) as TimeSpentSeconds
from
    (
    select
        h1.ProcessId, h1.CreatedDate, h1.Status,
        min(
            h2.CreatedDate
            --case
            --    when date(h2.CreatedDate) > date(h1.CreatedDate)
            --    then date_add(date(h1.CreatedDate), interval 1 day)
            --    else h2.CreatedDate
            --end 
        ) as NextDate
    from
        (
        select ProcessId, CreatedDate, Status from history
        union
        select
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h1
            inner join
        (
        select ProcessId, CreatedDate, Status from history
        union
        select
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h2
            on      h2.ProcessId   = h1.ProcessId
                and h1.CreatedDate < h2.CreatedDate
                and h2.CreatedDate <= date_add(date(h1.CreatedDate), interval 1 day)
    group by h1.ProcessId, h1.CreatedDate, h1.Status
    ) hx
group by ProcessId, date(CreatedDate), Status
order by ProcessId, `Date`, Status desc, TimeSpentSeconds

我相信第二个选项可以处理上面提到的瞬时/重复状态。它已经有点复杂但感觉更麻烦了。我添加了一种序列号,以便于打破平局并调整时差表达式。最后,我添加了一个having子句来消除吐出的零累积行。请参阅小提琴样本数据中的ProcessX:

select
    ProcessId,
    date(CreatedDate) as `Date`,
    Status,
    sum(
        case
            when NextDate > CreatedDate and time_to_sec(NextDate) = 0 then 86400
            else time_to_sec(NextDate)
        end - time_to_sec(CreatedDate)
    ) as TimeSpentSeconds
from
    (
    select
        h1.ProcessId, h1.CreatedDate, h1.Status,
        min(
            h2.CreatedDate,
            --case
            --    when date(h2.CreatedDate) > date(h1.CreatedDate)
            --    then date_add(date(h1.CreatedDate), interval 1 day)
            --    else h2.CreatedDate
            --end 
        ) as NextDate
    from
        (
        select 1 as Sequence, ProcessId, CreatedDate, Status from history
        union all
        select
            0,
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h1
            inner join
        (
        select 1 as Sequence, ProcessId, CreatedDate, Status from history
        union all
        select
            0,
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h2
            on      h2.ProcessId   = h1.ProcessId
                and (
                        h1.CreatedDate <  h2.CreatedDate
                    and h2.CreatedDate <= date_add(date(h1.CreatedDate), interval 1 day)
                    or
                        h1.CreatedDate =  h2.CreatedDate
                    and h1.Sequence    <  h2.Sequence
                )   
    group by h1.ProcessId, h1.CreatedDate, h1.Status
    ) hx
group by ProcessId, date(CreatedDate), Status
having TimeSpentSeconds > 0 /* MySQL shortcut reference */
order by ProcessId, `Date`, Status desc, TimeSpentSeconds

http://sqlfiddle.com/#!9/b582b2/10

我只是意识到NextDate的表达式不需要检查午夜超限,所以我评论了这一点。虽然我没有改变小提琴。而且我也忘了提到我假设每个流程每天至少有一个状态报告。也许这是与其他MySQL选项一起使用的起点,如临时表(速度表)或变量(表示超前/滞后)。

答案 1 :(得分:1)

我喜欢你的问题,因为它给了我一个理解SQL的理由,我暂时没有机会这样做。

以下是我对你问题的看法。

首先,我们准备一个临时表TempStatusLog,每天我们在00:00:01添加记录,状态等于当天最早的记录,并在23:59记录: 59与当天的最新阅读。我们还使用变量@rownumvar对所有行进行编号。假设原始表名为StatusLog,则使用此SELECT语句创建临时表:

SELECT @rownumvar := @rownumvar + 1 AS `rowNo`,
       `t`.`ProcessId`, `t`.`CreatedDate`, `t`.`Status`
FROM (SELECT `ProcessId`, `CreatedDate`, `Status`
      FROM   `StatusLog`

      UNION

      SELECT `ProcessId`,
             STR_TO_DATE(CONCAT(`OnDate`, ' 23:59:59'),
                         '%Y-%m-%d %H:%i:%s') AS `CreatedDate`,
            (SELECT `Status`
             FROM   `StatusLog` AS `l`
             WHERE  `l`.`ProcessId` = `t1`.`ProcessId` AND
                    `l`.`CreatedDate`
                      = STR_TO_DATE(CONCAT(`t1`.`OnDate`, ' ', `t1`.`LastStatus`),
                                    '%Y-%m-%d %H:%i:%s')) AS `Status`
      FROM (SELECT `ProcessId`,
                   DATE_FORMAT(`CreatedDate`, '%Y-%m-%d') AS `OnDate`,
                   DATE_FORMAT(MAX(TIME(`CreatedDate`)), '%H:%i:%s') AS `LastStatus`
            FROM   `StatusLog`
            GROUP BY DATE(`OnDate`), `ProcessId`
            ORDER BY `ProcessId`, DATE(`OnDate`)) AS `t1`

      UNION

      SELECT `ProcessId`,
             STR_TO_DATE(CONCAT(`OnDate`, ' 00:00:01'),
                         '%Y-%m-%d %H:%i:%s') AS `CreatedDate`,
            (SELECT `Status`
             FROM   `StatusLog` AS `l`
             WHERE  `l`.`ProcessId` = `t2`.`ProcessId` AND
                    `l`.`CreatedDate`
                      = STR_TO_DATE(CONCAT(`t2`.`OnDate`, ' ', `t2`.`FirstStatus`),
                                    '%Y-%m-%d %H:%i:%s')) AS `Status`
      FROM (SELECT `ProcessId`,
                   DATE_FORMAT(`CreatedDate`, '%Y-%m-%d') AS `OnDate`,
                   DATE_FORMAT(MIN(TIME(`CreatedDate`)), '%H:%i:%s') AS `FirstStatus`
            FROM   `StatusLog`
            GROUP BY DATE(`OnDate`), `ProcessId`
            ORDER BY `ProcessId`, DATE(`OnDate`)) AS `t2`) AS `t`,
     (SELECT @rownumvar := 0) AS `r`
ORDER BY `t`.`ProcessId`, `t`.`CreatedDate` ASC

现在,每天计算每个流程在每个州的持续时间相对容易。我们选择一个两行的运行窗口(这是编号的行进入的位置)并计算每两个读数之间的时间差,然后将它们总结起来:

SELECT `p`.`ProcessId`,
       DATE_FORMAT(`q`.`CreatedDate`, '%Y-%m-%d') AS `Day`,
       DATE_FORMAT(
         SEC_TO_TIME(
           SUM(
             TIME_TO_SEC(
               TIMEDIFF(TIME(`q`.`CreatedDate`),
                        TIME(`p`.`CreatedDate`))
             )
           )
         ),
         '%H:%i:%s'
       ) AS `Elapsed`,
       `p`.`Status`
FROM   `TempStatusLog` AS `p`,
       `TempStatusLog` AS `q`
WHERE  `q`.`rowNo` = `p`.`rowNo` + 1 AND
       DATE(`q`.`CreatedDate`) = DATE(`p`.`CreatedDate`)
GROUP BY `Day`, `Status`, `ProcessId`
ORDER BY `Day` ASC, `ProcessId` ASC, `Status` ASC

此解决方案存在两个小问题:

  1. 每天失去2秒。也就是说,如果一个过程一整天都在进行,它会说它已经到了23:59:58。
  2. 如果这个过程一整天都没有,那么就没有记录表明它已经停止了00:00:00(反之亦然)
  3. 对我而言,这两个问题似乎都太小而无法理解。

    在这里,您可以看一下现场演示:http://www.sqlfiddle.com/#!9/0a79cc/1

    请注意,SQLFiddle不允许创建临时表,因此我为此创建了一个普通表。

    PS:在MySQL中解决这个问题比在几乎任何其他RDBMS中解决这个问题要困难得多,因为MySQL不支持SQL的许多功能。首先,它不支持CTE,它是ANSI SQL规范的一部分。这会强制用户创建临时表或查找其他类似的解决方法。许多RDBMS(Oracle,SQL Server)也支持ROW_NUMBER()函数的一些变体,我必须使用变量来处理它。

答案 2 :(得分:0)

只是为了好玩。去Postgres:)

select 
  ProcessId, CreatedDate, Status, 
  to_char( CreatedDate - lag( CreatedDate ) over ( order by CreatedDate, ProcessId ), 'HH24:MI' ) as diff
from history
order by ProcessId, ID;

http://sqlfiddle.com/#!15/83cb0/9