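Can anyone help me with the following task?
Here is the problem: we have a history table (status changes of processes) and we need to calculate how long a process spent in a particular status, in hours and minutes. This is the structure of the history table: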
ID| ProcessId| CreatedDate | Status
-------------------------------------------
1 | Process1 | 2016-01-09 06:30:00 | UP
2 | Process1 | 2016-01-09 12:30:00 | UP
3 | Process1 | 2016-01-09 18:30:00 | DOWN
4 | Process1 | 2016-01-10 00:30:00 | UP
5 | Process2 | 2016-01-08 18:30:00 | UP
6 | Process2 | 2016-01-09 00:30:00 | DOWN
7 | Process2 | 2016-01-09 06:30:00 | DOWN
8 | Process2 | 2016-01-09 12:30:00 | DOWN
9 | Process2 | 2016-01-09 18:30:00 | DOWN
10| Process2 | 2016-01-10 00:30:00 | UP
11| Process2 | 2016-01-10 06:30:00 | UP
12| Process2 | 2016-01-10 12:30:00 | DOWN
13| Process2 | 2016-01-10 18:30:00 | DOWN
14| Process2 | 2016-01-11 00:30:00 | DOWN
15| Process2 | 2016-01-11 06:30:00 | DOWN
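To make the sample reproducible, here is a minimal sketch of the table with the Process1 rows loaded. The table name `history` is the one Answer 0 uses; the column types are assumptions, since the question only shows column names and values.

```sql
-- Column types are assumed; the question only gives names and sample values.
create table history (
    ID          int         primary key,
    ProcessId   varchar(20) not null,
    CreatedDate datetime    not null,
    Status      varchar(10) not null
);

-- The four Process1 rows from the table above.
insert into history (ID, ProcessId, CreatedDate, Status) values
    (1, 'Process1', '2016-01-09 06:30:00', 'UP'),
    (2, 'Process1', '2016-01-09 12:30:00', 'UP'),
    (3, 'Process1', '2016-01-09 18:30:00', 'DOWN'),
    (4, 'Process1', '2016-01-10 00:30:00', 'UP');
```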
So we need to create a view/table like this:
ProcessId | Status | Date | TimeSpentInStatusInDays
----------------------------------------------------------
Process1 | UP | 2016-01-09 | 12h 00m
Process1 | DOWN | 2016-01-09 | 05h 30m
Process1 | UP | 2016-01-10 | 00h 00m
Process1 | DOWN | 2016-01-10 | 00h 30m
Process2 | UP | 2016-01-08 | 05h 30m
Process2 | DOWN | 2016-01-08 | 00h 00m
Process2 | UP | 2016-01-09 | 24h 00m
Process2 | DOWN | 2016-01-09 | 00h 00m
Process2 | UP | 2016-01-10 | 12h 00m
Process2 | DOWN | 2016-01-10 | 12h 00m
Process2 | UP | 2016-01-11 | 00h 00m
Process2 | DOWN | 2016-01-11 | 06h 30m
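The values are examples only (they are not tied to the actual dataset).
The code needs to be in MySQL. Any help would be much appreciated. Thanks.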
Answer 0 (score: 1)
I don't promise that this is a good way to do it in MySQL, or that it is fast.
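I take your history table and append a row for the end of each day (except for the last day of each process). Each added row carries the status of the last row of that day for that process. If a row already exists exactly at midnight, this can produce an instantaneous status change at midnight. (I try to handle that scenario further down.)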
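The "status of the last row of the day" is what the `substring(max(concat(...)), 20, 10)` expression in the queries below computes. A small sketch of that trick on inline sample rows (the derived table `h` is only a stand-in for the real `history` table): `get_format(datetime, 'ISO')` yields the format string `'%Y-%m-%d %H:%i:%s'`, so the formatted timestamp is 19 characters long and the status starts at position 20. Because such strings sort chronologically, `max` picks the latest row.

```sql
-- Inline rows stand in for the history table (an assumption for illustration).
select
    ProcessId,
    date(CreatedDate) as `Date`,
    substring(
        max(
            concat(
                date_format(CreatedDate, get_format(datetime, 'ISO')),
                Status
            )
        ), 20, 10) as LastStatus
from (
    select 'Process1' as ProcessId,
           cast('2016-01-09 06:30:00' as datetime) as CreatedDate,
           'UP' as Status
    union all
    select 'Process1', cast('2016-01-09 12:30:00' as datetime), 'UP'
    union all
    select 'Process1', cast('2016-01-09 18:30:00' as datetime), 'DOWN'
) h
group by ProcessId, date(CreatedDate);
-- The largest concatenated string is '2016-01-09 18:30:00DOWN',
-- so LastStatus is 'DOWN'.
```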
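Since MySQL has no lead/lag functions (they only arrived with the window functions in version 8.0), I join two identical copies of the above so that each row is matched with the next time in sequence (which may be the logical status row added for the end of that day). After that it is just a matter of grouping.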
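Since I'm not very familiar with the MySQL date functions, I just use time_to_sec, because a span never exceeds one day. The only complication is that midnight has to be treated specially. I'll leave it to you to convert the seconds value into the appropriate output format.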
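To illustrate the midnight special case: `time_to_sec` looks only at the time-of-day part, so an end-of-day row at `00:00:00` evaluates to 0 and has to be counted as 86400 seconds instead. A minimal sketch with one hand-picked pair of timestamps (the inline row is an assumption, not part of the answer's data):

```sql
-- One interval that ends exactly at midnight of the following day.
select
    case
        when time_to_sec(NextDate) = 0 then 86400
        else time_to_sec(NextDate)
    end - time_to_sec(CreatedDate) as SpanSeconds
from (
    select cast('2016-01-09 18:30:00' as datetime) as CreatedDate,
           cast('2016-01-10 00:00:00' as datetime) as NextDate
) t;
-- 86400 - 66600 = 19800 seconds, i.e. 5h 30m.
-- Without the case expression the result would be 0 - 66600 = -66600.
```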
http://sqlfiddle.com/#!9/b0f3279/44
select
    ProcessId,
    date(CreatedDate) as `Date`,
    Status,
    sum(
        case
            when time_to_sec(NextDate) = 0 then 86400
            else time_to_sec(NextDate)
        end - time_to_sec(CreatedDate)
    ) as TimeSpentSeconds
from
(
    select
        h1.ProcessId, h1.CreatedDate, h1.Status,
        min(
            h2.CreatedDate
            -- case
            --     when date(h2.CreatedDate) > date(h1.CreatedDate)
            --         then date_add(date(h1.CreatedDate), interval 1 day)
            --     else h2.CreatedDate
            -- end
        ) as NextDate
    from
    (
        select ProcessId, CreatedDate, Status from history
        union
        select
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
    ) h1
    inner join
    (
        select ProcessId, CreatedDate, Status from history
        union
        select
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
    ) h2
        on h2.ProcessId = h1.ProcessId
        and h1.CreatedDate < h2.CreatedDate
        and h2.CreatedDate <= date_add(date(h1.CreatedDate), interval 1 day)
    group by h1.ProcessId, h1.CreatedDate, h1.Status
) hx
group by ProcessId, date(CreatedDate), Status
order by ProcessId, `Date`, Status desc, TimeSpentSeconds
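The query above returns the total as raw seconds. One possible way to turn such a value into the `12h 00m` style shown in the question is sketched here; the inline values are hypothetical stand-ins for the TimeSpentSeconds column, and the column alias `TimeSpent` is made up for the example.

```sql
-- Format a seconds total as 'HHh MMm' using integer division and modulo.
select
    s as TimeSpentSeconds,
    concat(
        lpad(s div 3600, 2, '0'), 'h ',
        lpad((s mod 3600) div 60, 2, '0'), 'm'
    ) as TimeSpent
from (
    select 43200 as s
    union all select 19800
    union all select 86400
) t;
-- 43200 -> '12h 00m', 19800 -> '05h 30m', 86400 -> '24h 00m'
```

In the real query the same `concat(...)` expression would wrap the `sum(...)` instead of the literal `s`.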
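I believe the second version below handles the instantaneous/duplicate status changes mentioned above. The query was already somewhat complicated and now feels even more cumbersome. I added a kind of sequence number to make it easier to break ties, and adjusted the time difference expression. Finally, I added a having clause to eliminate the rows with a zero total that would otherwise be emitted. See ProcessX in the fiddle's sample data: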
select
    ProcessId,
    date(CreatedDate) as `Date`,
    Status,
    sum(
        case
            when NextDate > CreatedDate and time_to_sec(NextDate) = 0 then 86400
            else time_to_sec(NextDate)
        end - time_to_sec(CreatedDate)
    ) as TimeSpentSeconds
from
(
    select
        h1.ProcessId, h1.CreatedDate, h1.Status,
        min(
            h2.CreatedDate
            -- case
            --     when date(h2.CreatedDate) > date(h1.CreatedDate)
            --         then date_add(date(h1.CreatedDate), interval 1 day)
            --     else h2.CreatedDate
            -- end
        ) as NextDate
    from
    (
        select 1 as Sequence, ProcessId, CreatedDate, Status from history
        union all
        select
            0,
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
    ) h1
    inner join
    (
        select 1 as Sequence, ProcessId, CreatedDate, Status from history
        union all
        select
            0,
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
    ) h2
        on h2.ProcessId = h1.ProcessId
        and (
            h1.CreatedDate < h2.CreatedDate
            and h2.CreatedDate <= date_add(date(h1.CreatedDate), interval 1 day)
            or
            h1.CreatedDate = h2.CreatedDate
            and h1.Sequence < h2.Sequence
        )
    group by h1.ProcessId, h1.CreatedDate, h1.Status
) hx
group by ProcessId, date(CreatedDate), Status
having TimeSpentSeconds > 0 /* MySQL shortcut reference */
order by ProcessId, `Date`, Status desc, TimeSpentSeconds
http://sqlfiddle.com/#!9/b582b2/10
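I just realized that the expression for NextDate doesn't need to check for running past midnight, so I commented that part out. I didn't update the fiddles, though. I also forgot to mention that I'm assuming every process has at least one status report per day. Perhaps this is a starting point for using other MySQL options such as temporary tables (for speed) or variables (to emulate lead/lag).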
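For readers on MySQL 8.0 or later, the "next row" pairing that the self-join performs can be sketched with the `lead` window function instead. This is not what the answer above uses, and it covers only the pairing step; the end-of-day rows and the grouping would still be needed. The inline rows stand in for the `history` table.

```sql
-- Requires MySQL 8.0+ (window functions). Inline rows are illustrative only.
select
    ProcessId,
    CreatedDate,
    Status,
    lead(CreatedDate) over (
        partition by ProcessId
        order by CreatedDate
    ) as NextDate
from (
    select 'Process1' as ProcessId,
           cast('2016-01-09 06:30:00' as datetime) as CreatedDate,
           'UP' as Status
    union all
    select 'Process1', cast('2016-01-09 12:30:00' as datetime), 'UP'
    union all
    select 'Process1', cast('2016-01-09 18:30:00' as datetime), 'DOWN'
    union all
    select 'Process1', cast('2016-01-10 00:30:00' as datetime), 'UP'
) h;
-- Each row gets the timestamp of the following row of the same process;
-- the last row of a process gets NULL.
```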
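Answer 1 (score: 1)
I like your question because it gave me a reason to brush up on SQL, which I hadn't had a chance to do for a while.
Here is my take on your problem.
First, we prepare a temporary table TempStatusLog in which, for every day, we add a record at 00:00:01 with the status of that day's earliest record and a record at 23:59:59 with the status of that day's latest reading. We also number all rows using the variable @rownumvar. Assuming the original table is named StatusLog, the temporary table is filled from this SELECT statement: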
SELECT @rownumvar := @rownumvar + 1 AS `rowNo`,
`t`.`ProcessId`, `t`.`CreatedDate`, `t`.`Status`
FROM (SELECT `ProcessId`, `CreatedDate`, `Status`
FROM `StatusLog`
UNION
SELECT `ProcessId`,
STR_TO_DATE(CONCAT(`OnDate`, ' 23:59:59'),
'%Y-%m-%d %H:%i:%s') AS `CreatedDate`,
(SELECT `Status`
FROM `StatusLog` AS `l`
WHERE `l`.`ProcessId` = `t1`.`ProcessId` AND
`l`.`CreatedDate`
= STR_TO_DATE(CONCAT(`t1`.`OnDate`, ' ', `t1`.`LastStatus`),
'%Y-%m-%d %H:%i:%s')) AS `Status`
FROM (SELECT `ProcessId`,
DATE_FORMAT(`CreatedDate`, '%Y-%m-%d') AS `OnDate`,
DATE_FORMAT(MAX(TIME(`CreatedDate`)), '%H:%i:%s') AS `LastStatus`
FROM `StatusLog`
GROUP BY DATE(`OnDate`), `ProcessId`
ORDER BY `ProcessId`, DATE(`OnDate`)) AS `t1`
UNION
SELECT `ProcessId`,
STR_TO_DATE(CONCAT(`OnDate`, ' 00:00:01'),
'%Y-%m-%d %H:%i:%s') AS `CreatedDate`,
(SELECT `Status`
FROM `StatusLog` AS `l`
WHERE `l`.`ProcessId` = `t2`.`ProcessId` AND
`l`.`CreatedDate`
= STR_TO_DATE(CONCAT(`t2`.`OnDate`, ' ', `t2`.`FirstStatus`),
'%Y-%m-%d %H:%i:%s')) AS `Status`
FROM (SELECT `ProcessId`,
DATE_FORMAT(`CreatedDate`, '%Y-%m-%d') AS `OnDate`,
DATE_FORMAT(MIN(TIME(`CreatedDate`)), '%H:%i:%s') AS `FirstStatus`
FROM `StatusLog`
GROUP BY DATE(`OnDate`), `ProcessId`
ORDER BY `ProcessId`, DATE(`OnDate`)) AS `t2`) AS `t`,
(SELECT @rownumvar := 0) AS `r`
ORDER BY `t`.`ProcessId`, `t`.`CreatedDate` ASC
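Now it is relatively easy to compute how long each process spent in each status on each day. We take a running window of two rows (this is where the row numbers come in), compute the time difference between each pair of consecutive readings, and then sum the differences: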
SELECT `p`.`ProcessId`,
DATE_FORMAT(`q`.`CreatedDate`, '%Y-%m-%d') AS `Day`,
DATE_FORMAT(
SEC_TO_TIME(
SUM(
TIME_TO_SEC(
TIMEDIFF(TIME(`q`.`CreatedDate`),
TIME(`p`.`CreatedDate`))
)
)
),
'%H:%i:%s'
) AS `Elapsed`,
`p`.`Status`
FROM `TempStatusLog` AS `p`,
`TempStatusLog` AS `q`
WHERE `q`.`rowNo` = `p`.`rowNo` + 1 AND
DATE(`q`.`CreatedDate`) = DATE(`p`.`CreatedDate`)
GROUP BY `Day`, `Status`, `ProcessId`
ORDER BY `Day` ASC, `ProcessId` ASC, `Status` ASC
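To see what the inner expression contributes for a single pair of consecutive readings, here is a small sketch with two hand-picked timestamps (the aliases `p`, `q` and `pair` are only for this example). It uses `TIME_FORMAT`, MySQL's formatting function for TIME values, to render the result.

```sql
-- Elapsed time between two readings taken on the same day.
select
    time_to_sec(timediff(time(q), time(p))) as ElapsedSeconds,
    time_format(
        sec_to_time(time_to_sec(timediff(time(q), time(p)))),
        '%H:%i:%s'
    ) as Elapsed
from (
    select cast('2016-01-09 06:30:00' as datetime) as p,
           cast('2016-01-09 12:30:00' as datetime) as q
) pair;
-- ElapsedSeconds = 21600, Elapsed = '06:00:00'
```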
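This solution has two minor problems:
To me, both of these problems seem too minor to worry about.
You can take a look at a live demo here: http://www.sqlfiddle.com/#!9/0a79cc/1
Note that SQLFiddle does not allow creating temporary tables, so I created an ordinary table for this instead.
PS: Solving this problem in MySQL is much harder than in almost any other RDBMS, because MySQL does not support many features of SQL. For one, it does not support CTEs, which are part of the ANSI SQL specification. This forces the user to create temporary tables or find similar workarounds. Many RDBMSs (Oracle, SQL Server) also support some variation of the ROW_NUMBER() function, which I had to emulate with a variable. (This describes MySQL before version 8.0, which added both CTEs and window functions such as ROW_NUMBER().)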
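On MySQL 8.0 or later, the row numbering done with @rownumvar could be sketched with a CTE and `ROW_NUMBER()` instead. This is an alternative to the approach in this answer, not the approach itself, and the CTE name `StatusLogSample` with its inline rows is a hypothetical stand-in for the real StatusLog table.

```sql
-- Requires MySQL 8.0+ (CTEs and window functions).
with StatusLogSample as (
    select 'Process1' as ProcessId,
           cast('2016-01-09 06:30:00' as datetime) as CreatedDate,
           'UP' as Status
    union all
    select 'Process1', cast('2016-01-09 18:30:00' as datetime), 'DOWN'
    union all
    select 'Process2', cast('2016-01-08 18:30:00' as datetime), 'UP'
)
select
    row_number() over (order by ProcessId, CreatedDate) as rowNo,
    ProcessId,
    CreatedDate,
    Status
from StatusLogSample
order by rowNo;
-- rowNo runs 1, 2, 3 in ProcessId, CreatedDate order.
```

Unlike the user-variable technique, the numbering order here is stated explicitly in the `over (order by ...)` clause.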
Answer 2 (score: 0)
Just for fun. Go Postgres :)
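select
ProcessId, CreatedDate, Status,
-- partition by ProcessId so each row is compared with the previous row of the same process
to_char( CreatedDate - lag( CreatedDate ) over ( partition by ProcessId order by CreatedDate, ID ), 'HH24:MI' ) as diff
from history
order by ProcessId, ID;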