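I have an events table with a project and a timestamp. I want to query every run of consecutive rows that belong to the same project. If a project occurs in more than one run, it should be listed once per run. I also want the start time, end time and duration of each run.
Example: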
| project | created_at |
|-----------|-------------------------|
| project a | 2020-05-29 10:00:00.000 |
| project a | 2020-05-29 10:00:01.167 |
| project a | 2020-05-29 10:00:03.954 |
| project a | 2020-05-29 10:00:10.055 |
| project b | 2020-05-29 10:05:00.000 |
| project b | 2020-05-29 10:06:01.049 |
| project b | 2020-05-29 10:06:30.197 |
| project a | 2020-05-29 10:07:05.167 |
| project a | 2020-05-29 10:07:18.680 |
I would like to get the following output:
| project | start | end | duration |
|-----------|-------------------------|-------------------------|--------------|
| project a | 2020-05-29 10:00:00.000 | 2020-05-29 10:00:10.055 | 00:00:10.055 |
| project b | 2020-05-29 10:05:00.000 | 2020-05-29 10:06:30.197 | 00:01:30.197 |
| project a | 2020-05-29 10:07:05.167 | 2020-05-29 10:07:18.680 | 00:00:13.513 |
So far I have the following query:
SELECT
project,
created_at AS "Start",
Max(created_at) AS "End",
TIMEDIFF(MAX(created_at), created_at) AS "Duration"
FROM results GROUP BY project;
This gives me the following output:
| project | start | end | duration |
|-----------|-------------------------|-------------------------|--------------|
| project a | 2020-05-29 10:00:00.000 | 2020-05-29 10:07:18.680 | 00:07:18.680 |
| project b | 2020-05-29 10:05:00.000 | 2020-05-29 10:06:30.197 | 00:01:30.197 |
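The problem is that GROUP BY project returns only two rows, one per project. The two separate runs of project a are merged into one, which gives the wrong start time, end time and duration.
Is there a way to fix this so that I get the desired output?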
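Answer 0 (score: 1)
This is an example of a gaps-and-islands problem. The difference of two row numbers should do what you want: within one run of consecutive rows for a project, the overall row number and the per-project row number increase together, so their difference stays constant and identifies the run: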
SELECT project,
       MIN(created_at) AS start_dt,
       MAX(created_at) AS end_dt,
       -- duration of the run: last timestamp minus first timestamp
       TIMEDIFF(MAX(created_at), MIN(created_at)) AS Duration
FROM (SELECT r.*,
             -- position of the row among rows of the same project
             ROW_NUMBER() OVER (PARTITION BY project ORDER BY created_at) AS seqnum_p,
             -- position of the row among all rows
             ROW_NUMBER() OVER (ORDER BY created_at) AS seqnum
      FROM results r
     ) r
GROUP BY project, (seqnum - seqnum_p)
ORDER BY MIN(created_at);
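To see why grouping by the row-number difference separates the runs, here is a minimal Python sketch that mimics the query on the sample data from the question. It is only an illustration of the idea, not the database's implementation; the names `rows` and `find_islands` are made up for this example.

```python
from collections import defaultdict
from datetime import datetime

# Sample data copied from the question.
rows = [
    ("project a", "2020-05-29 10:00:00.000"),
    ("project a", "2020-05-29 10:00:01.167"),
    ("project a", "2020-05-29 10:00:03.954"),
    ("project a", "2020-05-29 10:00:10.055"),
    ("project b", "2020-05-29 10:05:00.000"),
    ("project b", "2020-05-29 10:06:01.049"),
    ("project b", "2020-05-29 10:06:30.197"),
    ("project a", "2020-05-29 10:07:05.167"),
    ("project a", "2020-05-29 10:07:18.680"),
]


def find_islands(rows):
    """Hypothetical helper: group consecutive rows of the same project."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    # Equivalent of ORDER BY created_at.
    events = sorted((datetime.strptime(ts, fmt), project) for project, ts in rows)

    per_project = defaultdict(int)  # running row number per project
    islands = {}                    # (project, seqnum - seqnum_p) -> [start, end]

    for seqnum, (ts, project) in enumerate(events, start=1):
        per_project[project] += 1
        seqnum_p = per_project[project]
        # Both counters advance by one inside a run, so the difference
        # only changes when another project's rows come in between.
        key = (project, seqnum - seqnum_p)
        if key not in islands:
            islands[key] = [ts, ts]
        else:
            islands[key][1] = ts  # events are sorted, so this is the max

    result = [
        (project, start, end, end - start)
        for (project, _), (start, end) in islands.items()
    ]
    return sorted(result, key=lambda r: r[1])  # ORDER BY MIN(created_at)


for project, start, end, duration in find_islands(rows):
    print(project, start, end, duration)
```

On this data the difference is 0 for the first run of project a, 4 for project b, and 3 for the second run of project a, so the two runs of project a end up in different groups.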