I have a log of object states that looks like this:
timestamp   object_id  state  level
2018-01-01  123        f      100
2018-01-02  123        t      100
2018-01-02  123        f      100
2018-01-03  123        f      100
2018-01-03  123        f      100
2018-01-06  123        t      90
2018-01-07  123        t      90
2018-01-08  123        f      90
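The timestamp is actually a full date/time; I have left out the time part for brevity.
What I want to get is a list of state transitions based on the unique state and level, looking like this: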
start       end         object_id  state  level
2018-01-01  2018-01-02  123        f      100
2018-01-02  2018-01-02  123        t      100
2018-01-02  2018-01-06  123        f      100
2018-01-06  2018-01-08  123        t      90
2018-01-08  NOW()       123        f      90
I am trying to come up with a way to do this using window functions, for example:
SELECT
    timestamp,
    object_id,
    timestamp as start,
    lead(timestamp) OVER (ORDER BY timestamp) as end
FROM (
    SELECT
        timestamp,
        object_id,
        state,
        level,
        rank() OVER (PARTITION BY (state, level) ORDER BY timestamp) as rank
    FROM state_log AS l
    WHERE object_id=123 AND timestamp >= DATE '2018-01-01'
    ORDER BY timestamp
) AS states
WHERE rank=1
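But I think I don't understand how rank() works, and it doesn't do what I need. For some reason I thought rank() would reset the row count every time the partition changes, but that is not the case. How can I do this?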
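To make the rank() behaviour described above concrete, here is a minimal sketch that is not part of the original post. It assumes SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for the PostgreSQL table, a column named ts in place of timestamp, and made-up time parts to keep the ordering unambiguous.

```python
import sqlite3

# Sample rows from the question; the time parts are invented for ordering.
ROWS = [
    ("2018-01-01 00:00:01", 123, "f", 100),
    ("2018-01-02 00:00:02", 123, "t", 100),
    ("2018-01-02 00:00:03", 123, "f", 100),
    ("2018-01-03 00:00:04", 123, "f", 100),
    ("2018-01-03 00:00:05", 123, "f", 100),
    ("2018-01-06 00:00:06", 123, "t", 90),
    ("2018-01-07 00:00:07", 123, "t", 90),
    ("2018-01-08 00:00:08", 123, "f", 90),
]


def ranks_per_partition(rows):
    """Return (ts, state, level, rank) ordered by ts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state_log "
        "(ts TEXT, object_id INTEGER, state TEXT, level INTEGER)"
    )
    conn.executemany("INSERT INTO state_log VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT ts, state, level,
               rank() OVER (PARTITION BY state, level ORDER BY ts) AS rnk
        FROM state_log
        WHERE object_id = 123
        ORDER BY ts
        """
    ).fetchall()
    conn.close()
    return result


RANKED = ranks_per_partition(ROWS)
for row in RANKED:
    print(row)
```

A partition is the set of all rows sharing the same (state, level), whether or not they are adjacent in time. The (f, 100) row at 2018-01-02 00:00:03 therefore continues the numbering begun on 2018-01-01 and gets rank 2, so filtering on rank = 1 drops the third transition.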
Answer 0 (score: 0)
This is a gaps-and-islands problem. One nice solution uses row_number():
select object_id, level, state, min(timestamp), max(timestamp)
from (select t.*,
             row_number() over (partition by object_id, level order by timestamp) as seqnum,
             row_number() over (partition by object_id, level, state order by timestamp) as seqnum_2
      from t
     ) t
group by (seqnum - seqnum_2), object_id, level, state;
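It is a bit hard to explain why this works. But if you look at the results of the subquery, you will see that the difference between the two seqnum values is constant while the state is constant. That difference, along with the other columns, defines the groups you want, and the rest is just aggregation.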
Here is a rextester showing that it works.
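The constant-difference property can be checked on the sample data. The following is only a sketch under stated assumptions: it runs the same idea on SQLite 3.25 or later through Python's sqlite3 module instead of PostgreSQL, names the table state_log and the timestamp column ts, and uses invented time parts so that the ordering is unambiguous.

```python
import sqlite3

# Sample rows from the question; the time parts are invented for ordering.
ROWS = [
    ("2018-01-01 00:00:01", 123, "f", 100),
    ("2018-01-02 00:00:02", 123, "t", 100),
    ("2018-01-02 00:00:03", 123, "f", 100),
    ("2018-01-03 00:00:04", 123, "f", 100),
    ("2018-01-03 00:00:05", 123, "f", 100),
    ("2018-01-06 00:00:06", 123, "t", 90),
    ("2018-01-07 00:00:07", 123, "t", 90),
    ("2018-01-08 00:00:08", 123, "f", 90),
]

# Subquery that attaches the two row numbers to every row.
NUMBERED = """
    SELECT ts, object_id, state, level,
           row_number() OVER (PARTITION BY object_id, level
                              ORDER BY ts) AS seqnum,
           row_number() OVER (PARTITION BY object_id, level, state
                              ORDER BY ts) AS seqnum_2
    FROM state_log
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_log "
    "(ts TEXT, object_id INTEGER, state TEXT, level INTEGER)"
)
conn.executemany("INSERT INTO state_log VALUES (?, ?, ?, ?)", ROWS)

# Per-row difference of the two row numbers, in timestamp order.
DIFFS = [
    row[0]
    for row in conn.execute(
        f"SELECT seqnum - seqnum_2 FROM ({NUMBERED}) AS numbered ORDER BY ts"
    )
]

# One output row per island: consecutive rows with equal state and level.
ISLANDS = conn.execute(
    f"""
    SELECT min(ts), max(ts), state, level
    FROM ({NUMBERED}) AS numbered
    GROUP BY seqnum - seqnum_2, object_id, level, state
    ORDER BY min(ts)
    """
).fetchall()
conn.close()

print(DIFFS)
for island in ISLANDS:
    print(island)
```

Within each run of identical (state, level) both counters advance by one per row, so their difference stays fixed; when the state flips, only seqnum keeps counting and the difference changes. Note that max(ts) is the last row of an island rather than the first row of the next one, so the end column differs from the one the question asks for.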
Answer 1 (score: 0)
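This is not about "gaps and islands". That technique operates on groups that share a constant value of some field, whereas you need to operate on the boundaries of such groups. So: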
create table state_log(timestamp timestamp, object_id int, state boolean, level int);

insert into state_log values
    ('2018-01-01 00:00:01', 123, 'f', 100),
    ('2018-01-02 00:00:02', 123, 't', 100),
    ('2018-01-02 00:00:03', 123, 'f', 100),
    ('2018-01-03 00:00:04', 123, 'f', 100),
    ('2018-01-03 00:00:05', 123, 'f', 100),
    ('2018-01-06 00:00:06', 123, 't', 90),
    ('2018-01-07 00:00:07', 123, 't', 90),
    ('2018-01-08 00:00:08', 123, 'f', 90);

select
    timestamp::date as start,
    coalesce(lead(timestamp) over (order by timestamp), now()::timestamp)::date as end,
    object_id, state, level
from (
    select
        *,
        coalesce(lag(state) over (order by timestamp) <> state, true) as is_new_group
    from state_log) as t
where
    object_id = 123 and timestamp >= date '2018-01-01' and
    is_new_group
order by timestamp;
Result (I removed the time part to make it look more like the result specified in the question):
┌────────────┬────────────┬───────────┬───────┬───────┐
│   start    │    end     │ object_id │ state │ level │
├────────────┼────────────┼───────────┼───────┼───────┤
│ 2018-01-01 │ 2018-01-02 │       123 │ f     │   100 │
│ 2018-01-02 │ 2018-01-02 │       123 │ t     │   100 │
│ 2018-01-02 │ 2018-01-06 │       123 │ f     │   100 │
│ 2018-01-06 │ 2018-01-08 │       123 │ t     │    90 │
│ 2018-01-08 │ 2018-08-30 │       123 │ f     │    90 │
└────────────┴────────────┴───────────┴───────┴───────┘
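The boundary approach can also be tried outside PostgreSQL. The sketch below rests on several assumptions and is not the answer's exact query: it targets SQLite 3.25 or later through Python's sqlite3 module, renames the timestamp column to ts, filters on object_id inside the subquery so that lag() only sees that object's rows, compares level as well as state (the query above compares only state), and leaves the last end as NULL instead of substituting now() so the output is reproducible. The function name transitions is hypothetical.

```python
import sqlite3

# Sample rows from the answer above.
ROWS = [
    ("2018-01-01 00:00:01", 123, "f", 100),
    ("2018-01-02 00:00:02", 123, "t", 100),
    ("2018-01-02 00:00:03", 123, "f", 100),
    ("2018-01-03 00:00:04", 123, "f", 100),
    ("2018-01-03 00:00:05", 123, "f", 100),
    ("2018-01-06 00:00:06", 123, "t", 90),
    ("2018-01-07 00:00:07", 123, "t", 90),
    ("2018-01-08 00:00:08", 123, "f", 90),
]


def transitions(rows, object_id):
    """Return (start, end, state, level) for each run of equal state/level.

    Assumes state and level are never NULL, so a NULL previous state
    can only mean "first row for this object".
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state_log "
        "(ts TEXT, object_id INTEGER, state TEXT, level INTEGER)"
    )
    conn.executemany("INSERT INTO state_log VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT ts AS start_ts,
               -- WHERE runs before window functions, so lead() here
               -- only sees the rows that start a new group.
               lead(ts) OVER (ORDER BY ts) AS end_ts,
               state, level
        FROM (
            SELECT ts, state, level,
                   lag(state) OVER (ORDER BY ts) AS prev_state,
                   lag(level) OVER (ORDER BY ts) AS prev_level
            FROM state_log
            WHERE object_id = ?
        ) AS t
        WHERE prev_state IS NULL
           OR prev_state <> state
           OR prev_level <> level
        ORDER BY ts
        """,
        (object_id,),
    ).fetchall()
    conn.close()
    return result


TRANSITIONS = transitions(ROWS, 123)
for row in TRANSITIONS:
    print(row)
```

Each kept row is the first row of a run, and its end is the start of the next run, which matches the start/end layout requested in the question. An end of None marks the run that is still open.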