I have a dataset that logs runs of my application. It records the time and whether my widget was OK:
CREATE TABLE runs (time int, ok int);
INSERT INTO runs VALUES
(1, NULL),
(2, 1),
(3, 1),
(4, 1),
(5, NULL),
(6, NULL),
(7, 1),
(8, 1),
(9, NULL),
(10, 1);
I'd like to use window functions (I think) to work out the length of these runs of "ok"-ness, so the final dataset should look like this:
time | ok_length
----------------
2 | 3
7 | 2
10 | 1
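To make the target concrete, here is a plain Python sketch (not SQL) that walks the rows in time order and reproduces the expected output. It assumes, as the sample data suggests, that NULL means "not ok" and any non-NULL value means "ok"; ROWS and ok_runs are illustrative names, not part of the question.

```python
from typing import List, Optional, Tuple

# Rows from the question: (time, ok), where ok is 1 or None (NULL).
ROWS: List[Tuple[int, Optional[int]]] = [
    (1, None), (2, 1), (3, 1), (4, 1), (5, None),
    (6, None), (7, 1), (8, 1), (9, None), (10, 1),
]


def ok_runs(rows: List[Tuple[int, Optional[int]]]) -> List[Tuple[int, int]]:
    """Return (start_time, ok_length) for each run of consecutive non-NULL rows."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    length = 0
    for time, ok in sorted(rows, key=lambda r: r[0]):
        if ok is not None:
            if start is None:  # a new run begins on this row
                start = time
            length += 1
        elif start is not None:  # a NULL row closes the current run
            runs.append((start, length))
            start, length = None, 0
    if start is not None:  # flush a run that reaches the last row
        runs.append((start, length))
    return runs


print(ok_runs(ROWS))  # → [(2, 3), (7, 2), (10, 1)]
```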
Here's as far as I've got:
SELECT
  time,
  ok,
  CASE WHEN
    LAG(ok) OVER (ORDER BY time) IS NOT null
  THEN SUM(ok) OVER (ORDER BY time) END
FROM runs
ORDER BY time
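But it's completely wrong. Can anyone help? Maybe I need to do something with a frame at the end of the window function, but that frame would have to stop conditionally when it hits a NULL. Here's the SQL Fiddle I'm using: http://sqlfiddle.com/#!17/98bf4/3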
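To see why the attempt goes wrong, this sketch runs the query unchanged (apart from an alias on the CASE column) against the sample data using Python's built-in sqlite3 module. SQLite stands in for the question's PostgreSQL here and needs version 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (time int, ok int)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 1), (5, None),
     (6, None), (7, 1), (8, 1), (9, None), (10, 1)],
)

# The question's query; only the alias "attempt" is added.
rows = conn.execute("""
    SELECT
      time,
      ok,
      CASE WHEN LAG(ok) OVER (ORDER BY time) IS NOT NULL
           THEN SUM(ok) OVER (ORDER BY time) END AS attempt
    FROM runs
    ORDER BY time
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

SUM(ok) OVER (ORDER BY time) is a running total over every earlier row, so it keeps climbing (2, 3, 3, 5, 5) instead of restarting after a NULL, and the CASE tests the previous row's ok rather than marking where a run begins. A frame clause alone can't fix this, because frame bounds are fixed offsets or ranges, not conditions; the rows first need a group label, which is what both answers build.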
Answer 0 (score: 1)
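I think there are ways to simplify this, but value-based counts like this one always end up a bit verbose. The main pieces are:
group_start_cte - uses LAG to flag each row that starts a new logical group, i.e. whose ok differs from the previous row's.
group_cte - a running sum of those flags, which gives every row a group ID (grp_id).
group_cnt - counts the non-NULL rows in each group ID.
first_time_for_group - gets the time at which each group starts.
Finally, we join group_cnt and first_time_for_group together: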
WITH
  group_start_cte AS (
    SELECT
      time,
      ok,
      CASE
        WHEN LAG(ok) OVER (ORDER BY time ASC) IS DISTINCT FROM ok
        THEN TRUE
      END AS group_start
    FROM
      runs
  ),
  group_cte AS (
    SELECT
      time,
      ok,
      group_start,
      SUM(CASE WHEN group_start THEN 1 ELSE 0 END) OVER (ORDER BY time ASC) AS grp_id
    FROM
      group_start_cte
  ),
  first_time_for_group AS (
    SELECT
      time,
      grp_id
    FROM
      group_cte
    WHERE
      group_start IS TRUE
  ),
  group_cnt AS (
    SELECT
      grp_id,
      count(*) AS ok_length
    FROM
      group_cte
    WHERE
      ok IS NOT NULL
    GROUP BY
      grp_id
  )
SELECT
  time,
  ok_length
FROM
  group_cnt
  LEFT JOIN first_time_for_group USING (grp_id)
ORDER BY
  time ASC
;
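To see what the first two CTEs produce, here is a sketch that runs them against the sample data with Python's built-in sqlite3 module (SQLite 3.25+ is needed for window functions). SQLite is a stand-in for the question's PostgreSQL, so "is distinct from" is written as SQLite's null-safe IS NOT; the rest follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (time int, ok int)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 1), (5, None),
     (6, None), (7, 1), (8, 1), (9, None), (10, 1)],
)

# First two CTEs from the answer; SQLite's null-safe "IS NOT" stands in for
# PostgreSQL's "is distinct from".
rows = conn.execute("""
    WITH
    group_start_cte AS (
      SELECT time, ok,
             CASE WHEN LAG(ok) OVER (ORDER BY time) IS NOT ok
                  THEN TRUE END AS group_start
      FROM runs
    ),
    group_cte AS (
      SELECT time, ok, group_start,
             SUM(CASE WHEN group_start THEN 1 ELSE 0 END)
               OVER (ORDER BY time) AS grp_id
      FROM group_start_cte
    )
    SELECT time, ok, group_start, grp_id
    FROM group_cte
    ORDER BY time
""").fetchall()

for row in rows:
    print(row)  # (time, ok, group_start, grp_id)
conn.close()
```

Every switch between NULL and ok bumps grp_id, so the three ok runs land in groups 1, 3 and 5; group_cnt then counts only the non-NULL rows in each group, and first_time_for_group supplies each group's first time.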
Answer 1 (score: 0)
Here's a less verbose solution:
select distinct
  min(time) over (partition by gp) as time
, sum(ok) over (partition by gp) as ok_length
from (
  select *
  , time - row_number() over (partition by ok order by time asc) gp
  from runs
  where ok is not null
) rs
order by 1;
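The time - row_number() trick assumes time is an integer that increases by exactly 1 from one row to the next, as it does in the sample. The sketch below compares it with a variant that subtracts two row numbers computed over all rows (a common gaps-and-islands formulation, swapped in here and not part of the answer), again run through Python's sqlite3 with SQLite 3.25+ standing in for PostgreSQL. GAPPED is made-up test data, and ROW_NUMBER_DIFF and run are hypothetical names.

```python
import sqlite3

# Answer 1's approach (SQLite accepts this syntax unchanged).
ANSWER_1 = """
    select distinct
      min(time) over (partition by gp)
    , sum(ok) over (partition by gp)
    from (
      select *
      , time - row_number() over (partition by ok order by time asc) gp
      from runs
      where ok is not null
    ) rs
    order by 1
"""

# Hypothetical variant: both row numbers are computed over ALL rows, so gp is
# the number of NULL rows seen so far, whatever the spacing of time values.
ROW_NUMBER_DIFF = """
    select min(time) as time, count(*) as ok_length
    from (
      select time, ok,
             row_number() over (order by time)
               - row_number() over (partition by ok order by time) as gp
      from runs
    ) rs
    where ok is not null
    group by gp
    order by 1
"""


def run(query, data):
    """Load (time, ok) rows into a fresh in-memory runs table and run query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (time int, ok int)")
    conn.executemany("INSERT INTO runs VALUES (?, ?)", data)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


# Times step by 10 and there is no row at 60, so the 50 and 70 rows form one
# run with no NULL between them.
GAPPED = [(10, 1), (20, 1), (30, 1), (40, None), (50, 1), (70, 1)]

print(run(ANSWER_1, GAPPED))         # → [(10, 1), (20, 1), (30, 1), (50, 1), (70, 1)]
print(run(ROW_NUMBER_DIFF, GAPPED))  # → [(10, 3), (50, 2)]
```

Both queries return the expected (2, 3), (7, 2), (10, 1) on the question's data; they only diverge when times skip values. The variant also counts rows with count(*) rather than summing ok, so it does not depend on ok being exactly 1.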