Designate groups of consecutive equal values in an ordered dataset

Time: 2018-12-06 23:06:31

Tags: postgresql aggregate-functions window-functions gaps-and-islands

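I am trying to produce output like the one shown below. The problem is that I cannot use first_value or RANK: when I partition by event and order by time, the rows are not split into groups in that order. I need the rows ordered by time first, with a new group starting each time the event changes.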

[Image: desired output]

1 answer:

Answer 0: (score: 0)

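A known solution

Use lag() to mark the rows where event changes, then take a cumulative sum() over those marks to designate the groups, for example: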

with my_table(event, time) as (
values 
    ('A', '12:01'),
    ('A', '12:02'),
    ('B', '12:03'),
    ('A', '12:04'),
    ('A', '12:05'),
    ('B', '12:06'),
    ('B', '12:07'),
    ('A', '12:08')
)

select 
    event, 
    time, 
    sum(change) over (order by time) as "desired row number"
from (
    select 
        event, 
        time, 
        (event is distinct from lag(event) over (order by time))::int as change
    from my_table
    ) s
order by time;

 event | time  | desired row number 
-------+-------+--------------------
 A     | 12:01 |                  1
 A     | 12:02 |                  1
 B     | 12:03 |                  2
 A     | 12:04 |                  3
 A     | 12:05 |                  3
 B     | 12:06 |                  4
 B     | 12:07 |                  4
 A     | 12:08 |                  5
(8 rows)
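
To see why this works, it can help to look at the intermediate flag next to the running total. The following is only a sketch on the same sample data; the CTE name flagged and the column alias group_number are made up for illustration and are not part of the original answer.

```sql
with my_table(event, time) as (
values
    ('A', '12:01'),
    ('A', '12:02'),
    ('B', '12:03'),
    ('A', '12:04'),
    ('A', '12:05'),
    ('B', '12:06'),
    ('B', '12:07'),
    ('A', '12:08')
),
flagged as (
    -- change is 1 on the first row and on every row whose event
    -- differs from the previous row's event (in time order), else 0
    select
        event,
        time,
        (event is distinct from lag(event) over (order by time))::int as change
    from my_table
)
select
    event,
    time,
    change,
    -- the running total of the flags increases by one at each change
    sum(change) over (order by time) as group_number
from flagged
order by time;
```

On this data the change column reads 1, 0, 1, 1, 0, 1, 0, 1 from top to bottom, so the running total is 1, 1, 2, 3, 3, 4, 4, 5. The first row is flagged because lag() returns null there and is distinct from treats null as different from 'A'.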

Custom aggregate

It would be nice to have a function that can be used like this:

select *, group_number(event) over (order by time)
from my_table;

This can be done with a custom aggregate:

create type group_number_internal as (number int, lag text);

create or replace function group_number_transition(group_number_internal, anyelement)
returns group_number_internal language sql strict as $$
    select 
        case 
            when $2::text is distinct from $1.lag then $1.number + 1 
            else $1.number 
        end, 
        $2::text
$$;

create or replace function group_number_final(group_number_internal)
returns int language sql as $$
    select $1.number
$$;

create aggregate group_number(anyelement) (
    sfunc = group_number_transition,
    stype = group_number_internal,
    finalfunc = group_number_final,
    initcond = '(0, null)'
);
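
As a quick check, the aggregate can be run over the same sample data as in the first solution. This sketch assumes the type, the two functions, and the aggregate above have already been created in the current database; the alias grp is chosen only for illustration.

```sql
-- Assumes group_number(anyelement) exists as defined above.
with my_table(event, time) as (
values
    ('A', '12:01'),
    ('A', '12:02'),
    ('B', '12:03'),
    ('A', '12:04'),
    ('A', '12:05'),
    ('B', '12:06'),
    ('B', '12:07'),
    ('A', '12:08')
)
select
    event,
    time,
    -- used as a window aggregate: the state is carried forward row by row
    group_number(event) over (order by time) as grp
from my_table
order by time;
```

This should give the same numbering as the lag()/sum() query: 1, 1, 2, 3, 3, 4, 4, 5. One difference worth keeping in mind: because the transition function is declared strict, rows with a null event are skipped by the aggregate and keep the previous group number, whereas the lag()/sum() query treats a null as a change.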

Test it in rextester.