Question

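Given a table of continuous-run data, where a number always increases while a task is in progress and resets to zero when the next task starts, how do I select the maximum value of each run?

Each continuous run can have any number of rows, and a run is delimited by a "start" row and an "end" row. For example, the data might look like: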

user_id, action, qty, datetime
1,       start,  0,   2017-01-01 00:00:01
1,       record, 0,   2017-01-01 00:00:01
1,       record, 4,   2017-01-01 00:00:02
1,       record, 5,   2017-01-01 00:00:03
1,       record, 6,   2017-01-01 00:00:04
1,       end,    0,   2017-01-01 00:00:04
1,       start,  0,   2017-01-01 00:00:05
1,       record, 0,   2017-01-01 00:00:05
1,       record, 2,   2017-01-01 00:00:06
1,       record, 3,   2017-01-01 00:00:07
1,       end,    0,   2017-01-01 00:00:07
2,       start,  0,   2017-01-01 00:00:08
2,       record, 0,   2017-01-01 00:00:08
2,       record, 3,   2017-01-01 00:00:09
2,       record, 8,   2017-01-01 00:00:10
2,       end,    0,   2017-01-01 00:00:10

The result would be the maximum of each run:

user_id, action, qty, datetime
1,       record, 6,   2017-01-01 00:00:04
1,       record, 3,   2017-01-01 00:00:07
2,       record, 8,   2017-01-01 00:00:10

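Can this be done using any Postgres SQL syntax (9.3)? It is some kind of grouping followed by selecting the max from each group, but I can't figure out how to do the grouping.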

Answer 1

If the runs of a single user never overlap, and the next run always starts later, then you can use the LAG() window function.

with the_table(user_id, action, qty, datetime) as (
    select 1,'start',  0,   '2017-01-01 00:00:01'::timestamp union all
    select 1,'record', 0,   '2017-01-01 00:00:01'::timestamp union all
    select 1,'record', 4,   '2017-01-01 00:00:02'::timestamp union all
    select 1,'record', 5,   '2017-01-01 00:00:03'::timestamp union all
    select 1,'record', 6,   '2017-01-01 00:00:04'::timestamp union all
    select 1,'end',    0,   '2017-01-01 00:00:04'::timestamp union all
    select 1,'start',  0,   '2017-01-01 00:00:05'::timestamp union all
    select 1,'record', 0,   '2017-01-01 00:00:05'::timestamp union all
    select 1,'record', 2,   '2017-01-01 00:00:06'::timestamp union all
    select 1,'record', 3,   '2017-01-01 00:00:07'::timestamp union all
    select 1,'end',    0,   '2017-01-01 00:00:07'::timestamp union all
    select 2,'start',  0,   '2017-01-01 00:00:08'::timestamp union all
    select 2,'record', 0,   '2017-01-01 00:00:08'::timestamp union all
    select 2,'record', 3,   '2017-01-01 00:00:09'::timestamp union all
    select 2,'record', 8,   '2017-01-01 00:00:10'::timestamp union all
    select 2,'end',    0,   '2017-01-01 00:00:10'::timestamp  
)

select n_user_id, n_action, n_qty, n_datetime from (
    select action,
    lag(user_id)  over w as n_user_id,
    lag(action)   over w as n_action,
    lag(qty)      over w as n_qty,
    lag(datetime) over w as n_datetime
    from the_table
    -- One named window shared by all four lag() calls.
    window w as (partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty)
)t
where action = 'end'

Because some action = 'record' rows have the same datetime as the start and end rows, I use CASE in the ORDER BY to make it explicit that start comes first, then record, then end.
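To see what the tiebreaker does, here is a minimal sketch on three rows that share one timestamp. The inline `sample` rows are made up for illustration and are not part of the question's data.

```sql
-- Three hypothetical rows with an identical timestamp, listed in the "wrong" order.
with sample(action, qty, datetime) as (
    values ('end',    0, '2017-01-01 00:00:04'::timestamp),
           ('record', 6, '2017-01-01 00:00:04'::timestamp),
           ('start',  0, '2017-01-01 00:00:04'::timestamp)
)
select action, qty, datetime
from sample
order by datetime,
         -- Rank the action types so ties on datetime resolve as start, record, end.
         case when action = 'start' then 0
              when action = 'record' then 1
              else 2 end,
         qty;
-- Rows come back in the order: start, record, end.
```

Without the CASE expression, rows with equal timestamps could come back in any order, so LAG() from the end row would not reliably land on the last record row.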

Answer 2

Quick and dirty, assuming the runs don't overlap

with bounds as (
    -- Pair the n-th start with the n-th end (assumes runs never overlap, even across users).
    -- The ordering goes inside OVER (...): a plain OVER () does not guarantee numbering by datetime.
    select starts.rn, starts.datetime as s, ends.datetime as e
    from (select datetime, row_number() over (order by datetime) as rn
          from runs where action = 'start') as starts
    join (select datetime, row_number() over (order by datetime) as rn
          from runs where action = 'end') as ends
      on starts.rn = ends.rn
), with_run as (
    -- Tag every row with the run whose time bounds contain it.
    select r.*, (select rn from bounds where s <= r.datetime and e >= r.datetime) as run
    from runs as r
), max_qty as (
    select run, max(qty) as qty
    from with_run
    group by run
)
select s.user_id, s.action, s.qty, s.datetime
from with_run as s
join max_qty as f on s.run = f.run and s.qty = f.qty;

-- Test data --

create table runs (user_id int, action text, qty int, datetime TIMESTAMP);
insert INTO runs VALUES 
(1,        'start',  0,   '2017-01-01 00:00:01')
,(1,       'record', 0,   '2017-01-01 00:00:01')
,(1,       'record', 4,   '2017-01-01 00:00:02')
,(1,       'record', 5,   '2017-01-01 00:00:03')
,(1,       'record', 6,   '2017-01-01 00:00:04')
,(1,       'end',    0,   '2017-01-01 00:00:04')
,(1,       'start',  0,   '2017-01-01 00:00:05')
,(1,       'record', 0,   '2017-01-01 00:00:05')
,(1,       'record', 2,   '2017-01-01 00:00:06')
,(1,       'record', 3,   '2017-01-01 00:00:07')
,(1,       'end',    0,   '2017-01-01 00:00:07')
,(2,       'start',  0,   '2017-01-01 00:00:08')
,(2,       'record', 0,   '2017-01-01 00:00:08')
,(2,       'record', 3,   '2017-01-01 00:00:09')
,(2,       'record', 8,   '2017-01-01 00:00:10')
,(2,       'end',    0,   '2017-01-01 00:00:10');

Update: @Oto Shavadze's answer can be shortened

with lookup as (select action,lag(t.*)  over(order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end) as r from runs t)
select (r::runs).user_id
      ,(r::runs).action
      ,(r::runs).qty
      ,(r::runs).datetime
from lookup where action = 'end';

I think the OP is unclear about what the maximum is: the last record before the end, or the highest qty.
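If the highest qty is what is wanted, one possible sketch (not taken from either answer) is to number the runs with a running count of start rows and then keep the top record row per run. It assumes the `runs` test table above and that a user's runs do not overlap; the column name `run_no` is made up for illustration. It avoids FILTER, which is not available in 9.3.

```sql
with numbered as (
    select *,
           -- Running count of 'start' rows per user: every row of a run gets the same run_no.
           sum(case when action = 'start' then 1 else 0 end)
               over (partition by user_id
                     order by datetime,
                              case when action = 'start' then 0
                                   when action = 'record' then 1
                                   else 2 end) as run_no
    from runs
)
-- DISTINCT ON keeps the first row per (user_id, run_no) under the ORDER BY below,
-- i.e. the record row with the highest qty (earliest datetime on ties).
select distinct on (user_id, run_no)
       user_id, action, qty, datetime
from numbered
where action = 'record'
order by user_id, run_no, qty desc, datetime;
```

On the sample data this returns the three rows listed in the question. Because qty only increases within a run there, the "last record before end" and "highest qty" readings give the same result; they would differ only if qty could drop inside a run.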

SQL: select the max value of continuous runs of data
