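Given a table of consecutive runs of data (a number that always increases while a task is in progress and resets to zero when the next task starts), how do I select the maximum of each run?
Each run can have any number of rows, and a run is delimited by a "start" row and an "end" row. For example, the data might look like this: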
user_id, action, qty, datetime
1, start, 0, 2017-01-01 00:00:01
1, record, 0, 2017-01-01 00:00:01
1, record, 4, 2017-01-01 00:00:02
1, record, 5, 2017-01-01 00:00:03
1, record, 6, 2017-01-01 00:00:04
1, end, 0, 2017-01-01 00:00:04
1, start, 0, 2017-01-01 00:00:05
1, record, 0, 2017-01-01 00:00:05
1, record, 2, 2017-01-01 00:00:06
1, record, 3, 2017-01-01 00:00:07
1, end, 0, 2017-01-01 00:00:07
2, start, 0, 2017-01-01 00:00:08
2, record, 0, 2017-01-01 00:00:08
2, record, 3, 2017-01-01 00:00:09
2, record, 8, 2017-01-01 00:00:10
2, end, 0, 2017-01-01 00:00:10
The result should be the maximum of each run:
user_id, action, qty, datetime
1, record, 6, 2017-01-01 00:00:04
1, record, 3, 2017-01-01 00:00:07
2, record, 8, 2017-01-01 00:00:10
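Any Postgres SQL syntax (9.3) is fine. This is some kind of grouping followed by picking the maximum from each group, but I can't figure out how to do the grouping.
Answer 0: (score: 3)
If the runs of a single user never overlap and the next run always starts later, you can use the LAG() window function.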
with the_table(user_id, action, qty, datetime) as (
select 1,'start', 0, '2017-01-01 00:00:01'::timestamp union all
select 1,'record', 0, '2017-01-01 00:00:01'::timestamp union all
select 1,'record', 4, '2017-01-01 00:00:02'::timestamp union all
select 1,'record', 5, '2017-01-01 00:00:03'::timestamp union all
select 1,'record', 6, '2017-01-01 00:00:04'::timestamp union all
select 1,'end', 0, '2017-01-01 00:00:04'::timestamp union all
select 1,'start', 0, '2017-01-01 00:00:05'::timestamp union all
select 1,'record', 0, '2017-01-01 00:00:05'::timestamp union all
select 1,'record', 2, '2017-01-01 00:00:06'::timestamp union all
select 1,'record', 3, '2017-01-01 00:00:07'::timestamp union all
select 1,'end', 0, '2017-01-01 00:00:07'::timestamp union all
select 2,'start', 0, '2017-01-01 00:00:08'::timestamp union all
select 2,'record', 0, '2017-01-01 00:00:08'::timestamp union all
select 2,'record', 3, '2017-01-01 00:00:09'::timestamp union all
select 2,'record', 8, '2017-01-01 00:00:10'::timestamp union all
select 2,'end', 0, '2017-01-01 00:00:10'::timestamp
)
select n_user_id, n_action, n_qty, n_datetime from (
select action,
lag(user_id) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_user_id,
lag(action) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_action,
lag(qty) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_qty,
lag(datetime) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_datetime
from the_table
)t
where action = 'end'
Because some action = 'record' rows have the same datetime as the start and end rows, I use a CASE in the ORDER BY to make it explicit that start comes first, then record, then end.
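The repeated window definition above can also be written once with a WINDOW clause. The following is a minimal sketch of the same LAG() idea, not a different method; it rebuilds the sample rows with VALUES so it runs on its own, and the trailing ORDER BY is an addition purely to make the output order stable.

```sql
-- Sample data from the question, rebuilt as a CTE so the sketch is self-contained.
with the_table(user_id, action, qty, datetime) as (
    values
        (1, 'start',  0, '2017-01-01 00:00:01'::timestamp),
        (1, 'record', 0, '2017-01-01 00:00:01'),
        (1, 'record', 4, '2017-01-01 00:00:02'),
        (1, 'record', 5, '2017-01-01 00:00:03'),
        (1, 'record', 6, '2017-01-01 00:00:04'),
        (1, 'end',    0, '2017-01-01 00:00:04'),
        (1, 'start',  0, '2017-01-01 00:00:05'),
        (1, 'record', 0, '2017-01-01 00:00:05'),
        (1, 'record', 2, '2017-01-01 00:00:06'),
        (1, 'record', 3, '2017-01-01 00:00:07'),
        (1, 'end',    0, '2017-01-01 00:00:07'),
        (2, 'start',  0, '2017-01-01 00:00:08'),
        (2, 'record', 0, '2017-01-01 00:00:08'),
        (2, 'record', 3, '2017-01-01 00:00:09'),
        (2, 'record', 8, '2017-01-01 00:00:10'),
        (2, 'end',    0, '2017-01-01 00:00:10')
)
select n_user_id, n_action, n_qty, n_datetime
from (
    select action,
           lag(user_id)  over w as n_user_id,
           lag(action)   over w as n_action,
           lag(qty)      over w as n_qty,
           lag(datetime) over w as n_datetime
    from the_table
    -- One shared window: per user, by time, with start < record < end
    -- as the tie-break for rows that share a timestamp.
    window w as (
        partition by user_id
        order by datetime,
                 case action when 'start' then 0 when 'record' then 1 else 2 end,
                 qty
    )
) t
where action = 'end'          -- the row just before each 'end' is the run's last record
order by n_user_id, n_datetime;
```

Like the original query, this returns the last record row before each end row, which equals the maximum only because qty never decreases within a run.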
Answer 1: (score: 1)
Quick and dirty, assuming runs do not overlap:
with bounds as (
    -- Pair the n-th start with the n-th end. The ORDER BY belongs inside OVER():
    -- a subquery ORDER BY does not guarantee the order in which rows are numbered.
    select starts.rn, starts.datetime as s, ends.datetime as e
    from (select datetime, row_number() over (order by datetime) as rn from runs where action = 'start') as starts
    join (select datetime, row_number() over (order by datetime) as rn from runs where action = 'end') as ends
      on starts.rn = ends.rn
)
, with_run as (
    -- Tag every row with the run whose [start, end] interval contains it.
    select *, (select rn from bounds where s <= r.datetime and e >= r.datetime) as run
    from runs as r
)
, max_qty as (
    select run, max(qty) as qty
    from with_run
    group by run
)
select s.user_id, s.action, s.qty, s.datetime
from with_run as s
join max_qty as f on s.run = f.run and s.qty = f.qty;
-- Test data --
create table runs (user_id int, action text, qty int, datetime TIMESTAMP);
insert INTO runs VALUES
(1, 'start', 0, '2017-01-01 00:00:01')
,(1, 'record', 0, '2017-01-01 00:00:01')
,(1, 'record', 4, '2017-01-01 00:00:02')
,(1, 'record', 5, '2017-01-01 00:00:03')
,(1, 'record', 6, '2017-01-01 00:00:04')
,(1, 'end', 0, '2017-01-01 00:00:04')
,(1, 'start', 0, '2017-01-01 00:00:05')
,(1, 'record', 0, '2017-01-01 00:00:05')
,(1, 'record', 2, '2017-01-01 00:00:06')
,(1, 'record', 3, '2017-01-01 00:00:07')
,(1, 'end', 0, '2017-01-01 00:00:07')
,(2, 'start', 0, '2017-01-01 00:00:08')
,(2, 'record', 0, '2017-01-01 00:00:08')
,(2, 'record', 3, '2017-01-01 00:00:09')
,(2, 'record', 8, '2017-01-01 00:00:10')
,(2, 'end', 0, '2017-01-01 00:00:10');
Update: @Oto Shavadze's answer can be shortened:
with lookup as (select action,lag(t.*) over(order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end) as r from runs t)
select (r::runs).user_id
,(r::runs).action
,(r::runs).qty
,(r::runs).datetime
from lookup where action = 'end';
I think the OP is unclear about what "maximum" means: the last record before the end row, or the highest quantity.
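If "maximum" means the highest quantity rather than the last record, one possible sketch is to number the runs with a running count of start rows and then take the top record row per run with DISTINCT ON. It assumes the runs table from the test data above and that each user's runs do not overlap; run_no is a name introduced here for illustration. It avoids the FILTER clause, which is not available in 9.3.

```sql
with numbered as (
    select r.*,
           -- Running count of 'start' rows per user = run number (hypothetical name run_no).
           -- The CASE tie-break puts a 'start' row ahead of a 'record' row that
           -- shares its timestamp, so that record is counted in the new run.
           sum(case when action = 'start' then 1 else 0 end)
               over (partition by user_id
                     order by datetime,
                              case action when 'start' then 0 when 'record' then 1 else 2 end
                    ) as run_no
    from runs r
)
-- DISTINCT ON keeps the first row per (user_id, run_no) in the ORDER BY below,
-- i.e. the record with the highest qty (earliest datetime on ties).
select distinct on (user_id, run_no)
       user_id, action, qty, datetime
from numbered
where action = 'record'
order by user_id, run_no, qty desc, datetime;
```

On the sample data both interpretations give the same three rows, because qty only ever increases within a run; they would differ if a run's quantity could drop before the end row.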