SQL: summarizing multiple dates with GROUP BY

Time: 2019-11-24 19:42:21

Tags: sql postgresql group-by

I have a SQL table named components that looks like this:

id    component_id    date_updated     status
--    ------------    ------------     ------
1     1               2019-08-02       EDIT
2     1               2019-08-01       PUBLISH
3     2               2019-08-12       PUBLISH
4     3               2019-08-07       EDIT
5     3               2019-08-06       EDIT
6     1               2019-06-01       EDIT

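What I want now is a new table that shows, for each component, when it was last updated, when it was last published, and the published status (I have several of these in production):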

component_id     last_updated    last_status     last_published   published_status
------------     ------------    -----------     --------------   ----------------
1                2019-08-02      EDIT            2019-08-01       PUBLISH
2                2019-08-12      PUBLISH         2019-08-12       PUBLISH
3                2019-08-07      EDIT            <BLANK>

I started like this:

select c1.component_id, c1.date_updated as last_updated, c2.status
from (
     select component_id, max(date_updated) as date_updated
     from components
     group by component_id) as c1
left join components as c2 on c1.component_id = c2.component_id

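But I got stuck, and things became complicated, when I also tried to get the next date_updated, the one where status = 'PUBLISH'.

How should I do this? It is for a Postgres database.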

2 Answers:

Answer 0: (score: 0)

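You can use conditional aggregation for the two date columns, and an array for the last status:

select component_id,
       max(date_updated) as last_updated,
       -- sort newest first so that element [1] is the most recent status
       (array_agg(status order by date_updated desc))[1] as last_status,
       -- the status value stored in the table is 'PUBLISH'
       max(date_updated) filter (where status = 'PUBLISH') as last_published,
       max(status) filter (where status = 'PUBLISH') as published_status
from components
group by component_id;

Used this way, array_agg() behaves like a first() aggregate function: with the rows sorted by date_updated in descending order, element [1] is the most recent status.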
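To try the conditional aggregation without touching a real table, here is a minimal sketch that feeds the six sample rows from the question through a CTE. The CTE is named components only so that the query body reads the same as above. It assumes the stored status values are exactly 'EDIT' and 'PUBLISH', as shown in the question's sample data.

```sql
-- Sample rows copied from the question; this CTE stands in for the real table.
with components (id, component_id, date_updated, status) as (
    values
        (1, 1, date '2019-08-02', 'EDIT'),
        (2, 1, date '2019-08-01', 'PUBLISH'),
        (3, 2, date '2019-08-12', 'PUBLISH'),
        (4, 3, date '2019-08-07', 'EDIT'),
        (5, 3, date '2019-08-06', 'EDIT'),
        (6, 1, date '2019-06-01', 'EDIT')
)
select component_id,
       max(date_updated) as last_updated,
       -- newest row first, so [1] picks the latest status
       (array_agg(status order by date_updated desc))[1] as last_status,
       -- filter restricts each aggregate to PUBLISH rows; it yields NULL when there are none
       max(date_updated) filter (where status = 'PUBLISH') as last_published,
       max(status) filter (where status = 'PUBLISH') as published_status
from components
group by component_id
order by component_id;

-- component_id | last_updated | last_status | last_published | published_status
-- -------------+--------------+-------------+----------------+------------------
--            1 | 2019-08-02   | EDIT        | 2019-08-01     | PUBLISH
--            2 | 2019-08-12   | PUBLISH     | 2019-08-12     | PUBLISH
--            3 | 2019-08-07   | EDIT        | (null)         | (null)
```

Component 3 has no PUBLISH row, so both filtered aggregates come back as NULL, which corresponds to the <BLANK> cell in the desired output.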

There are other ways to solve this without a subquery, for example with distinct on:

select distinct on (component_id) component_id,
       date_updated as last_updated,
       status as last_status,
       -- the status value stored in the table is 'PUBLISH'
       max(date_updated) filter (where status = 'PUBLISH') over (partition by component_id) as last_published
from components
order by component_id, date_updated desc;

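Answer 1: (score: 0)

By using over with partition by, you can number the rows of each component_id by date and keep the most recent row for each component, then join the latest publish date onto that row.

Example: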

with last_status as
(
SELECT 
  component_id
  , date_updated
  , status
  , row_number() over (partition by component_id order by date_updated desc) as R
  FROM test_data
),

last_publish as 
(
SELECT 
  component_id
  , max(date_updated) as last_published
FROM test_data
where upper(status) = 'PUBLISH'
GROUP BY component_id
)

select 
  last_status.* 
  , last_publish.last_published
from last_status 
left join last_publish on last_publish.component_id = last_status.component_id
where last_status.R = 1
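
The query above does not return the published_status column the question asks for. A possible extension, run against the question's sample rows, is sketched below. The VALUES-based test_data CTE, the rn and ls/lp aliases, and the case expression that derives published_status are assumptions added for illustration, not part of the original answer.

```sql
-- Sample rows from the question, inlined so the sketch runs on its own.
with test_data (component_id, date_updated, status) as (
    values
        (1, date '2019-08-02', 'EDIT'),
        (1, date '2019-08-01', 'PUBLISH'),
        (2, date '2019-08-12', 'PUBLISH'),
        (3, date '2019-08-07', 'EDIT'),
        (3, date '2019-08-06', 'EDIT'),
        (1, date '2019-06-01', 'EDIT')
),
last_status as (
    -- rn = 1 marks the most recent row of each component
    select component_id,
           date_updated,
           status,
           row_number() over (partition by component_id order by date_updated desc) as rn
    from test_data
),
last_publish as (
    -- latest PUBLISH date per component; components never published have no row here
    select component_id,
           max(date_updated) as last_published
    from test_data
    where upper(status) = 'PUBLISH'
    group by component_id
)
select ls.component_id,
       ls.date_updated as last_updated,
       ls.status as last_status,
       lp.last_published,
       -- assumption: the published status is always 'PUBLISH' when a publish date exists
       case when lp.last_published is not null then 'PUBLISH' end as published_status
from last_status as ls
left join last_publish as lp on lp.component_id = ls.component_id
where ls.rn = 1
order by ls.component_id;
```

The left join keeps component 3 in the result with NULL in both published columns, since it has no PUBLISH row.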