PostgreSQL - DISTINCT ON and GROUP BY syntax

Time: 2013-09-22 00:53:17

Tags: sql postgresql

I realized that a database query was returning unexpected results because I was using DISTINCT ON and GROUP BY incorrectly.

I'm hoping someone can set me straight. The actual query is quite complex, so I'll dumb it down:

I have a table/inner query consisting of an object_id and a timestamp:

CREATE TABLE test_select ( object_id INT , event_timestamp timestamp );
COPY test_select (object_id , event_timestamp) FROM stdin (DELIMITER '|');
1           | 2013-01-27 21:01:20
1           | 2012-06-28 14:36:26
1           | 2013-02-21 04:16:48
2           | 2012-06-27 19:53:05
2           | 2013-02-03 17:35:58
3           | 2012-06-14 20:17:00
3           | 2013-02-15 19:03:34
4           | 2012-06-13 13:59:47
4           | 2013-02-23 06:31:16
5           | 2012-07-03 01:45:56
5           | 2012-06-11 21:33:26
\.
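The inline rows after COPY ... FROM stdin are fed by psql; a client that does not support sending COPY data this way may reject the script. A minimal sketch of the same sample data loaded with plain INSERT ... VALUES instead, assuming you run it in place of the COPY block (not in addition to it):

```sql
-- Same schema as above.
CREATE TABLE test_select (object_id INT, event_timestamp timestamp);

-- Same 11 rows as the COPY block, as a multi-row INSERT that any client can send.
INSERT INTO test_select (object_id, event_timestamp) VALUES
  (1, '2013-01-27 21:01:20'),
  (1, '2012-06-28 14:36:26'),
  (1, '2013-02-21 04:16:48'),
  (2, '2012-06-27 19:53:05'),
  (2, '2013-02-03 17:35:58'),
  (3, '2012-06-14 20:17:00'),
  (3, '2013-02-15 19:03:34'),
  (4, '2012-06-13 13:59:47'),
  (4, '2013-02-23 06:31:16'),
  (5, '2012-07-03 01:45:56'),
  (5, '2012-06-11 21:33:26');
```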

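I'm trying to select the distinct IDs, deduplicated and ordered by their latest timestamp in reverse chronological order (newest first).

So the result should be [4,1,3,2,5].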

I think this is what I need (it seems to work):

SELECT object_id  
FROM test_select 
GROUP BY object_id 
ORDER BY max(event_timestamp) DESC
;

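For testing/auditing purposes, I sometimes want to include the timestamp field as well. I can't figure out how to include another field in this query.

Can anyone point out an obvious problem with the SQL above, or suggest how to include the auditing information?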

2 answers:

Answer 0 (score: 18)

To be able to select all columns, not just object_id and MAX(event_timestamp), you can use DISTINCT ON:

SELECT DISTINCT ON (object_id) 
    object_id, event_timestamp    ---, more columns
FROM test_select 
ORDER BY object_id, event_timestamp DESC ;

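If you want the results ordered by event_timestamp DESC rather than by object_id, you need to wrap the query in a derived table or a CTE, because the DISTINCT ON expressions must match the leftmost ORDER BY expressions: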

SELECT *
FROM 
  ( SELECT DISTINCT ON (object_id) 
        object_id, event_timestamp    ---, more columns
    FROM test_select 
    ORDER BY object_id, event_timestamp DESC 
  ) AS t
ORDER BY event_timestamp DESC ;
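For the auditing part of the question, one way to check the final order against the expected [4,1,3,2,5] is to collapse the ids into a single array with an ordered array_agg. A sketch under two assumptions: the CTE named test_select stands in for the real table so the snippet runs on its own, and latest_per_object is a hypothetical view name used only for this check.

```sql
-- The CTE holds the sample rows; against the real table, drop the WITH clause.
-- The first row is typed explicitly; the remaining string literals resolve to timestamp.
CREATE TEMP VIEW latest_per_object AS
WITH test_select (object_id, event_timestamp) AS (
  VALUES (1, timestamp '2013-01-27 21:01:20'),
         (1, '2012-06-28 14:36:26'),
         (1, '2013-02-21 04:16:48'),
         (2, '2012-06-27 19:53:05'),
         (2, '2013-02-03 17:35:58'),
         (3, '2012-06-14 20:17:00'),
         (3, '2013-02-15 19:03:34'),
         (4, '2012-06-13 13:59:47'),
         (4, '2013-02-23 06:31:16'),
         (5, '2012-07-03 01:45:56'),
         (5, '2012-06-11 21:33:26')
)
-- Latest row per object_id, as in the DISTINCT ON query above.
SELECT DISTINCT ON (object_id)
       object_id, event_timestamp
FROM test_select
ORDER BY object_id, event_timestamp DESC;

-- Audit: ids ordered newest first, as one array.
SELECT array_agg(object_id ORDER BY event_timestamp DESC) AS ids_newest_first
FROM latest_per_object;
-- returns {4,1,3,2,5}
```

The ordering is done inside array_agg rather than by an outer ORDER BY, so the result does not depend on the row order a view happens to return.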

Alternatively, you can use a window function such as ROW_NUMBER():

WITH cte AS
  ( SELECT ROW_NUMBER() OVER (PARTITION BY object_id 
                              ORDER BY event_timestamp DESC) 
             AS rn, 
           object_id, event_timestamp    ---, more columns
    FROM test_select 
  )
SELECT object_id, event_timestamp    ---, more columns
FROM cte
WHERE rn = 1
ORDER BY event_timestamp DESC ;

Or use MAX() as a window aggregate with OVER:

WITH cte AS
  ( SELECT MAX(event_timestamp) OVER (PARTITION BY object_id) 
             AS max_event_timestamp, 
           object_id, event_timestamp    ---, more columns
    FROM test_select 
  )
SELECT object_id, event_timestamp    ---, more columns
FROM cte
WHERE event_timestamp = max_event_timestamp
ORDER BY event_timestamp DESC ;
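The ROW_NUMBER() and MAX() OVER versions differ when an object has several rows sharing its latest timestamp: the MAX() OVER filter keeps every tied row, while rn = 1 keeps exactly one (which one is arbitrary unless the window's ORDER BY includes a tiebreaker column). DISTINCT ON behaves like the ROW_NUMBER() version here. A minimal sketch, assuming a hypothetical duplicate event for object 5 and a hypothetical view name tie_comparison:

```sql
-- Sample rows for object 5 only, plus a hypothetical duplicate of its latest event.
CREATE TEMP VIEW tie_comparison AS
WITH test_select (object_id, event_timestamp) AS (
  VALUES (5, timestamp '2012-07-03 01:45:56'),
         (5, '2012-06-11 21:33:26'),
         (5, '2012-07-03 01:45:56')   -- tie with the latest event
),
by_max AS (   -- MAX() OVER version
  SELECT object_id, event_timestamp,
         MAX(event_timestamp) OVER (PARTITION BY object_id) AS max_event_timestamp
  FROM test_select
),
by_rn AS (    -- ROW_NUMBER() version
  SELECT object_id, event_timestamp,
         ROW_NUMBER() OVER (PARTITION BY object_id
                            ORDER BY event_timestamp DESC) AS rn
  FROM test_select
)
SELECT
  (SELECT count(*) FROM by_max WHERE event_timestamp = max_event_timestamp) AS max_over_rows,
  (SELECT count(*) FROM by_rn WHERE rn = 1) AS row_number_rows;

SELECT * FROM tie_comparison;
-- max_over_rows = 2, row_number_rows = 1
```

If exactly one row per object is required, the ROW_NUMBER() or DISTINCT ON form is the safer choice; if every tied row should be audited, the MAX() OVER form shows them all.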

Answer 1 (score: 3)

This may not be the best way to handle it, but you can try a window function:

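SELECT DISTINCT object_id,
       -- name the window result explicitly; otherwise PostgreSQL calls the column "max"
       MAX(event_timestamp) OVER (PARTITION BY object_id) AS max_event_timestamp
FROM test_select
ORDER BY max_event_timestamp DESC;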

Alternatively, this also works:

SELECT object_id, MAX(event_timestamp) as max_event_timestamp
FROM test_select 
GROUP BY object_id 
ORDER BY max_event_timestamp DESC;