Postgres: get the latest row for each of a set of keys

Time: 2014-11-17 08:26:34

Tags: sql postgresql greatest-n-per-group

I have a simple event log table:

uid | event_id | event_data
----+----------+------------
  1 |  1       | whatever
  2 |  2       |
  1 |  3       |
  4 |  4       |
  4 |  5       |
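
To try the queries in this thread, the table above can be reproduced with something like the following. This is only a sketch: the column types, the primary key, and the choice of NULL for the empty event_data cells are assumptions, since the question does not show the table definition.

```sql
-- Assumed schema: the types and the primary key are guesses, not from the question.
CREATE TABLE events (
    uid        int  NOT NULL,
    event_id   int  PRIMARY KEY,
    event_data text
);

-- Sample rows from the table above; empty cells are assumed to be NULL.
INSERT INTO events (uid, event_id, event_data) VALUES
    (1, 1, 'whatever'),
    (2, 2, NULL),
    (1, 3, NULL),
    (4, 4, NULL),
    (4, 5, NULL);
```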

If I need the latest activity of a given user, it is obvious:

SELECT * FROM events WHERE uid=needed_uid ORDER BY event_id DESC LIMIT 1

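But suppose I need the latest event for each user id in an array. For example, for the table above and users {1, 4}, I would expect events {3, 5}. Is this possible in plain SQL, without resorting to a PL/pgSQL loop?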

5 answers:

Answer 0 (score: 3)

A Postgres-specific solution is to use distinct on, which is usually faster than a solution using window functions:

select distinct on (uid) uid, event_id, event_data
from events 
where uid in (1,4)
order by uid, event_id DESC
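
Since the question talks about an array of user ids, the same query can also take the ids as a single array value instead of an IN list. A minimal sketch, assuming the ids arrive as an int[] (the literal below stands in for a bound parameter):

```sql
SELECT DISTINCT ON (uid) uid, event_id, event_data
FROM   events
WHERE  uid = ANY ('{1, 4}'::int[])  -- array literal stands in for a bound parameter
ORDER  BY uid, event_id DESC;       -- first row per uid is the one with the highest event_id
```

With the sample rows from the question this should return events 3 and 5.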

Answer 1 (score: 1)

Try the following query:

select DesiredColumnList 
from 
(
    select *, row_number() over ( partition by uid order by event_id desc) rn
    from yourtable
) t
where rn = 1

row_number() assigns each row a unique sequential number starting from 1, in event_id desc order, and partition by makes the numbering restart for each uid group.
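
Applied to the events table from the question, the template above might look like the sketch below. Filtering on uid inside the subquery is an addition here (not part of the answer), made so that only the requested users get numbered.

```sql
SELECT uid, event_id, event_data
FROM  (
    SELECT *,
           row_number() OVER (PARTITION BY uid ORDER BY event_id DESC) AS rn
    FROM   events
    WHERE  uid IN (1, 4)   -- filter early: only these users are numbered
) t
WHERE  rn = 1;             -- rn = 1 is the row with the highest event_id per uid
```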

Answer 2 (score: 1)

Maybe this will help:

SELECT uid,
       event_id
  FROM(SELECT uid,
              event_id,
              ROW_NUMBER() OVER (PARTITION BY uid ORDER BY event_ID DESC) rank
         FROM events
      ) t  -- a subquery in FROM needs an alias
 WHERE uid IN (1, 4)
   AND rank = 1

Answer 3 (score: 1)

For a given array of IDs, Postgres 9.3 or later allows a query that is:

  • short
  • optimized for performance
  • and returns rows in the order of the array elements

Postgres 9.3

Using an implicit JOIN LATERAL (the comma in the FROM list acts as a cross join):

SELECT e.*
FROM   unnest('{1, 4}'::int[]) a(uid)  -- input array here
     , LATERAL (SELECT * FROM events WHERE uid = a.uid ORDER BY event_id DESC LIMIT 1) e;

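With the current implementation, this returns rows in the order of the array elements. But without an actual ORDER BY, the sort order is not guaranteed. See below.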

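There is still a subtle difference (compared to DISTINCT ON as provided by @a_horse): if the given array has duplicate elements, this query returns duplicate rows, which may or may not be desirable. If it is not, add a DISTINCT step after unnest() and before joining to the big table.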
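
One way that DISTINCT step could look, as a sketch only (the input array with a repeated 1 is made up for illustration, and the subquery alias t is arbitrary):

```sql
SELECT e.*
FROM  (
    SELECT DISTINCT uid                       -- fold duplicate ids first
    FROM   unnest('{1, 4, 1}'::int[]) t(uid)  -- hypothetical input with a repeated id
    ) a
     , LATERAL (SELECT * FROM events WHERE uid = a.uid ORDER BY event_id DESC LIMIT 1) e;
```

Note that DISTINCT does not keep the original order of the array elements, so this variant trades the ordering for the deduplication.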

The main benefit here is optimized index use.
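
The answer does not show the index it relies on. A multicolumn index along these lines is presumably what makes each per-user lookup cheap; the index name is made up and the exact definition is an assumption:

```sql
-- Hypothetical index: lets "WHERE uid = ... ORDER BY event_id DESC LIMIT 1"
-- read the newest row for one uid directly from the index.
CREATE INDEX events_uid_event_id_idx ON events (uid, event_id DESC);
```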

To guarantee the row order:

SELECT e.*
FROM  (SELECT '{1, 4}'::int[] AS arr) a  -- input array here
     , generate_subscripts(a.arr, 1) i
     , LATERAL (SELECT * FROM events WHERE uid = a.arr[i.i] ORDER BY event_id DESC LIMIT 1) e
ORDER  BY i.i;

Postgres 9.4 (to be released soon)

SELECT e.*
FROM   unnest('{1, 4}'::int[]) WITH ORDINALITY a(uid, i)  -- input array here
     , LATERAL (SELECT * FROM events WHERE uid = a.uid ORDER BY event_id DESC LIMIT 1) e
ORDER  BY a.i;

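WITH ORDINALITY adds a bigint column that numbers the rows returned by unnest(), starting from 1; here it is aliased as i.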

Answer 4 (score: 0)

A few seconds after I posted the question, I came up with this. It is not as efficient, but for the sake of considering all options:

SELECT * FROM events WHERE event_id IN 
    (SELECT MAX(event_id) FROM events WHERE uid IN (1,4) GROUP BY uid)