Question

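What is the best way to optimize a query that joins a table to itself on the next id value within a subgroup? Right now I have something like this: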

CREATE OR REPLACE FUNCTION select_next_id(bigint, bigint) RETURNS bigint AS $body$
DECLARE
    _id bigint;
BEGIN
    SELECT id INTO _id FROM table WHERE id_group = $2 AND id > $1 ORDER BY id ASC LIMIT 1;
    RETURN _id;
END;
$body$ LANGUAGE plpgsql;

And the JOIN query:

SELECT * FROM table t1
JOIN table t2 ON t2.id = select_next_id(t1.id, t1.id_group)

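The table has more than 2kk (2 million) rows, and the query takes a very long time. Is there a better way to do this quickly? I also have a UNIQUE INDEX on the id column, but I guess it doesn't help much.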

Some sample data:

id | id_group
=============
1  | 1
2  | 1
3  | 1
4  | 2
5  | 2
6  | 2
20 | 4
25 | 4
37 | 4
40 | 1
55 | 2

I would like to get something like this:

id | id_next
1  | 2
2  | 3
3  | null
4  | 5 
5  | 6
6  | 55

And so on.

Answer 1

For the query inside the function, you need an index on (id_group, id), not just on (id).
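
A minimal sketch of such an index, assuming the table is called the_table (a placeholder, as in the other answer) and using an index name made up for illustration:

-- Composite index: equality column first, range/sort column second,
-- so "id_group = $2 AND id > $1 ORDER BY id LIMIT 1" can be answered
-- with a short index scan.
CREATE INDEX the_table_id_group_id_idx ON the_table (id_group, id);

With the equality column leading, the planner can jump to the matching group and read the first id above the given value directly from the index.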

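Next, you don't need the overhead of plpgsql in the function itself, and you can give the planner some hints by marking the function STABLE and giving it a low cost: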

CREATE OR REPLACE FUNCTION select_next_id(bigint, bigint) RETURNS bigint AS $body$
    SELECT id FROM table WHERE id_group = $2 AND id > $1 ORDER BY id ASC LIMIT 1;
$body$ LANGUAGE sql STABLE COST 10;

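In the final query, depending on what you are actually trying to do, you can probably get rid of both the join and the function call by using lead(), as highlighted by horse: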

http://www.postgresql.org/docs/current/static/tutorial-window.html

Answer 2

I'm not entirely sure, but I think you want something like this:

select id, 
       lead(id) over (partition by id_group order by id) as id_next
from the_table
order by id, id_next;
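
If the other columns of the next row are needed too (the original query selects * from both sides of the join), one possible sketch is to compute id_next with lead() in a subquery and join back on it. The aliases cur and nxt are made up for illustration, and the_table is the same placeholder name as above:

-- cur: every row plus the id of the next row in its group
-- nxt: the full next row, looked up through the unique index on id
select cur.*, nxt.*
from (
    select t.*,
           lead(id) over (partition by id_group order by id) as id_next
    from the_table t
) cur
join the_table nxt on nxt.id = cur.id_next;

Like the original JOIN, this drops the last row of each group, where id_next is null; a left join would keep those rows.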

Postgres: best way to optimize a "greater than" query
