PostgreSQL: get the last ID from a partitioned table

Posted: 2013-06-04 11:58:14

Tags: postgresql database-partitioning

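My question is basically the same as this one, but I couldn't find an answer there; it only says "will be fixed in the next release" and "easy min/max scan":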

PostgreSQL+table partitioning: inefficient max() and min()

CREATE SEQUENCE mc_handst_id_seq;

CREATE TABLE mc_handstats
(
  id integer NOT NULL DEFAULT nextval('mc_handst_id_seq'::regclass),
  playerid integer NOT NULL,
  CONSTRAINT mc_handst_pkey PRIMARY KEY (id)
);

The table is partitioned by playerid.

CREATE TABLE mc_handst_0000 ( CHECK ( playerid >= 0 AND playerid < 10000) ) INHERITS (mc_handstats) TABLESPACE ssd01;
CREATE TABLE mc_handst_0010 ( CHECK ( playerid >= 10000 AND playerid < 30000) ) INHERITS (mc_handstats) TABLESPACE ssd02;
CREATE TABLE mc_handst_0030 ( CHECK ( playerid >= 30000 AND playerid < 50000) ) INHERITS (mc_handstats) TABLESPACE ssd03;
...

CREATE INDEX mc_handst_0000_PlayerID ON mc_handst_0000 (playerid);
CREATE INDEX mc_handst_0010_PlayerID ON mc_handst_0010 (playerid);
CREATE INDEX mc_handst_0030_PlayerID ON mc_handst_0030 (playerid);
...

plus a BEFORE INSERT trigger that routes each row to the right child table by playerid.
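The question only mentions the routing trigger. A minimal sketch of what `mc_handst_insert_function()` might look like, assuming only the three child tables shown above (the real setup has 20 children whose ranges are elided). The function and trigger names come from the `\d` output further down; the body itself is hypothetical and follows the usual inheritance-partitioning pattern.

```sql
-- Hypothetical routing function: sends each new row to the child table
-- whose CHECK range contains its playerid (only three ranges shown).
CREATE OR REPLACE FUNCTION mc_handst_insert_function()
RETURNS trigger AS $$
BEGIN
    IF NEW.playerid >= 0 AND NEW.playerid < 10000 THEN
        INSERT INTO mc_handst_0000 VALUES (NEW.*);
    ELSIF NEW.playerid >= 10000 AND NEW.playerid < 30000 THEN
        INSERT INTO mc_handst_0010 VALUES (NEW.*);
    ELSIF NEW.playerid >= 30000 AND NEW.playerid < 50000 THEN
        INSERT INTO mc_handst_0030 VALUES (NEW.*);
    ELSE
        -- Hypothetical choice: reject rows that fit no partition.
        RAISE EXCEPTION 'playerid % has no partition', NEW.playerid;
    END IF;
    -- Returning NULL keeps the row from also being stored in the parent.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mc_handst_insert_trigger
    BEFORE INSERT ON mc_handstats
    FOR EACH ROW EXECUTE PROCEDURE mc_handst_insert_function();
```

The `id` default from `nextval()` is filled in before BEFORE triggers fire, so `NEW.*` already carries the new id when it is copied into the child.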

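I want to get the last id (I could also read the sequence's value, but I'm used to working with tables/columns), but PostgreSQL seems to scan the tables rather stupidly: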

EXPLAIN ANALYZE SELECT max(id) FROM mc_handstats;  (the real query runs forever)

"Aggregate  (cost=9080859.04..9080859.05 rows=1 width=4) (actual time=181867.626..181867.626 rows=1 loops=1)"
"  ->  Append  (cost=0.00..8704322.43 rows=150614644 width=4) (actual time=2.460..163638.343 rows=151134891 loops=1)"
"        ->  Seq Scan on mc_handstats  (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)"
"        ->  Seq Scan on mc_handst_0000 mc_handstats  (cost=0.00..728523.69 rows=12580969 width=4) (actual time=2.457..10800.539 rows=12656647 loops=1)"
...
(a Seq Scan like the one above on every child table)
...
"Total runtime: 181867.819 ms"

EXPLAIN ANALYZE SELECT max(id) FROM mc_handst_1000;

"Aggregate  (cost=83999.50..83999.51 rows=1 width=4) (actual time=1917.933..1917.933 rows=1 loops=1)"
"  ->  Seq Scan on mc_handst_1000  (cost=0.00..80507.40 rows=1396840 width=4) (actual time=0.007..1728.268 rows=1396717 loops=1)"
"Total runtime: 1918.494 ms"

On a single child table the query finishes in a snap, nowhere near the runtime on the master table. (PostgreSQL 9.2)


\d mc_handstats (columns omitted)

Indexes:
    "mc_handst_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "mc_handst_playerid_fkey" FOREIGN KEY (playerid) REFERENCES mc_players(id)
Triggers:
    mc_handst_insert_trigger BEFORE INSERT ON mc_handstats FOR EACH ROW EXECUTE PROCEDURE mc_handst_insert_function()
Number of child tables: 20 (Use \d+ to list them.)

\d mc_handst_1000

Indexes:
    "mc_handst_1000_playerid" btree (playerid)
Check constraints:
    "mc_handst_1000_playerid_check" CHECK (playerid >= 1000000 AND playerid < 1100000)

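Hm, there is no PK index on the child tables. Although I don't understand why max(id) is fairly fast on a child table (which has no index) but slow on the master table, it seems I need to add an index on the PK column to every child table. Maybe that solves it.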


CREATE INDEX mc_handst_0010_ID ON mc_handst_0010 (id);
... plus many more ...
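Child tables do not inherit their parent's indexes or primary key under inheritance-based partitioning, which is why none of them had an index on id. Rather than writing one CREATE INDEX per child, a DO block can loop over the children recorded in `pg_inherits`. This is a sketch: the `_id_idx` name suffix is an arbitrary choice (it avoids clashing with the hand-made index above, since the unquoted name `mc_handst_0010_ID` folds to `mc_handst_0010_id`), and it assumes the children live in a schema on the search_path.

```sql
-- Create an index on id for every direct child of mc_handstats.
-- The "<child>_id_idx" naming scheme is hypothetical.
DO $$
DECLARE
    child regclass;
BEGIN
    FOR child IN
        SELECT inhrelid::regclass
          FROM pg_inherits
         WHERE inhparent = 'mc_handstats'::regclass
    LOOP
        -- %I quotes the new index name; %s prints the regclass as text.
        EXECUTE format('CREATE INDEX %I ON %s (id)',
                       child::text || '_id_idx', child);
    END LOOP;
END $$;
```

From PostgreSQL 9.1 on, once every child has such an index the planner can answer `max(id)` on the parent with a Merge Append over backward index scans instead of reading every row; running EXPLAIN on the parent query is the way to confirm it actually does so.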

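Everything works fine now. It's still strange that it used to be fast on the child table, which made me think they were already indexed, but I don't really care.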
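The two EXPLAIN ANALYZE runs above point to a simple explanation: both plans are plain sequential scans with similar throughput, and the single child just holds about 1% of the rows. Worked out from the row counts and total runtimes in those plans:

```latex
\[
\frac{151\,134\,891\ \text{rows}}{181.868\ \text{s}} \approx 8.3\times10^{5}\ \text{rows/s},
\qquad
\frac{1\,396\,717\ \text{rows}}{1.918\ \text{s}} \approx 7.3\times10^{5}\ \text{rows/s}
\]
\[
\frac{1\,396\,717}{151\,134\,891} \approx 0.92\%\ \text{of the rows},
\qquad
\frac{1.918\ \text{s}}{181.868\ \text{s}} \approx 1.05\%\ \text{of the time}
\]
```

So the child query was not using an index; it was fast only because it scanned roughly a hundredth of the data.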

Thanks!

2 answers:

Answer 0 (score: 0)

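The first thing to do is index every child table on (id) and see whether max(id) is smart enough to do an index scan on each table. I think it should be, but I'm not completely sure.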

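If not, here is what I would do: start from currval([sequence_name]) and work backwards until you find a row. You could check a block of 10 ids at a time, which is essentially a sparse scan. This can be done with a recursive CTE like this one (again relying on the index):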

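 WITH RECURSIVE ids AS (
      -- last_value instead of currval(): currval() raises an error unless
      -- nextval() has already been called in the current session
      SELECT (SELECT max(id) FROM mc_handstats
               WHERE id BETWEEN s.last_value - 10 AND s.last_value) AS max_id,
             s.last_value - 10 AS min_block
        FROM mc_handst_id_seq s
      UNION ALL
      -- aggregates are not allowed directly in the recursive term,
      -- so each block's max(id) is computed in a scalar subquery
      SELECT (SELECT max(id) FROM mc_handstats
               WHERE id BETWEEN i.min_block - 10 AND i.min_block),
             i.min_block - 10
        FROM ids i
       WHERE i.max_id IS NULL
 )
 SELECT max(max_id) FROM ids;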

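If the planner won't use the indexes even after the partitions are indexed, this gives you a sparse scan instead. In most cases it needs only one iteration, but it repeats the lookup as many times as needed to find an id. Note that it may loop forever on an empty table.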

Answer 1 (score: 0)

Assuming the parent table looks like this:

CREATE TABLE parent (
  id integer NOT NULL DEFAULT nextval('parent_id_seq'::regclass),
  ... other columns ...
);

Whether you use rules or triggers to redirect INSERTs into the child tables, right after the INSERT you can run:

SELECT currval('parent_id_seq'::regclass);

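to get the last id inserted by your session. This is unaffected by concurrent INSERTs, because each session keeps its own copy of the last value it drew from the sequence.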
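A short sketch of that pattern against the question's schema (the playerid value 12345 is made up). It also notes why `INSERT ... RETURNING id` is not an option with this setup: the BEFORE INSERT trigger returns NULL, so the row is never stored in the parent and RETURNING produces no row. Reading `last_value` straight from the sequence is shown for contrast; it reports the last value handed out by any session, which may belong to an uncommitted or rolled-back transaction.

```sql
-- Insert through the parent; the routing trigger moves the row to a child.
INSERT INTO mc_handstats (playerid) VALUES (12345);

-- Last id drawn from the sequence in this session only
-- (concurrent inserts in other sessions do not change it).
SELECT currval('mc_handst_id_seq'::regclass) AS my_last_id;

-- INSERT ... RETURNING id would return no row here, because the
-- BEFORE INSERT trigger on the parent returns NULL.

-- Last value handed out by any session; not necessarily a committed max(id).
SELECT last_value FROM mc_handst_id_seq;
```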

https://dba.stackexchange.com/questions/58497/return-id-from-partitioned-table-in-postgres