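Postgres 10.3: heavily partitioned table, cannot delete any records

Time: 2018-03-15 04:11:31

Tags: postgresql partitioning

Has anyone else run into this problem with natively partitioned tables?

The partitioned table has 7202 partitions. No partition contains more than 50 records. The table is partitioned on a foreign key.

Any delete operation, i.e.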

delete from contacts where id = ?
delete from contacts where id = ? and account_id = ?
delete from contacts where account_id = ?

results in the server running out of memory.

Default Postgres configuration, with one exception: max_locks_per_transaction = 1024

Postgres log:

2018-03-15 14:26:40.340 AEDT [7120] LOG:  server process (PID 8177) was terminated by signal 9: Killed
2018-03-15 14:26:40.340 AEDT [7120] DETAIL:  Failed process was running: delete from contacts where id = 82398 and account_id = 9000
2018-03-15 14:26:40.354 AEDT [7120] LOG:  terminating any other active server processes
2018-03-15 14:26:40.367 AEDT [3821] WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.367 AEDT [3821] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.367 AEDT [3821] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.569 AEDT [7120] LOG:  all server processes terminated; reinitializing
2018-03-15 14:26:40.639 AEDT [9244] LOG:  database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT
2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL:  the database system is in recovery mode
2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL:  the database system is in recovery mode
2018-03-15 14:26:44.778 AEDT [9244] LOG:  database system was not properly shut down; automatic recovery in progress
2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo starts at 0/56782CE0
2018-03-15 14:26:44.798 AEDT [9244] LOG:  invalid record length at 0/56782D18: wanted 24, got 0
2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo done at 0/56782CE0
2018-03-15 14:26:44.870 AEDT [7120] LOG:  database system is ready to accept connections

2 answers:

Answer 0 (score: 1)

From Amit Langote, on pgsql-bugs:

  

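"I can reproduce the OOM being triggered on my modest development machine, so perhaps that's what is happening in your case too.

This is unfortunately expected, given that the underlying planning mechanism cannot cope with more than a few hundred partitions. :-( See the relevant note in the documentation; it is the last line of the page at this link: https://www.postgresql.org/docs/devel/static/ddl-partitioning.html

Until things improve in that area, one workaround might be to perform the delete operation directly on the partition, since that is possible. Or redesign the schema to use a smaller number of partitions."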
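As a rough sketch of the first workaround quoted above: the catalog query lists each partition of contacts together with its bound, and the DELETE then goes to one partition by name instead of to the parent. The partition name contacts_account_9000 is hypothetical, since the question does not show the naming scheme; the id and account_id values are the ones that appear in the log.

```sql
-- List the partitions of contacts together with their bounds,
-- to find the one that holds account_id = 9000.
SELECT c.oid::regclass AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS partition_bound
FROM pg_inherits AS i
JOIN pg_class AS c ON c.oid = i.inhrelid
WHERE i.inhparent = 'contacts'::regclass;

-- Delete through the partition itself rather than through the parent,
-- so the statement names a single table.
-- contacts_account_9000 is a hypothetical partition name.
DELETE FROM contacts_account_9000 WHERE id = 82398;
```

The application has to know, or look up, which partition a row lives in, so this only works when the partition key (here account_id) is available at delete time.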

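I realize this is not really a solution, more of a cautionary tale.

It appears that native partitioning in PostgreSQL 10 is not suitable for our use case. Part of my remit was to evaluate its suitability. I suspected that aggressive partitioning would come at a cost, but I did not expect out-of-memory problems.

Others are still welcome to post their own experiences and solutions.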

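Answer 1 (score: 0)

We have exactly the same problem on PostgreSQL 11, on a table with multi-level native partitioning. The main table is partitioned by shop, and each shop is partitioned by period (year-month) for the last several years. That is several thousand partitions altogether.

PostgreSQL crashes when we need to run a delete or update across all shops. It can handle a delete/update at the level of one partition, i.e. one particular shop and its monthly partitions, because there are only a few dozen partitions per shop.

But the database crashes when we try to delete or update through the top-level parent table, where there are several thousand partitions to handle.

The crash is always the same: monitoring shows that PostgreSQL starts using a huge amount of memory and is eventually killed by the OOM killer. Tuning work_mem or other settings seems to have very little effect; PostgreSQL just crashes later.

So we have to run all deletes/updates on this table in a loop over the shops, deleting or updating each shop's partitions separately. But at least this works.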
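A minimal sketch of such a per-shop loop, written as a PL/pgSQL DO block. The parent table name sales_summary, the column created_at and the cutoff date are all hypothetical, since the answer does not show its schema; the sketch only assumes that the direct children of the top-level table are the per-shop partitions.

```sql
DO $$
DECLARE
    shop_part regclass;
BEGIN
    -- Direct children of the top-level parent: one partitioned table per shop.
    -- sales_summary is a hypothetical name for the top-level table.
    FOR shop_part IN
        SELECT inhrelid::regclass
        FROM pg_inherits
        WHERE inhparent = 'sales_summary'::regclass
    LOOP
        -- Each statement targets one shop, so only that shop's monthly
        -- partitions (a few dozen) are involved.
        -- created_at and the cutoff date are hypothetical.
        EXECUTE format('DELETE FROM %s WHERE created_at < %L',
                       shop_part, '2017-01-01');
        RAISE NOTICE 'processed %', shop_part;
    END LOOP;
END
$$;
```

Note that a DO block written like this runs as a single transaction, so the locks taken for every shop are held until the block ends. Issuing the per-shop statements from a client and committing after each one is an alternative if that becomes a problem.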

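By way of explanation: these fine-grained partitions are very useful for our client portal. We store aggregated data there, and because the queries are built directly against the monthly partitions of a particular shop, clients see their data really quickly. And native partitioning takes care of distributing the data across the whole structure during inserts, which is really amazing...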