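Has anyone run into this problem with native partitioned tables?
The partitioned table has 7202 partitions. No partition contains more than 50 records. Partitioning is done on a foreign key.
Any delete operation, i.e.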
delete from contacts where id = ?
delete from contacts where id = ? and account_id = ?
delete from contacts where account_id = ?
results in the server running out of memory.
Default Postgres configuration, with one exception: max_locks_per_transaction = 1024
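For anyone who wants to try to reproduce this, here is a minimal sketch that generates DDL for a layout like the one described. It assumes LIST partitioning with one partition per account_id and a hypothetical contacts_<account_id> naming scheme; the question only states that partitioning is on a foreign key, so the real table may differ. The column list is reduced to the two columns that appear in the queries.

```python
def partition_ddl(account_ids, parent="contacts"):
    """Build DDL for a LIST-partitioned parent plus one partition per account_id.

    Assumptions (not from the original post): LIST partitioning, two columns,
    and partitions named <parent>_<account_id>.
    """
    stmts = [
        f"CREATE TABLE {parent} (id bigint NOT NULL, account_id integer NOT NULL) "
        f"PARTITION BY LIST (account_id);"
    ]
    for acct in account_ids:
        acct = int(acct)  # only integers are ever interpolated into the SQL text
        stmts.append(
            f"CREATE TABLE {parent}_{acct} PARTITION OF {parent} "
            f"FOR VALUES IN ({acct});"
        )
    return stmts


# 7202 partitions, as in the question, plus the parent table itself.
ddl = partition_ddl(range(1, 7203))
print(len(ddl))
print(ddl[1])
```

The statements can be written to a file and fed to psql against a scratch database.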
Postgres log:
2018-03-15 14:26:40.340 AEDT [7120] LOG: server process (PID 8177) was terminated by signal 9: Killed
2018-03-15 14:26:40.340 AEDT [7120] DETAIL: Failed process was running: delete from contacts where id = 82398 and account_id = 9000
2018-03-15 14:26:40.354 AEDT [7120] LOG: terminating any other active server processes
2018-03-15 14:26:40.367 AEDT [3821] WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.367 AEDT [3821] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.367 AEDT [3821] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING: terminating connection because of crash of another server process
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.569 AEDT [7120] LOG: all server processes terminated; reinitializing
2018-03-15 14:26:40.639 AEDT [9244] LOG: database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT
2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL: the database system is in recovery mode
2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL: the database system is in recovery mode
2018-03-15 14:26:44.778 AEDT [9244] LOG: database system was not properly shut down; automatic recovery in progress
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo starts at 0/56782CE0
2018-03-15 14:26:44.798 AEDT [9244] LOG: invalid record length at 0/56782D18: wanted 24, got 0
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo done at 0/56782CE0
2018-03-15 14:26:44.870 AEDT [7120] LOG: database system is ready to accept connections
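Answer 0: (score: 1)
From Amit Langote, pgsql-bugs:
I can reproduce the OOM being triggered on my modest development machine, so perhaps that is what is happening in your case too.
This is unfortunately expected, given that the underlying planning mechanism cannot cope with more than a few hundred partitions. :-( See the relevant note in the documentation; it is the last line of the page at this link: https://www.postgresql.org/docs/devel/static/ddl-partitioning.html
Until the situation improves in that area, one workaround may be to perform the delete operation directly on the partition, since that is possible. Or redesign the schema to use fewer partitions.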
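The "delete directly on the partition" workaround could look roughly like the sketch below. It assumes partitions are named contacts_<account_id> (a hypothetical convention; substitute whatever the real partitions are called) and it uses the `%s` placeholder style accepted by drivers such as psycopg2. The function only builds the statement and its parameters, so it runs without a database.

```python
def direct_delete(account_id, contact_id=None, parent="contacts"):
    """Return (sql, params) for a DELETE that targets one partition directly.

    The partition name is derived from account_id, so no plan has to be built
    for the parent's other partitions. The naming scheme is an assumption.
    """
    account_id = int(account_id)  # only an integer is interpolated into the SQL text
    table = f"{parent}_{account_id}"
    # The account_id predicate is kept so the statement stays correct even if
    # a partition were to hold more than one account.
    sql = f"DELETE FROM {table} WHERE account_id = %s"
    params = (account_id,)
    if contact_id is not None:
        sql += " AND id = %s"
        params += (int(contact_id),)
    return sql, params


# The statement from the server log, redirected at its partition.
sql, params = direct_delete(9000, 82398)
print(sql, params)
```

With a real connection the pair would be passed on as cursor.execute(sql, params).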
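I know this is not really a solution, more of a cautionary tale.
It appears that native partitioning in PostgreSQL 10 is not suitable for our use case. Part of my task was to evaluate its suitability. I suspected that aggressive partitioning would come at a cost, but I did not expect memory problems.
Feel free to still post your own experiences and solutions.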
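Answer 1: (score: 0)
We have exactly the same problem on PostgreSQL 11 with a table that uses multi-level native partitioning. The main table is partitioned by shop, and each shop is partitioned by period (year-month) for the last several years. That adds up to thousands of partitions.
When we need to run a delete or update across all shops, PostgreSQL crashes. PostgreSQL can handle deletes/updates at a single partition level, i.e. on one specific shop and its monthly partitions, because there each shop has only a few dozen partitions.
But when we try a delete or update on the top-level parent table, where thousands of partitions have to be processed, the database crashes.
The crash is always the same: monitoring shows PostgreSQL starting to use a huge amount of memory until it is eventually killed by the OOM killer. Tuning work_mem or other settings seems to have very little effect; PostgreSQL just crashes a bit later.
So we have to run all deletes/updates on this table in a loop over shops, deleting from or updating each shop's partitions separately. But at least this works.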
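The per-shop loop can be sketched roughly as follows. The table name sales_summary, the sales_summary_shop_<id> naming scheme and the period column are all hypothetical, since the real schema is not shown here. The `execute` argument is any callable that runs one statement and returns the affected row count; in practice it would wrap a database cursor, and in the demo it is a stub that only records what would be sent.

```python
def delete_per_shop(execute, shop_ids, where_sql, params=(), parent="sales_summary"):
    """Issue one DELETE per shop-level partition instead of one on the top parent.

    Each statement then only involves that shop's few dozen monthly partitions.
    where_sql is trusted SQL text written by the caller, not user input.
    """
    total = 0
    for shop_id in shop_ids:
        table = f"{parent}_shop_{int(shop_id)}"  # hypothetical partition name
        total += execute(f"DELETE FROM {table} WHERE {where_sql}", params)
    return total


# Stub executor for demonstration: records statements, reports 0 rows affected.
issued = []

def fake_execute(sql, params):
    issued.append((sql, params))
    return 0

total = delete_per_shop(fake_execute, [1, 2, 3], "period < %s", ("2017-01",))
for sql, params in issued:
    print(sql, params)
```

Committing after each shop, rather than once at the end, would also keep the number of locks held by a single transaction small.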
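By way of explanation: these fine-grained partitions are very useful for our customer portal. We store aggregated data there and build queries that go directly to the monthly partition of a specific shop, so customers see their data very quickly. And native partitioning takes care of distributing the data across the whole structure during inserts, which is really amazing...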