Question

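I set up logical replication on PostgreSQL 11 by following this guide: https://www.digitalocean.com/community/tutorials/how-to-set-up-logical-replication-with-postgresql-10-on-ubuntu-18-04

Everything worked, and when I tested it, changes were replicated.

However, a month later, changes no longer seem to be replicated, and Postgres appears to be using a lot of CPU and bandwidth.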

On a 2 vCPU / 4 GB DigitalOcean server, the load average is about 2.5.
Bandwidth is ~1 MB/s.
There is essentially zero activity on this server and database.

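This raises a few questions:

Is it normal for an inactive database with logical streaming replication to use this many resources?
Any ideas why replication has stopped? (Changing records on the primary no longer affects the replica.)
Are there any pro tips for monitoring and checking replication status?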

The Postgres log on the primary server contains many messages like these:

2019-04-22 06:26:16.986 UTC [20371] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC21198
2019-04-22 06:26:16.986 UTC [20371] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.010 UTC [20372] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC211D0
2019-04-22 06:26:17.010 UTC [20372] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.055 UTC [20373] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC21208
2019-04-22 06:26:17.055 UTC [20373] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.078 UTC [20374] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC21240
2019-04-22 06:26:17.078 UTC [20374] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.114 UTC [20375] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC21278
2019-04-22 06:26:17.114 UTC [20375] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.154 UTC [20376] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC212B0
2019-04-22 06:26:17.154 UTC [20376] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.186 UTC [20377] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC212E8
2019-04-22 06:26:17.186 UTC [20377] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.229 UTC [20378] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC21320
2019-04-22 06:26:17.229 UTC [20378] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod LOG:  could not send data to client: Connection reset by peer
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod STATEMENT:  COPY public.class_registrations TO STDOUT
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod FATAL:  connection to client lost
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod STATEMENT:  COPY public.class_registrations TO STDOUT
2019-04-22 06:26:17.259 UTC [20379] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC21358
2019-04-22 06:26:17.259 UTC [20379] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:21.327 UTC [20418] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC21390
2019-04-22 06:26:21.327 UTC [20418] replica_user@server_prod DETAIL:  There are no running transactions.
2019-04-22 06:26:21.341 UTC [20419] replica_user@server_prod LOG:  logical decoding found consistent point at 0/1EC213C8
2019-04-22 06:26:21.341 UTC [20419] replica_user@server_prod DETAIL:  There are no running transactions.

The log on the replica server is full of messages like these:

2019-04-21 06:26:07.619 UTC [2967] LOG:  logical replication table synchronization worker for subscription "replica_subscription", table "messages" has started
2019-04-21 06:26:07.645 UTC [2966] ERROR:  duplicate key value violates unique constraint "account_locations_pkey"
2019-04-21 06:26:07.645 UTC [2966] DETAIL:  Key (id)=(1) already exists.
2019-04-21 06:26:07.645 UTC [2966] CONTEXT:  COPY account_locations, line 1
2019-04-21 06:26:07.648 UTC [16353] LOG:  background worker "logical replication worker" (PID 2966) exited with exit code 1
2019-04-21 06:26:07.652 UTC [2968] LOG:  logical replication table synchronization worker for subscription "replica_subscription", table "user_photos" has started
2019-04-21 06:26:07.663 UTC [2967] ERROR:  duplicate key value violates unique constraint "messages_pkey"
2019-04-21 06:26:07.663 UTC [2967] DETAIL:  Key (id)=(1) already exists.
2019-04-21 06:26:07.663 UTC [2967] CONTEXT:  COPY messages, line 1

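Here is the load average over the last 6 hours (you can see when I dropped the subscription on the replica server).

Here is the bandwidth:

And here is the output of {@ {1}} after only about 10-15 seconds of monitoring: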

Answer 1

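After looking at the logs as Laurenz suggested, it appears that my initial data load did not have the correct primary ID sequences for all tables. (I'm not sure how that happened.) The replica log shows each table synchronization worker failing on the first copied row with a duplicate key error, so rows already present on the replica conflicted with the data being copied from the primary.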

To fix replication, I did the following:

Dropped the subscription on the replica server
Dropped all tables
Reloaded all tables, schema only (no data)
Created the subscription again
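
The steps above can be sketched as an ordered list of commands to run on the replica. This is only an illustration, not the exact commands I ran: the publication name, host, and schema file below are hypothetical placeholders, while the subscription, database, user, and table names are taken from the logs in the question. The function only builds strings and does not connect to anything.

```python
def resync_plan(subscription, publication, conninfo, tables, schema_file):
    """Return the ordered psql commands for rebuilding a subscriber from scratch.

    All arguments are supplied by the caller; nothing here talks to a database.
    """
    plan = []
    # 1. Drop the subscription on the replica.
    plan.append(f"DROP SUBSCRIPTION {subscription};")
    # 2. Drop every replicated table on the replica
    #    (with foreign keys, the drop order may matter).
    for table in tables:
        plan.append(f"DROP TABLE {table};")
    # 3. Reload the schema only (no data), for example from a file produced
    #    on the primary with `pg_dump --schema-only`. \i is psql's include command.
    plan.append(f"\\i {schema_file}")
    # 4. Recreate the subscription; by default this copies the existing data.
    plan.append(
        f"CREATE SUBSCRIPTION {subscription} "
        f"CONNECTION '{conninfo}' PUBLICATION {publication};"
    )
    return plan


PLAN = resync_plan(
    subscription="replica_subscription",
    publication="my_publication",  # hypothetical name
    conninfo="host=primary.example dbname=server_prod user=replica_user",  # hypothetical host
    tables=["account_locations", "messages", "user_photos"],
    schema_file="schema.sql",  # hypothetical file
)

for command in PLAN:
    print(command)
```

Starting from empty tables matters here: the initial copy performed by the new subscription can only succeed if it does not collide with rows that already exist on the replica.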

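This caused all the data to sync, and everything went back to normal. I confirmed this by updating data on the primary and seeing the update appear on the replica server.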
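
Besides a manual update test, one way to check the sync is the `pg_subscription_rel` catalog on the replica, which records a per-table state code. The sketch below holds a query for it and a small helper that lists tables not yet in the ready state. The sample rows are made up for illustration, and running the query (with psql or any driver) is left out.

```python
# Query to run on the replica; shows one row per subscribed table.
SYNC_STATE_QUERY = """
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY 1;
"""

# State codes as documented for pg_subscription_rel.srsubstate.
SRSUBSTATE = {
    "i": "initialize",
    "d": "data is being copied",
    "s": "synchronized",
    "r": "ready (normal replication)",
}


def not_ready(rows):
    """Return (table, state description) for every table not in state 'r'."""
    return [
        (table, SRSUBSTATE.get(state, "unknown state"))
        for table, state in rows
        if state != "r"
    ]


# Hypothetical query result, for illustration only.
SAMPLE_ROWS = [("account_locations", "r"), ("messages", "d"), ("user_photos", "r")]
print(not_ready(SAMPLE_ROWS))
```

A table that stays in `i` or `d` for a long time on a small database is a hint that its initial copy keeps failing, which is what the duplicate key errors in the replica log pointed to.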

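The high CPU load and bandwidth usage appear to have been caused by the replication errors, with Postgres retrying the failed synchronization over and over, as fast as it could.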
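
A quick way to spot this kind of retry loop is to count, per table, how often synchronization workers start and how often they hit duplicate key errors. The following is a rough sketch that assumes the log line format shown in the question; the sample lines are copied from the replica log above, and in practice the lines would come from the actual log file.

```python
import re
from collections import Counter

# Patterns for the two message types seen in the replica log.
START_RE = re.compile(
    r'table synchronization worker for subscription "([^"]+)", '
    r'table "([^"]+)" has started'
)
DUP_RE = re.compile(
    r'ERROR:\s+duplicate key value violates unique constraint "([^"]+)"'
)


def summarize(lines):
    """Count sync worker starts per table and duplicate key errors per constraint."""
    starts, dup_errors = Counter(), Counter()
    for line in lines:
        match = START_RE.search(line)
        if match:
            starts[match.group(2)] += 1
            continue
        match = DUP_RE.search(line)
        if match:
            dup_errors[match.group(1)] += 1
    return starts, dup_errors


SAMPLE = [
    '2019-04-21 06:26:07.619 UTC [2967] LOG:  logical replication table synchronization worker for subscription "replica_subscription", table "messages" has started',
    '2019-04-21 06:26:07.645 UTC [2966] ERROR:  duplicate key value violates unique constraint "account_locations_pkey"',
    '2019-04-21 06:26:07.652 UTC [2968] LOG:  logical replication table synchronization worker for subscription "replica_subscription", table "user_photos" has started',
    '2019-04-21 06:26:07.663 UTC [2967] ERROR:  duplicate key value violates unique constraint "messages_pkey"',
]

starts, dup_errors = summarize(SAMPLE)
print(dict(starts))
print(dict(dup_errors))
```

If the counts for the same tables keep climbing while the database is otherwise idle, the workers are restarting the initial copy again and again rather than making progress.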

How much CPU/bandwidth should PostgreSQL logical replication use?
